

ARTICLE Communicated by Paul Bressloff

Computation in a Single Neuron: Hodgkin and Huxley Revisited

Blaise Agüera y Arcas
blaisea@princeton.edu
Rare Books Library, Princeton University, Princeton, NJ 08544, U.S.A.

Adrienne L. Fairhall
fairhall@princeton.edu
NEC Research Institute, Princeton, NJ 08540, and Department of Molecular Biology, Princeton, NJ 08544, U.S.A.

William Bialek
wbialek@princeton.edu
NEC Research Institute, Princeton, NJ 08540, and Department of Physics, Princeton, NJ 08544, U.S.A.

A spiking neuron "computes" by transforming a complex dynamical input into a train of action potentials, or spikes. The computation performed by the neuron can be formulated as dimensional reduction, or feature detection, followed by a nonlinear decision function over the low-dimensional space. Generalizations of the reverse correlation technique with white noise input provide a numerical strategy for extracting the relevant low-dimensional features from experimental data, and information theory can be used to evaluate the quality of the low-dimensional approximation. We apply these methods to analyze the simplest biophysically realistic model neuron, the Hodgkin-Huxley (HH) model, using this system to illustrate the general methodological issues. We focus on the features in the stimulus that trigger a spike, explicitly eliminating the effects of interactions between spikes. One can approximate this triggering "feature space" as a two-dimensional linear subspace in the high-dimensional space of input histories, capturing in this way a substantial fraction of the mutual information between inputs and spike time. We find that an even better approximation, however, is to describe the relevant subspace as two dimensional but curved; in this way, we can capture 90% of the mutual information even at high time resolution. Our analysis provides a new understanding of the computational properties of the HH model. While it is common to approximate neural behavior as "integrate and fire," the HH model is not an integrator, nor is it well described by a single threshold.

Neural Computation 15, 1715-1749 (2003) © 2003 Massachusetts Institute of Technology


1 Introduction

On short timescales, one can conceive of a single neuron as a computational device that maps inputs at its synapses into a sequence of action potentials, or spikes. To a good approximation, the dynamics of this mapping are determined by the kinetic properties of ion channels in the neuron's membrane. In the 50 years since the pioneering work of Hodgkin and Huxley, we have seen the evolution of an ever more detailed description of channel kinetics, making it plausible that the short time dynamics of almost any neuron we encounter will be understandable in terms of interactions among a mixture of diverse but known channel types (Hille, 1992; Koch, 1999). The existence of so nearly complete a microscopic picture of single-neuron dynamics brings into focus a very different question: What does the neuron compute? Although models in the Hodgkin-Huxley (HH) tradition define a dynamical system that will reproduce the behavior of the neuron, this description in terms of differential equations is far from our intuition about, or the formal description of, computation.

The problem of what neurons compute is one instance of a more general problem in modern quantitative biology and biophysics: given a progressively more complete microscopic description of proteins and their interactions, how do we understand the emergence of function? In the case of neurons, the proteins are the ion channels, and the interactions are very simple: current flows through open channels, charging the cell's capacitance, and all channels experience the resulting voltage. Arguably there is no other network of interacting proteins for which the relevant equations are known in such detail; indeed, some efforts to understand function and computation in other networks of proteins make use of analogies to neural systems (Bray, 1995). Despite the relative completeness of our microscopic picture for neurons, there remains a huge gap between the description of molecular kinetics and the understanding of function. Given some complex dynamic input to a neuron, we might be able to simulate the spike train that will result, but we are hard pressed to look at the equations for channel kinetics and say that this transformation from inputs to spikes is equivalent to some simple (or perhaps not so simple) computation, such as filtering, thresholding, coincidence detection, or feature extraction.

Perhaps the problem of understanding computational function in a model of ion channel dynamics is a symptom of a much deeper mathematical difficulty. Despite the fact that all computers are dynamical systems, the natural mathematical objects in dynamical systems theory are very different from those in the theory of computation, and it is not clear how to connect these different formal schemes. Finding a general mapping from dynamical systems to their equivalent computational functions is a grand challenge, but we will take a more modest approach.

We believe that a key intuition for understanding neural computation is the concept of feature selectivity: while the space of inputs to a neuron, whether we think of inputs as arriving at the synapses or being driven by sensory signals outside the brain, is vast, individual neurons are sensitive only to some restricted set of features in this vast space. The most general way to formalize this intuition is to say that we can compress (in the information-theoretic sense) our description of the inputs without losing any information about the neural output (Tishby, Pereira, & Bialek, 1999). We might hope that this selective compression of the input data has a simple geometric description, so that the relevant bits about the input correspond to coordinates along some restricted set of relevant dimensions in the space of inputs. If this is the case, feature selectivity should be formalized as a reduction of dimensionality (de Ruyter van Steveninck & Bialek, 1988), and this is the approach we follow here. Closely related work on the use of dimensionality reduction to analyze neural feature selectivity has been described in recent work (Bialek & de Ruyter van Steveninck, 2003; Sharpee, Rust, & Bialek, in press).

Here we develop the idea of dimensionality reduction as a tool for analysis of neural computation and apply these tools to the HH model. While our initial goal was to test new analysis methods in the context of a presumably simple and well-understood model, we have found that the HH neuron performs a computation of surprising richness. Preliminary accounts of these results have already appeared (Agüera y Arcas, 1998; Agüera y Arcas, Bialek, & Fairhall, 2001).

2 Dimensionality Reduction

Neurons take input signals at their synapses and give as output sequences of spikes. To characterize a neuron completely is to identify the mapping between neuronal input and the spike train the neuron produces in response. In the absence of any simplifying assumptions, this requires probing the system with every possible input. Most often, these inputs are spikes from other neurons; each neuron typically has of order N ≈ 10^3 presynaptic connections. If the system operates at 1 msec resolution and the time window of relevant inputs is 40 msec, then we can think of a single neuron as having an input described by a ≈ 4 × 10^4 bit word (the presence or absence of a spike in each 1 msec bin for each presynaptic cell), which is then mapped to a one (spike) or zero (no spike). More realistically, if average spike rates are ≈ 10 s^-1, the input words can be compressed by a factor of 10. In this picture, a neuron computes a Boolean function over roughly 4000 variables. Clearly, one cannot sample every one of the ≈ 2^4000 inputs to identify the neural computation. Progress requires making some simplifying assumption about the function computed by the neuron so that we can vastly reduce the space of possibilities over which to search. We use the idea of dimensionality reduction in this spirit, as a simplifying assumption that allows us to make progress but that also must be tested directly.


The ideas of feature selectivity and dimensionality reduction have a long history in neurobiology. The idea of receptive fields, as formulated by Hartline, Kuffler, and Barlow for the visual system, gave a picture of neurons as having a template against which images would be correlated (Hartline, 1940; Kuffler, 1953; Barlow, 1953). If we think of images as vectors in a high-dimensional space, with coordinates determined by the intensities of each pixel, then the simplest receptive field models describe the neuron as sensitive to only one direction, or projection, in this high-dimensional space. This picture of projection followed by thresholding, or some other nonlinearity, to determine the probability of spike generation was formalized in the linear perceptron (Rosenblatt, 1958, 1962). In subsequent work, Barlow, Hill, and Levick (1964) characterized neurons in which the receptive field has subregions in space and time, such that summation is at least approximately linear in each subregion, but these summed signals interact nonlinearly, for example, to generate direction selectivity and motion sensitivity. We can think of Hubel and Wiesel's description of complex and hypercomplex cells (Hubel & Wiesel, 1962) again as a picture of approximately linear summation within subregions, followed by nonlinear operations on these multiple summed signals. More formally, the proper combination of linear summation and nonlinear or logical operations may provide a useful bridge from receptive field properties to proper geometric primitives in visual computation (Iverson & Zucker, 1995). In the same way that a single receptive field or perceptron model has one relevant dimension in the space of visual stimuli, these more complex cells have as many relevant dimensions as there are independent subregions of the receptive field. Although this number is larger than one, it still is much smaller than the full dimensionality of the possible spatiotemporal variations in visual inputs.

The idea that neurons in the auditory system might be described by a filter followed by a nonlinear transformation to determine the probability of spike generation was the inspiration for de Boer's development (de Boer & Kuyper, 1968) of triggered or reverse correlation. Modern uses of reverse correlation to characterize the filtering or receptive field properties of a neuron often emphasize that this approach provides a "linear approximation" to the input-output properties of the cell, but the original idea was almost the opposite: neurons clearly are nonlinear devices, but this is separate from the question of whether the probability of generating a spike is determined by a simple projection of the sensory input onto a single filter or template. In fact, as explained by Rieke, Warland, Bialek, and de Ruyter van Steveninck (1997), linearity is seldom a good approximation for the neural input-output relation, but if there is one relevant dimension, then (provided that input signals are chosen with suitable statistics) the reverse correlation method is guaranteed to find this one special direction in the space of inputs to which the neuron is sensitive. While the reverse correlation method is guaranteed to find the one relevant dimension if it exists, the method does not include any way of testing for other relevant dimensions or, more generally, for measuring the dimensionality of the relevant subspace.

The idea of characterizing neural responses directly as the reduction of dimensionality emerged from studies (de Ruyter van Steveninck & Bialek, 1988) of a motion-sensitive neuron in the fly visual system. In particular, this work suggested that it is possible to estimate the dimensionality of the relevant subspace, rather than just assuming that it is small (or equal to one). More recent work on the fly visual system has exploited the idea of dimensionality reduction to probe both the structure and adaptation of the neural code (Brenner, Bialek, & de Ruyter van Steveninck, 2000; Fairhall, Lewen, Bialek, & de Ruyter van Steveninck, 2001) and the nature of the computation that extracts the motion signal from the spatiotemporal array of photoreceptor inputs (Bialek & de Ruyter van Steveninck, 2003). Here we review the ideas of dimensionality reduction from previous work; extensions of these ideas begin in section 3.

In the spirit of neural network models, we will simplify away the spatial structure of neurons and consider time-dependent currents I(t) injected into a point-like neuron. While this misses much of the complexity of real cells, we will find that even this system is highly nontrivial. If the input is an injected current, then the neuron maps the history of this current, I(t < t_0), into the presence or absence of a spike at time t_0. More generally, we might imagine that the cell (or our description) is noisy, so that there is a probability of spiking P[spike at t_0 | I(t < t_0)] that depends on the current history. The dependence on the history of the current means that the input signal still is high dimensional, even without spatial dependence. Working at time resolution Δt and assuming that currents in a window of size T are relevant to the decision to spike, the input space is of dimension D = T/Δt, where D is often of order 100.

The idea of dimensionality reduction is that the probability of spike generation is sensitive only to some limited number of dimensions K within the D-dimensional space of inputs. We begin our analysis by searching for linear subspaces, that is, a set of signals s_1, s_2, ..., s_K that can be constructed by filtering the current,

s_\mu = \int_0^\infty dt\, f_\mu(t)\, I(t_0 - t),    (2.1)

so that the probability of spiking depends on only this small set of signals,

P[\text{spike at } t_0 \mid I(t < t_0)] = P[\text{spike at } t_0]\, g(s_1, s_2, \ldots, s_K),    (2.2)

where the inclusion of the average probability of spiking, P[spike at t_0], leaves g dimensionless. If we think of the current I(t_0 - T < t < t_0) as a D-dimensional vector, with one dimension for each discrete sample at spacing Δt, then the filtered signals s_i are linear projections of this vector. In this formulation, characterizing the computation done by a neuron involves three steps:

1. Estimate the number of relevant stimulus dimensions K, with the hope that there will be many fewer than the original dimensionality D.

2. Identify a set of filters that project into this relevant subspace.

3. Characterize the nonlinear function g(\vec{s}).

The classical perceptron-like cell of neural network theory would have only one relevant dimension, given by the vector of weights, and a simple form for g, typically a sigmoid.
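The model of equations 2.1 and 2.2 is a linear-nonlinear cascade, and it can be made concrete with a small simulation. The filter shape, the sigmoidal g, and all numerical parameters below are hypothetical choices for illustration, not values taken from this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

dt = 0.1           # time step (arbitrary units; illustrative)
D = 100            # dimensionality of the stimulus history window
T_steps = 200_000  # number of stimulus samples

# Gaussian white-noise input current, one sample per bin
stimulus = rng.normal(0.0, 1.0, T_steps)

# One relevant filter f(t): a damped oscillation (hypothetical shape)
t = np.arange(D) * dt
f = np.exp(-t / 2.0) * np.sin(2.0 * np.pi * t / 5.0)
f /= np.linalg.norm(f)

def g(s, theta=2.0, beta=4.0):
    """Nonlinear decision function over the projection (cf. eq. 2.2)."""
    return 1.0 / (1.0 + np.exp(-beta * (s - theta)))

# s(t0) = sum_t f(t) I(t0 - t): a discrete version of eq. 2.1
s = np.convolve(stimulus, f, mode="full")[:T_steps]

# Probability of a spike in each bin, and a Bernoulli draw
p_spike = 0.05 * g(s)
spikes = rng.random(T_steps) < p_spike
```

For this model neuron, everything that matters about the D-dimensional stimulus history is contained in the single number s; the analysis developed below asks whether a biophysical neuron admits such a description with small K.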

Rather than trying to look directly at the distribution of spikes given stimuli, we follow de Ruyter van Steveninck and Bialek (1988) and consider the distribution of signals conditional on the response, P[I(t < t_0) | spike at t_0], also called the response-conditional ensemble (RCE). These are related by Bayes' rule:

\frac{P[\text{spike at } t_0 \mid I(t < t_0)]}{P[\text{spike at } t_0]} = \frac{P[I(t < t_0) \mid \text{spike at } t_0]}{P[I(t < t_0)]}.    (2.3)

We can now compute various moments of the RCE. The first moment is the spike-triggered average stimulus (STA),

\text{STA}(\tau) = \int [dI]\, P[I(t < t_0) \mid \text{spike at } t_0]\, I(t_0 - \tau),    (2.4)

which is the object that one computes in reverse correlation (de Boer & Kuyper, 1968; Rieke et al., 1997). If we choose the distribution of input stimuli P[I(t < t_0)] to be gaussian white noise, then for a perceptron-like neuron sensitive to only one direction in stimulus space, it can be shown that the STA, or first moment of the RCE, is proportional to the vector or filter f(\tau) that defines this direction (Rieke et al., 1997).

Although it is a theorem that the STA is proportional to the relevant filter f(\tau), in principle it is possible that the proportionality constant is zero, most plausibly if the neuron's response has some symmetry, such as phase invariance in the response of high-frequency auditory neurons. It also is worth noting that what is really important in this analysis is the gaussian distribution of the stimuli, not the "whiteness" of the spectrum. For nonwhite but gaussian inputs, the STA measures the relevant filter blurred by the correlation function of the inputs, and hence the true filter can be recovered (at least in principle) by deconvolution. For nongaussian signals and nonlinear neurons, there is no corresponding guarantee that the selectivity of the neuron can be separated from correlations in the stimulus (Sharpee et al., in press).
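For a one-dimensional model neuron, the proportionality between the STA and the filter can be checked directly. The exponential filter and the hard threshold below are invented for the demonstration; with gaussian white noise, the spike-triggered average recovers the filter up to a scale factor:

```python
import numpy as np

rng = np.random.default_rng(1)
D, N = 80, 300_000

stim = rng.normal(size=N)                    # gaussian white-noise current

# Hypothetical single-filter neuron: threshold on the projection
f = np.exp(-np.arange(D) / 10.0)
f /= np.linalg.norm(f)
proj = np.convolve(stim, f, mode="full")[:N]
spike_times = np.nonzero(proj > 2.5)[0]
spike_times = spike_times[spike_times >= D]  # keep spikes with a full history

# Reverse correlation (eq. 2.4): average the stimulus history before each
# spike; reversing puts the result on the tau axis (time before the spike)
windows = np.stack([stim[t - D + 1 : t + 1] for t in spike_times])
sta = windows.mean(axis=0)[::-1]

# With gaussian white noise, the STA should be proportional to f,
# so the cosine similarity between the two should be close to 1
cosine = np.dot(sta, f) / (np.linalg.norm(sta) * np.linalg.norm(f))
```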

To obtain more than one relevant direction (or to reveal relevant directions when symmetries cause the STA to vanish), we proceed to second order and compute the covariance matrix of fluctuations around the spike-triggered average,

C_{\text{spike}}(\tau, \tau') = \int [dI]\, P[I(t < t_0) \mid \text{spike at } t_0]\, I(t_0 - \tau)\, I(t_0 - \tau') - \text{STA}(\tau)\, \text{STA}(\tau').    (2.5)

In the same way that we compare the spike-triggered average to some constant average level of the signal in the whole experiment, we compare the covariance matrix C_spike with the covariance of the signal averaged over the whole experiment,

C_{\text{prior}}(\tau, \tau') = \int [dI]\, P[I(t < t_0)]\, I(t_0 - \tau)\, I(t_0 - \tau'),    (2.6)

to construct the change in the covariance matrix,

\Delta C = C_{\text{spike}} - C_{\text{prior}}.    (2.7)

With time resolution Δt in a window of duration T, as above, all of these covariances are D × D matrices. In the same way that the spike-triggered average has the clearest interpretation when we choose inputs from a gaussian distribution, ΔC also has the clearest interpretation in this case. Specifically, if inputs are drawn from a gaussian distribution, then it can be shown that (Bialek & de Ruyter van Steveninck, 2003):

1. If the neuron is sensitive to a limited set of K input dimensions, as in equation 2.2, then ΔC will have only K nonzero eigenvalues.¹ In this way, we can measure directly the dimensionality K of the relevant subspace.

2. If the distribution of inputs is both gaussian and white, then the eigenvectors associated with the nonzero eigenvalues span the same space as that spanned by the filters {f_\mu(\tau)}.

3. For nonwhite (correlated) but still gaussian inputs, the eigenvectors span the space of the filters {f_\mu(\tau)} blurred by convolution with the correlation function of the inputs.

Thus, the analysis of ΔC for neurons responding to gaussian inputs should allow us to identify the subspace of relevant inputs and test specifically the hypothesis that this subspace is of low dimension.

¹ As with the STA, it is in principle possible that symmetries or accidental features of the function g(\vec{s}) would cause some of the K eigenvalues to vanish, but this is very unlikely.


Several points are worth noting. First, except in special cases, the eigenvectors of ΔC and the filters {f_\mu(\tau)} are not the principal components of the RCE, and hence this analysis of ΔC is not a principal component analysis. Second, the nonzero eigenvalues of ΔC can be either positive or negative, depending on whether the variance of inputs along that particular direction is larger or smaller in the neighborhood of a spike. Third, although the eigenvectors span the relevant subspace, these eigenvectors do not form a preferred coordinate system within this subspace. Finally, we emphasize that dimensionality reduction, the identification of the relevant subspace, is only the first step in our analysis of the computation done by a neuron.
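The eigenvalue test in point 1 can be demonstrated on synthetic data. In the sketch below, a hypothetical neuron spikes only when its projections onto two orthonormal filters both exceed thresholds (an invented rule with K = 2; all shapes and thresholds are our own choices). ΔC then shows two eigenvalues standing clearly above the sampling noise floor, one negative (reduced variance along the thresholded direction) and one positive:

```python
import numpy as np

rng = np.random.default_rng(2)
D, N = 40, 600_000

stim = rng.normal(size=N)                    # white gaussian input

# Two orthonormal filters (hypothetical shapes)
k = np.arange(D)
f1 = np.exp(-k / 8.0)
f1 /= np.linalg.norm(f1)
f2 = k * np.exp(-k / 8.0)
f2 -= np.dot(f2, f1) * f1                    # orthogonalize against f1
f2 /= np.linalg.norm(f2)

s1 = np.convolve(stim, f1, "full")[:N]
s2 = np.convolve(stim, f2, "full")[:N]

# Nonlinear decision rule g(s1, s2): spike only if both conditions hold
spike_times = np.nonzero((s1 > 1.0) & (np.abs(s2) > 1.0))[0]
spike_times = spike_times[spike_times >= D]

win = lambda t: stim[t - D + 1 : t + 1][::-1]
spike_wins = np.stack([win(t) for t in spike_times])
prior_wins = np.stack([win(t) for t in range(D, N, 20)])

C_spike = np.cov(spike_wins, rowvar=False)   # eq. 2.5: covariance about the STA
C_prior = np.cov(prior_wins, rowvar=False)   # eq. 2.6
dC = C_spike - C_prior                       # eq. 2.7

eig = np.sort(np.abs(np.linalg.eigvalsh(dC)))[::-1]
# eig[0] and eig[1] pick out the f1/f2 subspace; eig[2:] is sampling noise
```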

3 Measuring the Success of Dimensionality Reduction

The claim that certain stimulus features are most relevant is in effect a model for the neuron, so the next question is how to measure the effectiveness or accuracy of this model. Several different ideas have been suggested in the literature as ways of testing models based on linear receptive fields in the visual system (Stanley, Lei, & Dan, 1999; Keat, Reinagel, Reid, & Meister, 2001) or linear spectrotemporal receptive fields in the auditory system (Theunissen, Sen, & Doupe, 2000). These methods have in common that they introduce a metric to measure performance, for example, mean square error in predicting the firing rate as averaged over some window of time. Ideally, we would like to have a performance measure that avoids any arbitrariness in the choice of metric, and such metric-free measures are provided uniquely by information theory (Shannon, 1948; Cover & Thomas, 1991).

Observing the arrival time t_0 of a single spike provides a certain amount of information about the input signals. Since information is mutual, we can also say that knowing the input signal trajectory I(t < t_0) provides information about the arrival time of the spike. If "details are irrelevant," then we should be able to discard these details from our description of the stimulus and yet preserve the mutual information between the stimulus and spike arrival times (for an abstract discussion of such selective compression, see Tishby et al., 1999). In constructing our low-dimensional model, we represent the complete (D-dimensional) stimulus I(t < t_0) by a smaller number (K < D) of dimensions \vec{s} = (s_1, s_2, \ldots, s_K).

The mutual information I[I(t < t_0); t_0] is a property of the neuron itself, while the mutual information I[\vec{s}; t_0] characterizes how much our reduced description of the stimulus can tell us about when spikes will occur. Necessarily, our reduction of dimensionality causes a loss of information, so that

I[\vec{s}; t_0] \le I[I(t < t_0); t_0],    (3.1)

but if our reduced description really captures the computation done by the neuron, then the two information measures will be very close. In particular, if the neuron were described exactly by a lower-dimensional model, as for a linear perceptron or for an integrate-and-fire neuron (Agüera y Arcas & Fairhall, 2003), then the two information measures would be equal. More generally, the ratio I[\vec{s}; t_0] / I[I(t < t_0); t_0] quantifies the efficiency of the low-dimensional model, measuring the fraction of information about spike arrival times that our K dimensions capture from the full signal I(t < t_0).

As shown by Brenner, Strong, Koberle, Bialek, and de Ruyter van Steveninck (2000), the arrival time of a single spike provides an information

I[I(t < t_0); t_0] \equiv I_{\text{one spike}} = \frac{1}{T} \int_0^T dt\, \frac{r(t)}{\bar{r}} \log_2\!\left[\frac{r(t)}{\bar{r}}\right],    (3.2)

where r(t) is the time-dependent spike rate, \bar{r} is the average spike rate, and \langle\cdots\rangle denotes an average over time. In principle, information should be calculated as an average over the distribution of stimuli, but the ergodicity of the stimulus justifies replacing this ensemble average with a time average. For a deterministic system like the HH equations, the spike rate is a singular function of time: given the inputs I(t), spikes occur at definite times, with no randomness or irreproducibility. If we observe these responses with a time resolution Δt, then for Δt sufficiently small, the rate r(t) at any time t either is zero or corresponds to a single spike occurring in one bin of size Δt, that is, r = 1/Δt. Thus, the information carried by a single spike is

I_{\text{one spike}} = -\log_2(\bar{r}\, \Delta t).    (3.3)

On the other hand, if the probability of spiking really depends on only the stimulus dimensions s_1, s_2, \ldots, s_K, we can substitute

\frac{r(t)}{\bar{r}} \rightarrow \frac{P(\vec{s} \mid \text{spike at } t)}{P(\vec{s})}.    (3.4)

Replacing the time averages in equation 3.2 with ensemble averages, we find

I[\vec{s}; t_0] \equiv I^{\vec{s}}_{\text{one spike}} = \int d^K s\, P(\vec{s} \mid \text{spike at } t) \log_2\!\left[\frac{P(\vec{s} \mid \text{spike at } t)}{P(\vec{s})}\right]    (3.5)

(for details of these arguments, see Brenner, Strong, et al., 2000). This allows us to compare the information captured by the K-dimensional reduced model with the true information carried by single spikes in the spike train.
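For a deterministic neuron observed at resolution Δt, equation 3.2 collapses to equation 3.3, and the collapse is easy to verify numerically. The rates and durations below are arbitrary test values, not values from this paper:

```python
import numpy as np

rng = np.random.default_rng(3)

dt = 0.001                  # time resolution Delta t, in seconds (illustrative)
T = 10.0                    # total observation time, seconds
n_bins = round(T / dt)

# Deterministic response: r(t) is either 0 or 1/dt in every bin (see text)
r = np.zeros(n_bins)
r[rng.choice(n_bins, size=50, replace=False)] = 1.0 / dt
rbar = r.mean()             # 50 spikes / 10 s = 5 s^-1

# Eq. 3.2 as a discrete sum over bins (only occupied bins contribute)
occupied = r > 0
I_one_spike = (dt / T) * np.sum((r[occupied] / rbar)
                                * np.log2(r[occupied] / rbar))

# Eq. 3.3: the same information in closed form
I_closed = -np.log2(rbar * dt)
```

With these numbers, both expressions give -log2(5 s^-1 x 0.001 s) = log2(200), about 7.64 bits per spike.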

For reasons that we will discuss in the following section, and as was pointed out in Agüera y Arcas et al. (2001) and Agüera y Arcas and Fairhall (2003), we will be considering isolated spikes: those separated from previous spikes by a period of silence. This has important consequences for our analysis. Most significantly, as we will be considering spikes that occur on a background of silence, the relevant stimulus ensemble, conditioned on the silence, is no longer gaussian. Further, we will need to refine our information estimate.

The derivation of equation 3.2 makes clear that a similar formula must determine the information carried by the occurrence time of any event, not just single spikes: we can define an event rate in place of the spike rate and then calculate the information carried by these events (Brenner, Strong, et al., 2000). In the case here, we wish to compute the information obtained by observing an isolated spike or, equivalently, by the event silence + spike. This is straightforward: we replace the spike rate by the rate of isolated spikes, and equation 3.2 will give us the information carried by the arrival time of a single isolated spike. The problem is that this information includes both the information carried by the occurrence of the spike and the information conveyed in the condition that there were no spikes in the preceding t_silence msec (for an early discussion of the information carried by silence, see de Ruyter van Steveninck & Bialek, 1988). We would like to separate these contributions, since our idea of dimensionality reduction applies only to the triggering of a spike, not to the temporally extended condition of nonspiking.

To separate the information carried by the isolated spike itself, we have to ask how much information we gain by seeing an isolated spike given that the condition for isolation has already been met. As discussed by Brenner, Strong, et al. (2000), we can compute this information by thinking about the distribution of times at which the isolated spike can occur. Given that we know the input stimulus, the distribution of times at which a single isolated spike will be observed is proportional to r_iso(t), the time-dependent rate, or peristimulus time histogram, for isolated spikes. With proper normalization, we have

P_{\text{iso}}(t \mid \text{inputs}) = \frac{1}{T} \cdot \frac{1}{\bar{r}_{\text{iso}}}\, r_{\text{iso}}(t),    (3.6)

where T is the duration of the (long) window in which we can look for the spike and \bar{r}_{\text{iso}} is the average rate of isolated spikes. This distribution has an entropy

S_{\text{iso}}(t \mid \text{inputs}) = -\int_0^T dt\, P_{\text{iso}}(t \mid \text{inputs}) \log_2 P_{\text{iso}}(t \mid \text{inputs})    (3.7)

= -\frac{1}{T} \int_0^T dt\, \frac{r_{\text{iso}}(t)}{\bar{r}_{\text{iso}}} \log_2\!\left[\frac{1}{T} \cdot \frac{r_{\text{iso}}(t)}{\bar{r}_{\text{iso}}}\right]    (3.8)

= \log_2(T \bar{r}_{\text{iso}}\, \Delta t) \text{ bits},    (3.9)

where again we use the fact that for a deterministic system, the time-dependent rate must be either zero or the maximum allowed by our time resolution, Δt. To compute the information carried by a single spike, we need to compare this entropy with the total entropy possible when we do not know the inputs.

It is tempting to think that without knowledge of the inputs, an isolated spike is equally likely to occur anywhere in the window of size T, which leads us back to equation 3.3 with \bar{r} replaced by \bar{r}_{\text{iso}}. In this case, however, we are assuming that the condition for isolation has already been met. Thus, even without observing the inputs, we know that isolated spikes can occur only in windows of time whose total length is T_{\text{silence}} = T \cdot P(\text{silence}), where P(silence) is the probability that any moment in time is at least t_silence after the most recent spike. Thus, the total entropy of isolated spike arrival times (given that the condition for silence has been met) is reduced from \log_2 T to

S_{\text{iso}}(t \mid \text{silence}) = \log_2(T \cdot P_{\text{silence}}),    (3.10)

and the information that the spike carries beyond what we know from the silence itself is

\Delta I_{\text{iso spike}} = S_{\text{iso}}(t \mid \text{silence}) - S_{\text{iso}}(t \mid \text{inputs})    (3.11)

= \frac{1}{T} \int_0^T dt\, \frac{r_{\text{iso}}(t)}{\bar{r}_{\text{iso}}} \log_2\!\left[\frac{r_{\text{iso}}(t)}{\bar{r}_{\text{iso}}} \cdot P_{\text{silence}}\right]    (3.12)

= -\log_2(\bar{r}_{\text{iso}}\, \Delta t) + \log_2 P_{\text{silence}} \text{ bits}.    (3.13)
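The equality of equations 3.12 and 3.13 for a deterministic rate can likewise be checked numerically; the isolated-spike count and the value of P(silence) below are invented test numbers:

```python
import numpy as np

rng = np.random.default_rng(4)

dt = 0.001                  # time resolution, s (illustrative)
T = 20.0                    # observation window, s
n_bins = round(T / dt)
P_silence = 0.6             # invented value for the fraction of silent time

# Deterministic isolated-spike rate: 0 or 1/dt in each bin
r_iso = np.zeros(n_bins)
r_iso[rng.choice(n_bins, size=40, replace=False)] = 1.0 / dt
rbar_iso = r_iso.mean()     # 40 spikes / 20 s = 2 s^-1

# Eq. 3.12 as a discrete sum (only occupied bins contribute)
x = r_iso[r_iso > 0] / rbar_iso
dI_312 = (dt / T) * np.sum(x * np.log2(x * P_silence))

# Eq. 3.13 in closed form; the silence term reduces the information
dI_313 = -np.log2(rbar_iso * dt) + np.log2(P_silence)
```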

This information, which is defined independent of any model for the feature selectivity of the neuron, provides the benchmark against which our reduction of dimensionality will be measured. To make the comparison, however, we need the analog of equation 3.5.

Equation 3.12 provides us with an expression for the information conveyed by isolated spikes in terms of the probability that these spikes occur at particular times; this is analogous to equation 3.2 for single (nonisolated) spikes. If we follow a path analogous to that which leads from equation 3.2 to equation 3.5, we find an expression for the information that an isolated spike provides about the K stimulus dimensions \vec{s},

\Delta I^{\vec{s}}_{\text{iso spike}} = \int d\vec{s}\, P(\vec{s} \mid \text{iso spike at } t) \log_2\!\left[\frac{P(\vec{s} \mid \text{iso spike at } t)}{P(\vec{s} \mid \text{silence})}\right] + \langle \log_2 P(\text{silence} \mid \vec{s}) \rangle,    (3.14)

where the prior is now also conditioned on silence: P(\vec{s} \mid \text{silence}) is the distribution of \vec{s} given that \vec{s} is preceded by a silence of at least t_silence. Notice that this silence-conditioned distribution is not knowable a priori, and in particular, it is not gaussian; P(\vec{s} \mid \text{silence}) must be sampled from data.

The last term in equation 3.14 is the entropy of a binary variable that indicates whether particular moments in time are silent, given knowledge of the stimulus. Again, since the HH model is deterministic, this conditional entropy should be zero if we keep a complete description of the stimulus. In fact, we are not interested in describing those features of the stimulus that lead to silence, and it is not fair (as we will see) to judge the success of dimensionality reduction by looking at the prediction of silence, which necessarily involves multiple dimensions. To make a meaningful comparison, then, we will assume that there is a perfect description of the stimulus conditions leading to silence and focus on the stimulus features that trigger the isolated spike. When we approximate these features by the K-dimensional space \vec{s}, we capture an amount of information

\[
\Delta I^{\vec{s}}_{\rm iso\ spike} = \int d\vec{s}\; P(\vec{s}\,|\,{\rm iso\ spike\ at\ }t)\,\log_2\!\left[\frac{P(\vec{s}\,|\,{\rm iso\ spike\ at\ }t)}{P(\vec{s}\,|\,{\rm silence})}\right]. \tag{3.15}
\]

This is the information that we can compare with $\Delta I_{\rm iso\ spike}$ in equation 3.13 to determine the efficiency of our dimensionality reduction.

4 Characterizing the Hodgkin-Huxley Neuron

For completeness, we begin with a brief review of the dynamics of the space-clamped HH neuron (Hodgkin &amp; Huxley, 1952). Hodgkin and Huxley modeled the dynamics of the current through a patch of membrane with ion-specific conductances:

\[
C\,\frac{dV}{dt} = I(t) - \bar g_K\, n^4 (V - V_K) - \bar g_{Na}\, m^3 h\, (V - V_{Na}) - \bar g_l\, (V - V_l), \tag{4.1}
\]

where $I(t)$ is injected current, K and Na subscripts denote potassium- and sodium-related variables, respectively, and $l$ (for "leakage") terms include all other ion conductances with slower dynamics. $C$ is the membrane capacitance, $V_K$ and $V_{Na}$ are ion-specific reversal potentials, and $V_l$ is defined such that the total voltage $V$ is exactly zero when the membrane is at rest. $\bar g_K$, $\bar g_{Na}$, and $\bar g_l$ are empirically determined maximal conductances for the different ion species, and the gating variables $n$, $m$, and $h$ (on the interval $[0,1]$) have their own voltage-dependent dynamics:

\[
\begin{aligned}
dn/dt &= \frac{0.01\,(10 - V)\,(1 - n)}{e^{(10 - V)/10} - 1} - 0.125\, n\, e^{-V/80},\\
dm/dt &= \frac{0.1\,(25 - V)\,(1 - m)}{e^{(25 - V)/10} - 1} - 4\, m\, e^{-V/18},\\
dh/dt &= 0.07\,(1 - h)\, e^{-V/20} - \frac{h}{e^{(30 - V)/10} + 1}. \tag{4.2}
\end{aligned}
\]

We have used the original values for these parameters, except for changing the signs of the voltages to correspond to the modern sign convention: $C = 1\ \mu{\rm F/cm}^2$, $\bar g_K = 36\ {\rm mS/cm}^2$, $\bar g_{Na} = 120\ {\rm mS/cm}^2$, $\bar g_l = 0.3\ {\rm mS/cm}^2$, $V_K = -12$ mV, $V_{Na} = +115$ mV, $V_l = +10.613$ mV. We have taken our system to be


a $\pi \times 30^2\ \mu{\rm m}^2$ patch of membrane. We solve these equations numerically using fourth-order Runge–Kutta integration.
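To make the integration scheme concrete, here is a minimal Python sketch (not the authors' code) of equations 4.1 and 4.2 under fourth-order Runge–Kutta, using the parameter values quoted above. The current is supplied as a density in $\mu$A/cm$^2$, and the function names and interfaces are our own.

```python
import numpy as np

C = 1.0                                  # membrane capacitance, uF/cm^2
g_K, g_Na, g_l = 36.0, 120.0, 0.3        # maximal conductances, mS/cm^2
V_K, V_Na, V_l = -12.0, 115.0, 10.613    # reversal potentials, mV
AREA = np.pi * 30.0**2 * 1e-8            # pi x 30^2 um^2 patch, in cm^2

def rates(V):
    """Voltage-dependent opening/closing rates, modern sign convention."""
    a_n = 0.01 * (10 - V) / (np.exp((10 - V) / 10) - 1)
    b_n = 0.125 * np.exp(-V / 80)
    a_m = 0.1 * (25 - V) / (np.exp((25 - V) / 10) - 1)
    b_m = 4.0 * np.exp(-V / 18)
    a_h = 0.07 * np.exp(-V / 20)
    b_h = 1.0 / (np.exp((30 - V) / 10) + 1)
    return a_n, b_n, a_m, b_m, a_h, b_h

def derivs(state, I_density):
    """Right-hand side of equations 4.1-4.2; I_density in uA/cm^2."""
    V, n, m, h = state
    a_n, b_n, a_m, b_m, a_h, b_h = rates(V)
    dV = (I_density - g_K * n**4 * (V - V_K)
          - g_Na * m**3 * h * (V - V_Na) - g_l * (V - V_l)) / C
    return np.array([dV, a_n * (1 - n) - b_n * n,
                     a_m * (1 - m) - b_m * m, a_h * (1 - h) - b_h * h])

def rk4_step(state, I_density, dt=0.05):
    """One fourth-order Runge-Kutta step of dt msec."""
    k1 = derivs(state, I_density)
    k2 = derivs(state + 0.5 * dt * k1, I_density)
    k3 = derivs(state + 0.5 * dt * k2, I_density)
    k4 = derivs(state + dt * k3, I_density)
    return state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
```

Driving a resting patch with a suprathreshold current density produces the familiar action potential, a quick sanity check on the signs and units.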

The system is driven with a gaussian random noise current $I(t)$, generated by smoothing a gaussian random number stream with an exponential filter to generate a correlation time $\tau$. It is convenient to choose $\tau$ to be longer than the time steps of numerical integration, since this guarantees that all functions are smooth on the scale of single time steps. Here we will always use $\tau = 0.2$ msec, a value that is both less than the timescale over which we discretize the stimulus for analysis and far less than the neuron's capacitative smoothing timescale $RC \sim 3$ msec. The current has a standard deviation $\sigma$, but since the correlation time is short, the relevant parameter usually is the spectral density $S = \sigma^2 \tau$; we also add a DC offset $I_0$. In the following, we will consider two parameter regimes: $I_0 = 0$, and $I_0$ a finite value, which leads to more periodic firing.
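One simple way to generate such a stimulus (an assumption on our part; the text does not specify the discrete filter exactly) is a discrete Ornstein–Uhlenbeck recursion, which is gaussian with an exponential correlation function of time constant $\tau$:

```python
import numpy as np

def noise_current(n_steps, dt=0.05, tau=0.2, sigma=1.0, I0=0.0, seed=0):
    """Gaussian current with correlation time tau (msec), standard
    deviation sigma (nA), and DC offset I0: a discrete Ornstein-Uhlenbeck
    update, one way of exponentially filtering a gaussian random stream."""
    rng = np.random.default_rng(seed)
    rho = np.exp(-dt / tau)                # per-step correlation
    kick = sigma * np.sqrt(1.0 - rho**2)   # keeps the variance at sigma^2
    I = np.empty(n_steps)
    I[0] = rng.normal(0.0, sigma)
    for k in range(1, n_steps):
        I[k] = rho * I[k - 1] + rng.normal(0.0, kick)
    return I0 + I

# The relevant stimulus parameter is the spectral density S = sigma**2 * tau.
```

By construction the output has mean $I_0$, standard deviation $\sigma$, and autocorrelation $e^{-\Delta t/\tau}$.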

The integration step size is fixed at 0.05 msec. The key numerical experiments were repeated at a step size of 0.01 msec, with identical results. The time of a spike is defined as the moment of maximum voltage for voltages exceeding a threshold (see Figure 1), estimated to subsample precision by quadratic interpolation. As spikes are both very stereotyped and very large compared to subspiking fluctuations, the precise value of this threshold is unimportant; we have used $+20$ mV.
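The subsample timing step can be sketched as follows; the quadratic (three-point parabolic) refinement is as described in the text, while the function name and interface are ours.

```python
import numpy as np

def spike_times(V, dt=0.05, threshold=20.0):
    """Spike times as voltage maxima above threshold, refined to
    subsample precision by fitting a parabola through the peak sample
    and its two neighbors."""
    times = []
    for k in range(1, len(V) - 1):
        if V[k] > threshold and V[k] >= V[k - 1] and V[k] > V[k + 1]:
            # vertex of the parabola through (k-1, k, k+1), in samples
            denom = V[k - 1] - 2 * V[k] + V[k + 1]
            shift = 0.5 * (V[k - 1] - V[k + 1]) / denom if denom != 0 else 0.0
            times.append((k + shift) * dt)
    return np.array(times)
```

For a sampled parabola, the interpolation recovers the true peak location exactly, which is a convenient self-test.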

4.1 Qualitative Description of Spiking. The first step in our analysis is to use reverse correlation, equation 2.4, to determine the average stimulus feature preceding a spike, the STA. In Figure 1 (top), we display the STA in a regime where the spectral density of the input current is $6.5 \times 10^{-4}$ nA$^2\,$msec. The spike-triggered averages of the gating terms $n^4$ (proportion of open potassium channels) and $m^3 h$ (proportion of open sodium channels) and the membrane voltage $V$ are plotted in Figure 1 (middle and bottom). The error bars mark the standard deviation of the trajectories of these variables.

As expected, the voltage and gating variables follow highly stereotyped trajectories during the $\sim 5$ msec surrounding a spike. First, the rapid opening of the sodium channels causes a sharp membrane depolarization (or rise in $V$); the slower potassium channels then open and repolarize the membrane, leaving it at a slightly lower potential than rest. The potassium channels close gradually, but meanwhile the membrane remains hyperpolarized and, due to its increased permeability to potassium ions, at lower resistance. These effects make it difficult to induce a second spike during this $\sim 15$ msec "refractory period." Away from spikes, the resting levels and fluctuations of the voltage and gating variables are quite small. The larger values evident in Figure 1 (middle and bottom) by $\pm 15$ msec are due to the summed contributions of nearby spikes.
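The reverse-correlation step itself is straightforward; a sketch (function name and interface are ours, with spike positions given as sample indices):

```python
import numpy as np

def spike_triggered_average(stim, spike_idx, window=200):
    """Reverse correlation: average the stimulus segments preceding each
    spike. `stim` is the discretized input current, `spike_idx` are spike
    positions in samples, and `window` samples of history are kept,
    matching the 200-sample segments used later in the text."""
    segments = [stim[k - window:k] for k in spike_idx if k >= window]
    return np.mean(segments, axis=0)
```

With a deterministic stimulus, the output is just the elementwise mean of the chosen history windows, which makes the function easy to verify.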

The spike-triggered average current has a largely transient form, so that spikes are on average preceded by an upward swing in current. On the


Figure 1: Spike-triggered averages, with standard deviations, for (top) the input current $I$, (middle) the fraction of open K$^+$ and Na$^+$ channels, and (bottom) the membrane voltage $V$, for the parameter regime $I_0 = 0$ and $S = 6.50 \times 10^{-4}$ nA$^2\,$sec.

other hand, there is no obvious bottleneck in the current trajectories, so that the current variance is almost constant throughout the spike. This is qualitatively consistent with the idea of dimensionality reduction: if the neuron ignores most of the dimensions along which the current can vary, then the variance, which is shared almost equally among all dimensions for this near white noise, can change by only a small amount.


4.2 Interspike Interaction. Although the STA has the form of a differentiating kernel, suggesting that the neuron detects edge-like events in the current versus time, there must be a DC component to the cell's response. We recall that for constant inputs, the HH model undergoes a bifurcation to constant frequency spiking, where the frequency is a function of the value of the input above onset. Correspondingly, the STA does not sum precisely to zero; one might think of it as having a small integrating component that allows the system to spike under DC stimulation, albeit only above a threshold.

The system's tendency to periodic spiking under DC current input also is felt under dynamic stimulus conditions and can be thought of as a strong interaction between successive spikes. We illustrate this by considering a different parameter regime, with a small DC current and some added noise ($I_0 = 0.11$ nA and $S = 0.8 \times 10^{-4}$ nA$^2\,$sec). Note that the DC component puts the neuron in the metastable region of its $f$–$I$ curve (see Figure 2). In this regime, the neuron tends to fire quasi-regular trains of spikes intermittently, as shown in Figure 3. We will refer to these quasi-regular spike sequences as "bursts" (note that this term is often used to refer to compound spikes in neurons with additional channels; such events do not occur in the HH model).

Spikes can be classified into three types: those initiating a spike burst, those within a burst, and those ending a burst. The minimum length of

Figure 2: Firing rate of the HH neuron as a function of injected DC current. The empty circles at moderate currents denote the metastable region, where the neuron may be either spiking or silent.


Figure 3: Segment of a typical spike train in a "bursting" regime.

Figure 4: Spike-triggered averages derived from spikes leading ("on"), inside ("burst"), and ending ("off") a burst. The parameters of this bursting regime are $I_0 = 0.11$ nA and $S = 0.8 \times 10^{-4}$ nA$^2\,$sec. Note that the burst-ending spike average is, by construction, identical to that of any other within-burst spike for $t < 0$.

the silence between bursts is taken in this case to be 70 msec. Taking these three categories of spike as different "symbols" (de Ruyter van Steveninck &amp; Bialek, 1988), we can determine the average stimulus for each. These are shown in Figure 4, with the spike at $t = 0$.
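The three-way classification can be implemented directly from the spike times; a sketch, assuming the 70 msec minimum inter-burst silence quoted above (how the authors labeled a spike that is silent on both sides is not stated; here such a spike counts as "on"):

```python
import numpy as np

def classify_spikes(spike_times, t_silence=70.0):
    """Label each spike 'on' (initiates a burst), 'off' (ends a burst),
    or 'burst' (inside a burst), using the minimum inter-burst silence."""
    st = np.asarray(spike_times, float)
    pre = st - np.concatenate(([-np.inf], st[:-1]))   # interval to previous spike
    post = np.concatenate((st[1:], [np.inf])) - st    # interval to next spike
    labels = []
    for dpre, dpost in zip(pre, post):
        if dpre >= t_silence:
            labels.append('on')
        elif dpost >= t_silence:
            labels.append('off')
        else:
            labels.append('burst')
    return labels
```

The first and last spikes of a recording are treated as bordered by infinite silence, a boundary convention of this sketch.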

In this regime, the initial spike of a burst is preceded by a rapid oscillation in the current. Spikes within a burst are affected much less by the current; the feature immediately preceding such spikes is similar in shape to a single "wavelength" of the leading spike feature, but is of much smaller amplitude and is temporally compressed into the interspike interval. Hence, although it is clear that the timing of a spike within a burst is determined largely by the timing of the previous spike, the current plays some role in affecting the precise placement. This also demonstrates that the shape of the STA is not the same for all spikes: it depends strongly and nontrivially on the time to the previous spike, and this is related to the observation that subtly different patterns of two or three spikes correspond to very different average stimuli (de Ruyter van Steveninck &amp; Bialek, 1988). For a reader of the spike code, a spike within a burst conveys a different message about the input than the spike at the onset of the burst. Finally, the feature ending a burst has a very similar form to the onset feature, but reversed in time. Thus, to a good approximation, the absence of a spike at the end of a burst can be read as the opposite of the onset of the burst.

In summary, this regime of the HH neuron is similar to a "flip-flop," or 1-bit memory. Like its electronic analog, the neuron's memory is preserved


Figure 5: Overall spike-triggered average in the bursty regime, showing the ringing due to the tendency to periodic firing. Plotted in gray is the spike autocorrelation, showing the same oscillations.

by a feedback loop, here implemented by the interspike interaction. Large fluctuations in the input current at a certain frequency "flip" or "flop" the neuron between its silent and spiking states. However, while the neuron is spiking, further details of the input signal are transmitted by precise spike timing within a burst. If we calculate the spike-triggered average of all spikes for this regime, without regard to their position within a burst, then, as shown in Figure 5, the relatively well-localized leading spike oscillation of Figure 4 is replaced by a long-lived oscillating function resulting from the spike periodicity. This is shown explicitly by comparing the overall STA with the spike autocorrelation, also shown in Figure 5. This same effect is seen in the STA of the burst spikes, which in fact dominates the overall average. Prediction of spike timing using such an STA would be computationally difficult, due to its extension in time, but, more seriously, unsuccessful, as most of the function is an artifact of the spike history rather than the effect of the stimulus.

While the effects of spike interaction are interesting and should be included in a complete model for spike generation, we wish here to consider only the current's role in initiating spikes. Therefore, as we have argued elsewhere, we limit ourselves initially to the cases in which interspike interaction plays no role (Agüera y Arcas et al., 2001; Agüera y Arcas &amp; Fairhall, 2003). These "isolated" spikes can be defined as spikes preceded by a silent period $t_{\rm silence}$ long enough to ensure decoupling from the timing of the previous spike. A reasonable choice for $t_{\rm silence}$ can be inferred directly from the inter-


Figure 6: Hodgkin–Huxley interspike interval histogram for the parameters $I_0 = 0$ and $S = 6.5 \times 10^{-4}$ nA$^2\,$sec, showing a peak at a preferred firing frequency and the long Poisson tail. The total number of spikes is $N = 5.18 \times 10^6$. The plot to the right is a closeup in linear scale.

spike interval distribution $P(\Delta t)$, illustrated in Figure 6. For the HH model, as in simpler models and many real neurons (Brenner, Agam, Bialek, &amp; de Ruyter van Steveninck, 1998), the form of $P(\Delta t)$ has three noteworthy features: a refractory "hole," during which another spike is unlikely to occur; a strong mode at the preferred firing frequency; and an exponentially decaying, or Poisson, tail. The details of all three of these features are functions of the parameters of the stimulus (Tiesinga, Jose, &amp; Sejnowski, 2000), and certain regimes may be dominated by only one or two features. The emergence of Poisson statistics in the tail of the distribution implies that these events are independent, so we can infer that the system has lost memory of the previous spike. We will therefore take isolated spikes to be those preceded by a silent interval $\Delta t \geq t_{\rm silence}$, where $t_{\rm silence}$ is well into the Poisson regime. The burst-onset spikes of Figure 4 are isolated spikes by this definition.
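Selecting isolated spikes is then a one-line filter over the interspike intervals; in this sketch the first spike of a train is discarded, since its preceding silence is unknown:

```python
import numpy as np

def isolated_spikes(spike_times, t_silence=60.0):
    """Return spikes preceded by a silent interval of at least t_silence
    (msec), i.e. those decoupled from the previous spike per the text's
    definition."""
    st = np.asarray(spike_times, float)
    isi = np.diff(st)                 # interval preceding each spike after the first
    return st[1:][isi >= t_silence]
```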

Note that the bursty behavior evident in Figure 3 is characteristic of "type II" neurons, which begin firing at a well-defined frequency under DC stimulus; "type I" neurons, by contrast, can fire at arbitrarily low frequency under constant input. Nonetheless, the analysis that follows is equally applicable to either type of neuron: spikes still interact at close range and become independent at sufficient separation. In what follows, we will be probing the HH neuron with current noise that remains below the DC threshold for periodic firing.

5 Isolated Spike Analysis

Focusing now on isolated spikes, we proceed to a second-order analysis of the current fluctuations around the isolated spike-triggered average (see


Figure 7: Spike-triggered average stimulus for isolated spikes.

Figure 7). We consider the response of the HH neuron to currents $I(t)$ with mean $I_0 = 0$ and spectral density $S = 6.5 \times 10^{-4}$ nA$^2\,$sec. Isolated spikes in this regime are defined by $t_{\rm silence} = 60$ msec.

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, a window sufficiently long to capture the relevant features; thus, we begin in a 200-dimensional space. We emphasize that the theorem that connects eigenvalues of the matrix $\Delta C$ to the number of relevant dimensions is valid only for truly gaussian distributions of inputs, and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble: namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide to our search for lower-dimensional descriptions, but we should proceed with caution.
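A sketch of this covariance computation, our own implementation of the $\Delta C = C_{\rm spike} - C_{\rm prior}$ construction that the text cites as equation 2.7 (function name and interface are assumptions):

```python
import numpy as np

def covariance_difference(stim, spike_idx, window=200):
    """Delta C = C_spike - C_prior: the covariance of stimulus histories
    preceding spikes, minus the covariance of all stimulus histories."""
    X = np.array([stim[k - window:k] for k in spike_idx if k >= window])
    C_spike = np.cov(X, rowvar=False)
    # prior covariance from every available history window
    Y = np.lib.stride_tricks.sliding_window_view(stim, window)
    C_prior = np.cov(Y, rowvar=False)
    return C_spike - C_prior, C_prior
```

As a sanity check, "spikes" placed at random carry no stimulus information, so $\Delta C$ should vanish up to sampling noise of order $1/\sqrt{n_{\rm spikes}}$.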

The "raw" isolated spike-triggered covariance $C_{\rm iso\ spike}$ and the corresponding covariance difference $\Delta C$, equation 2.7, are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to $\pm 15$ msec. In Figure 9, we show the spectrum of eigenvalues of $\Delta C$, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply $\Delta C$ by $C_{\rm prior}^{-1}$. This has the effect of giving us eigenvalues scaled in units of the input standard deviation


Figure 8: The isolated spike-triggered covariance $C_{\rm iso}$ (left) and covariance difference $\Delta C$ (right) for times $-30 < t < 5$ msec. The plots are in units of nA$^2$.

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance after accumulating 80,000 spikes.

along each dimension. Because the correlation time is short, $C_{\rm prior}$ is nearly diagonal.

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size $N$. Eigenvalues that are truly zero up to the noise floor determined by


Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is $1/\sqrt{n_{\rm spikes}}$. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

sampling decrease like $1/\sqrt{N}$. We find that a sequence of eigenvalues emerges stably from the noise.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Agüera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended and silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Agüera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be $-60 \leq t \leq -40$ msec. The energy of a spike-associated mode in the silent period is due entirely to noise, and will therefore decrease like $1/n_{\rm spikes}$ with increasing sample size, while this energy remains of order unity for silence modes. Carrying out the test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion


Figure 11: For the leading 64 modes, the fraction of the mode energy over the interval $-40 < t < -30$ msec, as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.

of other weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is close to time-translationally invariant.
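The mode-sorting diagnostic is easy to state in code; a sketch using the $-60$ to $-40$ msec silent window quoted in the text (names are ours):

```python
import numpy as np

def silence_energy_fraction(mode, t, t_lo=-60.0, t_hi=-40.0):
    """Fraction of a covariance mode's energy falling deep inside the
    enforced silence; it decreases like 1/n_spikes for spike-associated
    modes but stays of order one for silence-associated modes."""
    mode = np.asarray(mode, float)
    mask = (t >= t_lo) & (t <= t_hi)
    return np.sum(mode[mask] ** 2) / np.sum(mode ** 2)
```

A localized, spike-like mode scores near zero, while an extended Fourier-like mode scores roughly the window's share of the total duration.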

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote $s_1$ and $s_2$ (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators; there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crossed threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions $s_1$ and $s_2$. By construction, the distribution of these signals over the whole experiment, $P(s_1, s_2)$, is gaussian. The appropriate prior for the isolation condition, $P(s_1, s_2\,|\,{\rm silence})$, differs only subtly from the gaussian prior. On the other hand, for each spike, we obtain a sample from the dis-


Figure 14: $10^4$ spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation of the prior gaussian distribution. The circles, from the inside out, enclose all but $10^{-1}, 10^{-2}, \ldots, 10^{-8}$ of the prior.

tribution $P(s_1, s_2\,|\,{\rm iso\ spike\ at\ }t_0)$, leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of $\Delta C$ which we associate with the direction of threshold crossing (plotted on the $y$-axis in Figure 14) is positive, indicating increased, rather than decreased, variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold crossing interpretation.


Combining equations 2.2 and 2.3 for isolated spikes, we have

\[
g(s_1, s_2) = \frac{P(s_1, s_2\,|\,{\rm iso\ spike\ at\ }t_0)}{P(s_1, s_2\,|\,{\rm silence})}, \tag{5.1}
\]

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, $g$ can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure, but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
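The ratio in equation 5.1 can be estimated from samples with a simple binned (histogram) estimator; a sketch with our own names and hypothetical inputs (arrays of projections, one row per event):

```python
import numpy as np

def decision_function(proj_spike, proj_silence, bins=25, lim=4.0):
    """Crude histogram estimate of g(s1, s2) = P(s | iso spike) /
    P(s | silence). Bins with no silence samples are returned as NaN,
    since the ratio is undefined there."""
    edges = np.linspace(-lim, lim, bins + 1)
    H_spk, _, _ = np.histogram2d(proj_spike[:, 0], proj_spike[:, 1],
                                 bins=[edges, edges], density=True)
    H_sil, _, _ = np.histogram2d(proj_silence[:, 0], proj_silence[:, 1],
                                 bins=[edges, edges], density=True)
    g = np.full_like(H_spk, np.nan)
    ok = H_sil > 0
    g[ok] = H_spk[ok] / H_sil[ok]
    return g, edges
```

With synthetic data whose spike-conditional distribution is shifted away from the prior, $g$ is large where spikes concentrate and small elsewhere, as expected.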

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation, according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that $I^{s_1, s_2}_{\rm iso\ spike} = I_{\rm iso\ spike}$; more generally, one finds $I^{s_1, s_2}_{\rm iso\ spike} \leq I_{\rm iso\ spike}$, and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.
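The information captured by a set of projections can be estimated from the same binned densities; here is a naive plug-in estimator of the divergence in equation 3.15 (in bits), adequate for illustration though biased for small samples (function name and binning choices are ours):

```python
import numpy as np

def info_per_spike(proj_spike, proj_prior, bins=25, lim=4.0):
    """Plug-in estimate (bits) of the divergence between the
    spike-conditional and prior/silence distributions of the
    projections, cf. equation 3.15."""
    d = proj_spike.shape[1]
    edges = [np.linspace(-lim, lim, bins + 1)] * d
    P, _ = np.histogramdd(proj_spike, bins=edges)
    Q, _ = np.histogramdd(proj_prior, bins=edges)
    P = P / P.sum()
    Q = Q / Q.sum()
    ok = (P > 0) & (Q > 0)                 # skip unsampled bins
    return float(np.sum(P[ok] * np.log2(P[ok] / Q[ok])))
```

For a gaussian shifted by one standard deviation, the true divergence is $1/(2\ln 2) \approx 0.72$ bits, which the estimator should approach with enough samples.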

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near $\Delta t = 3$ msec,

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike, at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto $\Delta C$ covariance modes 1 and 2 (circles).


reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at $\approx 3.2$ and 4.1 bits, respectively. Hence, for smaller $\Delta t$, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, &amp; Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen &amp; Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, &amp; Bialek, 1997; Lewen, Bialek, &amp; de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, &amp; Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel &amp; Reid, 2000), and the bat auditory cortex (Dear, Simmons, &amp; Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation, and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at $\sim 3$ msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a $K$-dimensional linear subspace of the original $D$-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, &amp; Zipser, 1988; Boser, Guyon, &amp; Vapnik, 1992; Guyon, Boser, &amp; Vapnik, 1993; Oja &amp; Karhunen, 1995; Roweis &amp; Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently, the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that along the direction of the STA, we allow the second, orthogonal direction to vary. In principle, this vari-


Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

ation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
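The tiling construction requires only sorting, averaging, and one Gram–Schmidt step; a sketch (variable names are ours; `histories` is the matrix of spike-triggered stimulus segments, one per row):

```python
import numpy as np

def twist_model(histories, sta, n_bins=8):
    """Locally linear 'twist' construction: sort spike-triggered
    histories by their projection onto the overall STA, split them into
    n_bins equal-occupancy bins, and in each bin orthogonalize the
    conditional-average history against the STA to get the local
    second direction."""
    sta = sta / np.linalg.norm(sta)
    proj = histories @ sta
    order = np.argsort(proj)
    tiles = []
    for chunk in np.array_split(order, n_bins):
        avg = histories[chunk].mean(axis=0)
        second = avg - (avg @ sta) * sta      # component orthogonal to the STA
        tiles.append(second / np.linalg.norm(second))
    return np.array(tiles)                    # one local basis vector per tile
```

Each returned vector is unit norm and exactly orthogonal to the STA, so the pair (STA, local vector) is an orthonormal basis for the tile's plane.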

Computing the information as a function of $\Delta t$ using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maxi-


Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike, at a given temporal resolution, captured by the locally linear tiling ("twist") model (diamonds), compared to models using the STA alone (triangles) and projection onto $\Delta C$ covariance modes 1 and 2 (circles).

mum of 4.8 bits, recovering $\sim 90\%$ of the information at a time resolution of approximately 1 msec.

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, &amp; Yoshikawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, &amp; Marder, 1992; Abbott &amp; Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input


currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work by Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics which is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ≈ 200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace using data sets of fewer than 10^4 spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one that is sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore in general involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Aguera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Aguera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Aguera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.

Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available online: http://xxx.lanl.gov/abs/physics/9902067

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Fitzhugh, R. (1961). Impulses and physiological states in models of nerve membrane. Biophysics J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.

Hartline, H. K. (1940). The receptive fields of optic nerve fibres. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshikawa, Z. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.

Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available online: http://xxx.lanl.gov/abs/physics/0208057

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available online: http://xxx.lanl.gov/abs/physics/0004057

Received January 9, 2003; accepted January 28, 2003.



1 Introduction

On short timescales, one can conceive of a single neuron as a computational device that maps inputs at its synapses into a sequence of action potentials, or spikes. To a good approximation, the dynamics of this mapping are determined by the kinetic properties of ion channels in the neuron's membrane. In the 50 years since the pioneering work of Hodgkin and Huxley, we have seen the evolution of an ever more detailed description of channel kinetics, making it plausible that the short-time dynamics of almost any neuron we encounter will be understandable in terms of interactions among a mixture of diverse but known channel types (Hille, 1992; Koch, 1999). The existence of so nearly complete a microscopic picture of single-neuron dynamics brings into focus a very different question: What does the neuron compute? Although models in the Hodgkin-Huxley (HH) tradition define a dynamical system that will reproduce the behavior of the neuron, this description in terms of differential equations is far from our intuition about, or the formal description of, computation.

The problem of what neurons compute is one instance of a more general problem in modern quantitative biology and biophysics: Given a progressively more complete microscopic description of proteins and their interactions, how do we understand the emergence of function? In the case of neurons, the proteins are the ion channels, and the interactions are very simple: current flows through open channels, charging the cell's capacitance, and all channels experience the resulting voltage. Arguably there is no other network of interacting proteins for which the relevant equations are known in such detail; indeed, some efforts to understand function and computation in other networks of proteins make use of analogies to neural systems (Bray, 1995). Despite the relative completeness of our microscopic picture for neurons, there remains a huge gap between the description of molecular kinetics and the understanding of function. Given some complex dynamic input to a neuron, we might be able to simulate the spike train that will result, but we are hard pressed to look at the equations for channel kinetics and say that this transformation from inputs to spikes is equivalent to some simple (or perhaps not so simple) computation, such as filtering, thresholding, coincidence detection, or feature extraction.

Perhaps the problem of understanding computational function in a model of ion channel dynamics is a symptom of a much deeper mathematical difficulty. Despite the fact that all computers are dynamical systems, the natural mathematical objects in dynamical systems theory are very different from those in the theory of computation, and it is not clear how to connect these different formal schemes. Finding a general mapping from dynamical systems to their equivalent computational functions is a grand challenge, but we will take a more modest approach.

We believe that a key intuition for understanding neural computation is the concept of feature selectivity: while the space of inputs to a neuron (whether we think of inputs as arriving at the synapses or being driven by sensory signals outside the brain) is vast, individual neurons are sensitive only to some restricted set of features in this vast space. The most general way to formalize this intuition is to say that we can compress (in the information-theoretic sense) our description of the inputs without losing any information about the neural output (Tishby, Pereira, & Bialek, 1999). We might hope that this selective compression of the input data has a simple geometric description, so that the relevant bits about the input correspond to coordinates along some restricted set of relevant dimensions in the space of inputs. If this is the case, feature selectivity should be formalized as a reduction of dimensionality (de Ruyter van Steveninck & Bialek, 1988), and this is the approach we follow here. Closely related work on the use of dimensionality reduction to analyze neural feature selectivity has been described in recent work (Bialek & de Ruyter van Steveninck, 2003; Sharpee, Rust, & Bialek, in press).

Here we develop the idea of dimensionality reduction as a tool for analysis of neural computation and apply these tools to the HH model. While our initial goal was to test new analysis methods in the context of a presumably simple and well-understood model, we have found that the HH neuron performs a computation of surprising richness. Preliminary accounts of these results have already appeared (Aguera y Arcas, 1998; Aguera y Arcas, Bialek, & Fairhall, 2001).

2 Dimensionality Reduction

Neurons take input signals at their synapses and give as output sequences of spikes. To characterize a neuron completely is to identify the mapping between neuronal input and the spike train the neuron produces in response. In the absence of any simplifying assumptions, this requires probing the system with every possible input. Most often, these inputs are spikes from other neurons; each neuron typically has of order N ≈ 10^3 presynaptic connections. If the system operates at 1 msec resolution and the time window of relevant inputs is 40 msec, then we can think of a single neuron as having an input described by a ≈ 4 × 10^4-bit word (the presence or absence of a spike in each 1 msec bin for each presynaptic cell) which is then mapped to a one (spike) or zero (no spike). More realistically, if average spike rates are ≈ 10 s^-1, the input words can be compressed by a factor of 10. In this picture, a neuron computes a Boolean function over roughly 4000 variables. Clearly, one cannot sample every one of the ≈ 2^4000 inputs to identify the neural computation. Progress requires making some simplifying assumption about the function computed by the neuron, so that we can vastly reduce the space of possibilities over which to search. We use the idea of dimensionality reduction in this spirit, as a simplifying assumption that allows us to make progress but that also must be tested directly.
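The counting in this paragraph can be checked in a few lines. This sketch uses the illustrative figures from the text (10^3 synapses, 1 msec bins, a 40 msec window, ≈ 10 s^-1 spike rates); the binary-entropy calculation is one simple way to make the "compressed by a factor of 10" estimate concrete:

```python
from math import log2

n_synapses = 1000         # N ~ 10^3 presynaptic connections
dt_ms, window_ms = 1, 40  # 1 msec resolution, 40 msec of relevant input
raw_bits = n_synapses * (window_ms // dt_ms)
assert raw_bits == 40_000  # the ~4 x 10^4-bit input word

# At ~10 spikes/s, a cell fires in ~1 of every 100 one-msec bins, so the
# entropy per bin is well below 1 bit and the word is highly compressible.
p = 0.01                                  # spike probability per 1 msec bin
h = -p * log2(p) - (1 - p) * log2(1 - p)  # binary entropy per bin, ~0.08 bit
effective_bits = raw_bits * h             # roughly a tenfold compression
```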


The ideas of feature selectivity and dimensionality reduction have a long history in neurobiology. The idea of receptive fields, as formulated by Hartline, Kuffler, and Barlow for the visual system, gave a picture of neurons as having a template against which images would be correlated (Hartline, 1940; Kuffler, 1953; Barlow, 1953). If we think of images as vectors in a high-dimensional space, with coordinates determined by the intensities of each pixel, then the simplest receptive field models describe the neuron as sensitive to only one direction, or projection, in this high-dimensional space. This picture of projection followed by thresholding, or some other nonlinearity to determine the probability of spike generation, was formalized in the linear perceptron (Rosenblatt, 1958, 1962). In subsequent work, Barlow, Hill, and Levick (1964) characterized neurons in which the receptive field has subregions in space and time such that summation is at least approximately linear in each subregion, but these summed signals interact nonlinearly, for example, to generate direction selectivity and motion sensitivity. We can think of Hubel and Wiesel's description of complex and hypercomplex cells (Hubel & Wiesel, 1962) again as a picture of approximately linear summation within subregions followed by nonlinear operations on these multiple summed signals. More formally, the proper combination of linear summation and nonlinear or logical operations may provide a useful bridge from receptive field properties to proper geometric primitives in visual computation (Iverson & Zucker, 1995). In the same way that a single receptive field or perceptron model has one relevant dimension in the space of visual stimuli, these more complex cells have as many relevant dimensions as there are independent subregions of the receptive field. Although this number is larger than one, it still is much smaller than the full dimensionality of the possible spatiotemporal variations in visual inputs.

The idea that neurons in the auditory system might be described by a filter followed by a nonlinear transformation to determine the probability of spike generation was the inspiration for de Boer's development (de Boer & Kuyper, 1968) of triggered or reverse correlation. Modern uses of reverse correlation to characterize the filtering or receptive field properties of a neuron often emphasize that this approach provides a "linear approximation" to the input-output properties of the cell, but the original idea was almost the opposite: neurons clearly are nonlinear devices, but this is separate from the question of whether the probability of generating a spike is determined by a simple projection of the sensory input onto a single filter or template. In fact, as explained by Rieke, Warland, Bialek, and de Ruyter van Steveninck (1997), linearity is seldom a good approximation for the neural input-output relation, but if there is one relevant dimension, then (provided that input signals are chosen with suitable statistics) the reverse correlation method is guaranteed to find this one special direction in the space of inputs to which the neuron is sensitive. While the reverse correlation method is guaranteed to find the one relevant dimension if it exists, the method does not include any way of testing for other relevant dimensions or, more generally, for measuring the dimensionality of the relevant subspace.

The idea of characterizing neural responses directly as the reduction of dimensionality emerged from studies (de Ruyter van Steveninck & Bialek, 1988) of a motion-sensitive neuron in the fly visual system. In particular, this work suggested that it is possible to estimate the dimensionality of the relevant subspace, rather than just assuming that it is small (or equal to one). More recent work on the fly visual system has exploited the idea of dimensionality reduction to probe both the structure and adaptation of the neural code (Brenner, Bialek, & de Ruyter van Steveninck, 2000; Fairhall, Lewen, Bialek, & de Ruyter van Steveninck, 2001) and the nature of the computation that extracts the motion signal from the spatiotemporal array of photoreceptor inputs (Bialek & de Ruyter van Steveninck, 2003). Here we review the ideas of dimensionality reduction from previous work; extensions of these ideas begin in section 3.

In the spirit of neural network models, we will simplify away the spatial structure of neurons and consider time-dependent currents I(t) injected into a point-like neuron. While this misses much of the complexity of real cells, we will find that even this system is highly nontrivial. If the input is an injected current, then the neuron maps the history of this current, I(t < t0), into the presence or absence of a spike at time t0. More generally, we might imagine that the cell (or our description) is noisy, so that there is a probability of spiking P[spike at t0 | I(t < t0)] that depends on the current history. The dependence on the history of the current means that the input signal still is high dimensional, even without spatial dependence. Working at time resolution Δt and assuming that currents in a window of size T are relevant to the decision to spike, the input space is of dimension D = T/Δt, where D is often of order 100.
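The discretization D = T/Δt can be made concrete with a few lines of code. This is only a sketch with invented values of Δt and T (chosen so that D = 100, the order of magnitude mentioned in the text): each decision to spike is conditioned on a D-dimensional vector of current history.

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.2           # time resolution Delta t, in msec (illustrative value)
T = 20.0           # window of current history assumed relevant, in msec
D = round(T / dt)  # dimensionality of the input space: D = T / Delta t = 100

# A long injected-current trace I(t), sampled every dt; white noise here.
current = rng.normal(size=50_000)

def history(t0_idx):
    """The D-dimensional stimulus vector I(t0 - T < t < t0) ending at bin t0_idx."""
    return current[t0_idx - D : t0_idx]

stim = history(1_000)
assert stim.shape == (D,)
```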

The idea of dimensionality reduction is that the probability of spike generation is sensitive only to some limited number of dimensions K within the D-dimensional space of inputs. We begin our analysis by searching for linear subspaces, that is, a set of signals s1, s2, ..., sK that can be constructed by filtering the current,

$$ s_\mu = \int_0^\infty dt\, f_\mu(t)\, I(t_0 - t), \qquad (2.1) $$

so that the probability of spiking depends on only this small set of signals

$$ P[\text{spike at } t_0 \mid I(t < t_0)] = P[\text{spike at } t_0]\, g(s_1, s_2, \ldots, s_K), \qquad (2.2) $$

where the inclusion of the average probability of spiking, P[spike at t0], leaves g dimensionless. If we think of the current I(t0 - T < t < t0) as a D-dimensional vector, with one dimension for each discrete sample at spacing Δt, then the filtered signals s_μ are linear projections of this vector. In this formulation, characterizing the computation done by a neuron involves three steps:

1. Estimate the number of relevant stimulus dimensions K, with the hope that there will be many fewer than the original dimensionality D.

2. Identify a set of filters that project into this relevant subspace.

3. Characterize the nonlinear function g(s1, s2, ..., sK).

The classical perceptron-like cell of neural network theory would have only one relevant dimension, given by the vector of weights, and a simple form for g, typically a sigmoid.
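This perceptron-like special case of equation 2.2 can be sketched directly. The filter shape, the sigmoid parameters, and the mean spike probability below are all invented for illustration; only the structure (one projection, then a dimensionless nonlinearity g scaling the mean spike probability) comes from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 100  # dimensionality of the discretized current history

# An illustrative filter f(tau) -- not the actual HH feature.
tau = np.arange(D)
f = np.exp(-tau / 10.0) * np.sin(tau / 3.0)
f /= np.linalg.norm(f)

def g(s, theta=1.0, beta=4.0):
    """Sigmoidal nonlinearity g(s), dimensionless as in eq. 2.2."""
    return 1.0 / (1.0 + np.exp(-beta * (s - theta)))

def p_spike(stim, p_bar=0.05):
    """P[spike at t0 | I] = P[spike at t0] * g(s1): one relevant dimension."""
    s1 = stim @ f  # projection onto the single relevant direction
    return p_bar * g(s1)

stim = rng.normal(size=D)
assert 0.0 < p_spike(stim) < 0.05
```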

Rather than trying to look directly at the distribution of spikes given stimuli, we follow de Ruyter van Steveninck and Bialek (1988) and consider the distribution of signals conditional on the response, P[I(t < t0) | spike at t0], also called the response-conditional ensemble (RCE). These are related by Bayes' rule:

$$ \frac{P[\text{spike at } t_0 \mid I(t < t_0)]}{P[\text{spike at } t_0]} = \frac{P[I(t < t_0) \mid \text{spike at } t_0]}{P[I(t < t_0)]}. \qquad (2.3) $$

We can now compute various moments of the RCE. The first moment is the spike-triggered average stimulus (STA),

$$ \mathrm{STA}(\tau) = \int [dI]\, P[I(t < t_0) \mid \text{spike at } t_0]\, I(t_0 - \tau), \qquad (2.4) $$

which is the object that one computes in reverse correlation (de Boer & Kuyper, 1968; Rieke et al., 1997). If we choose the distribution of input stimuli P[I(t < t0)] to be gaussian white noise, then for a perceptron-like neuron sensitive to only one direction in stimulus space, it can be shown that the STA, or first moment of the RCE, is proportional to the vector or filter f(τ) that defines this direction (Rieke et al., 1997).

Although it is a theorem that the STA is proportional to the relevant filter f(τ), in principle it is possible that the proportionality constant is zero, most plausibly if the neuron's response has some symmetry, such as phase invariance in the response of high-frequency auditory neurons. It also is worth noting that what is really important in this analysis is the gaussian distribution of the stimuli, not the "whiteness" of the spectrum. For nonwhite but gaussian inputs, the STA measures the relevant filter blurred by the correlation function of the inputs, and hence the true filter can be recovered (at least in principle) by deconvolution. For nongaussian signals and nonlinear neurons, there is no corresponding guarantee that the selectivity of the neuron can be separated from correlations in the stimulus (Sharpee et al., in press).
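As a numerical check of this guarantee, the sketch below builds a toy one-dimensional model cell (an invented exponential filter and an arbitrary pointwise nonlinearity, not the HH model), drives it with gaussian white noise, and verifies that the spike-triggered average of equation 2.4 recovers the filter direction:

```python
import numpy as np

rng = np.random.default_rng(2)
D, n_bins = 50, 400_000

f = np.exp(-np.arange(D) / 8.0)  # hypothetical relevant filter f(tau)
f /= np.linalg.norm(f)

I = rng.normal(size=n_bins)      # gaussian white noise current
# s(t0) = sum_tau f(tau) I(t0 - tau): the one relevant projection.
s = np.convolve(I, f)[:n_bins]
p = np.clip(0.02 * np.exp(2.0 * s), 0.0, 1.0)  # an arbitrary nonlinearity
spikes = rng.random(n_bins) < p

# Spike-triggered average: mean stimulus history preceding each spike (eq. 2.4),
# with sta[tau] = <I(t0 - tau)> over spike times t0.
idx = np.nonzero(spikes)[0]
idx = idx[idx >= D]
sta = np.mean([I[i - D + 1 : i + 1][::-1] for i in idx], axis=0)

# For gaussian white noise, the STA is proportional to f (Rieke et al., 1997).
corr = sta @ f / (np.linalg.norm(sta) * np.linalg.norm(f))
assert corr > 0.9
```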

To obtain more than one relevant direction (or to reveal relevant directions when symmetries cause the STA to vanish), we proceed to second order and compute the covariance matrix of fluctuations around the spike-triggered average,

$$ C_{\text{spike}}(\tau, \tau') = \int [dI]\, P[I(t < t_0) \mid \text{spike at } t_0]\, I(t_0 - \tau)\, I(t_0 - \tau') - \mathrm{STA}(\tau)\, \mathrm{STA}(\tau'). \qquad (2.5) $$

In the same way that we compare the spike-triggered average to some constant average level of the signal in the whole experiment, we compare the covariance matrix C_spike with the covariance of the signal averaged over the whole experiment,

$$ C_{\text{prior}}(\tau, \tau') = \int [dI]\, P[I(t < t_0)]\, I(t_0 - \tau)\, I(t_0 - \tau'), \qquad (2.6) $$

to construct the change in the covariance matrix,

$$ \Delta C = C_{\text{spike}} - C_{\text{prior}}. \qquad (2.7) $$

With time resolution Δt in a window of duration T, as above, all of these covariances are D × D matrices. In the same way that the spike-triggered average has the clearest interpretation when we choose inputs from a gaussian distribution, ΔC also has the clearest interpretation in this case. Specifically, if inputs are drawn from a gaussian distribution, then it can be shown that (Bialek & de Ruyter van Steveninck, 2003):

1. If the neuron is sensitive to a limited set of K input dimensions, as in equation 2.2, then ΔC will have only K nonzero eigenvalues.[1] In this way, we can measure directly the dimensionality K of the relevant subspace.

2. If the distribution of inputs is both gaussian and white, then the eigenvectors associated with the nonzero eigenvalues span the same space as that spanned by the filters {f_μ(τ)}.

3. For nonwhite (correlated) but still gaussian inputs, the eigenvectors span the space of the filters {f_μ(τ)} blurred by convolution with the correlation function of the inputs.

Thus, the analysis of ΔC for neurons responding to gaussian inputs should allow us to identify the subspace of inputs of relevance and test specifically the hypothesis that this subspace is of low dimension.
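As a toy illustration of points 1 and 2 (our own sketch, not the authors' code): for a neuron that thresholds the projection of a white gaussian stimulus onto a single filter, the covariance difference of equation 2.7 has one outstanding (here variance-reducing, hence negative) eigenvalue, and its eigenvector recovers the filter. All names and the decision rule are hypothetical.

```python
import numpy as np

# Toy neuron: "spike" when the stimulus projection onto a filter f
# exceeds a threshold; recover f from the covariance difference dC.
rng = np.random.default_rng(0)
D = 50
f = np.exp(-np.arange(D) / 10.0)
f /= np.linalg.norm(f)

stim = rng.standard_normal((100_000, D))   # rows: I(t0 - tau), tau = 0..D-1
spikes = stim @ f > 2.0                    # toy decision function g(s)

sta = stim[spikes].mean(axis=0)            # eq. 2.4
C_spike = np.cov(stim[spikes].T)           # fluctuations about the STA (eq. 2.5)
C_prior = np.cov(stim.T)                   # eq. 2.6
dC = C_spike - C_prior                     # eq. 2.7

w, v = np.linalg.eigh(dC)
lead = v[:, np.argmax(np.abs(w))]          # eigenvector of largest |eigenvalue|
overlap = abs(lead @ f)                    # close to 1: the filter is recovered
```

Because the stimulus is white, no deconvolution step (point 3) is needed here.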

¹ As with the STA, it is in principle possible that symmetries or accidental features of the function g(s⃗) would cause some of the K eigenvalues to vanish, but this is very unlikely.


Several points are worth noting. First, except in special cases, the eigenvectors of ΔC and the filters {f_μ(τ)} are not the principal components of the RCE, and hence this analysis of ΔC is not a principal component analysis. Second, the nonzero eigenvalues of ΔC can be either positive or negative, depending on whether the variance of inputs along that particular direction is larger or smaller in the neighborhood of a spike. Third, although the eigenvectors span the relevant subspace, these eigenvectors do not form a preferred coordinate system within this subspace. Finally, we emphasize that dimensionality reduction, identification of the relevant subspace, is only the first step in our analysis of the computation done by a neuron.

3 Measuring the Success of Dimensionality Reduction

The claim that certain stimulus features are most relevant is in effect a model for the neuron, so the next question is how to measure the effectiveness or accuracy of this model. Several different ideas have been suggested in the literature as ways of testing models based on linear receptive fields in the visual system (Stanley, Lei, & Dan, 1999; Keat, Reinagel, Reid, & Meister, 2001) or linear spectrotemporal receptive fields in the auditory system (Theunissen, Sen, & Doupe, 2000). These methods have in common that they introduce a metric to measure performance, for example, mean square error in predicting the firing rate as averaged over some window of time. Ideally, we would like to have a performance measure that avoids any arbitrariness in the choice of metric, and such metric-free measures are provided uniquely by information theory (Shannon, 1948; Cover & Thomas, 1991).

Observing the arrival time t₀ of a single spike provides a certain amount of information about the input signals. Since information is mutual, we can also say that knowing the input signal trajectory I(t < t₀) provides information about the arrival time of the spike. If "details are irrelevant," then we should be able to discard these details from our description of the stimulus and yet preserve the mutual information between the stimulus and spike arrival times (for an abstract discussion of such selective compression, see Tishby et al., 1999). In constructing our low-dimensional model, we represent the complete (D-dimensional) stimulus I(t < t₀) by a smaller number (K < D) of dimensions s⃗ = (s₁, s₂, ..., s_K).

The mutual information I[I(t < t₀); t₀] is a property of the neuron itself, while the mutual information I[s⃗; t₀] characterizes how much our reduced description of the stimulus can tell us about when spikes will occur. Necessarily, our reduction of dimensionality causes a loss of information, so that

I[s⃗; t₀] ≤ I[I(t < t₀); t₀],   (3.1)

but if our reduced description really captures the computation done by the neuron, then the two information measures will be very close. In particular,


if the neuron were described exactly by a lower-dimensional model, as for a linear perceptron or for an integrate-and-fire neuron (Aguera y Arcas & Fairhall, 2003), then the two information measures would be equal. More generally, the ratio I[s⃗; t₀]/I[I(t < t₀); t₀] quantifies the efficiency of the low-dimensional model, measuring the fraction of information about spike arrival times that our K dimensions capture from the full signal I(t < t₀).

As shown by Brenner, Strong, Koberle, Bialek, and de Ruyter van Steveninck (2000), the arrival time of a single spike provides an information

I[I(t < t₀); t₀] ≡ I_one spike = (1/T) ∫₀ᵀ dt (r(t)/r̄) log₂[r(t)/r̄],   (3.2)

where r(t) is the time-dependent spike rate, r̄ is the average spike rate, and ⟨···⟩ denotes an average over time. In principle, information should be calculated as an average over the distribution of stimuli, but the ergodicity of the stimulus justifies replacing this ensemble average with a time average. For a deterministic system like the HH equations, the spike rate is a singular function of time: given the inputs I(t), spikes occur at definite times with no randomness or irreproducibility. If we observe these responses with a time resolution Δt, then for Δt sufficiently small, the rate r(t) at any time t either is zero or corresponds to a single spike occurring in one bin of size Δt, that is, r = 1/Δt. Thus, the information carried by a single spike is

I_one spike = −log₂(r̄ Δt).   (3.3)

On the other hand, if the probability of spiking really depends on only the stimulus dimensions s₁, s₂, ..., s_K, we can substitute

r(t)/r̄ → P(s⃗ | spike at t)/P(s⃗).   (3.4)

Replacing the time averages in equation 3.2 with ensemble averages, we find

I[s⃗; t₀] ≡ I^s⃗_one spike = ∫ d^K s P(s⃗ | spike at t) log₂[P(s⃗ | spike at t)/P(s⃗)]   (3.5)

(for details of these arguments, see Brenner, Strong, et al., 2000). This allows us to compare the information captured by the K-dimensional reduced model with the true information carried by single spikes in the spike train.
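For a deterministic spike train, equations 3.2 and 3.3 can be checked directly: binning the spikes at resolution Δt makes r(t) either 0 or 1/Δt, and the rate-based integral collapses to −log₂(r̄ Δt). A minimal sketch with arbitrary (made-up) spike times:

```python
import numpy as np

# Deterministic spike train binned at resolution dt: r(t) is 1/dt in
# bins containing a spike and 0 elsewhere, so eq. 3.2 reduces to eq. 3.3.
T = 1000.0                             # window, msec
dt = 0.5                               # time resolution, msec
spike_times = np.arange(7.3, T, 13.7)  # arbitrary deterministic times

bins = np.zeros(int(T / dt))
bins[(spike_times / dt).astype(int)] = 1.0
r = bins / dt                          # time-dependent rate r(t)
rbar = r.mean()                        # average rate

mask = r > 0                           # only spike-containing bins contribute
I_rate = (dt / T) * np.sum((r[mask] / rbar) * np.log2(r[mask] / rbar))  # eq. 3.2
I_direct = -np.log2(rbar * dt)                                          # eq. 3.3
```

The two numbers agree to floating-point precision, which is the content of the argument above.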

For reasons that we will discuss in the following section, and as was pointed out in Aguera y Arcas et al. (2001) and Aguera y Arcas and Fairhall (2003), we will be considering isolated spikes: those separated from previous spikes by a period of silence. This has important consequences for our analysis. Most significantly, as we will be considering spikes that occur on a background of silence, the relevant stimulus ensemble, conditioned on the silence, is no longer gaussian. Further, we will need to refine our information estimate.

The derivation of equation 3.2 makes clear that a similar formula must determine the information carried by the occurrence time of any event, not just single spikes: we can define an event rate in place of the spike rate and then calculate the information carried by these events (Brenner, Strong, et al., 2000). In the case here, we wish to compute the information obtained by observing an isolated spike, or equivalently, by the event silence+spike. This is straightforward: we replace the spike rate by the rate of isolated spikes, and equation 3.2 will give us the information carried by the arrival time of a single isolated spike. The problem is that this information includes both the information carried by the occurrence of the spike and the information conveyed in the condition that there were no spikes in the preceding t_silence msec (for an early discussion of the information carried by silence, see de Ruyter van Steveninck & Bialek, 1988). We would like to separate these contributions, since our idea of dimensionality reduction applies only to the triggering of a spike, not to the temporally extended condition of nonspiking.

To separate the information carried by the isolated spike itself, we have to ask how much information we gain by seeing an isolated spike given that the condition for isolation has already been met. As discussed by Brenner, Strong, et al. (2000), we can compute this information by thinking about the distribution of times at which the isolated spike can occur. Given that we know the input stimulus, the distribution of times at which a single isolated spike will be observed is proportional to r_iso(t), the time-dependent rate, or peristimulus time histogram, for isolated spikes. With proper normalization, we have

P_iso(t | inputs) = (1/T) · (1/r̄_iso) · r_iso(t),   (3.6)

where T is the duration of the (long) window in which we can look for the spike and r̄_iso is the average rate of isolated spikes. This distribution has an entropy

S_iso(t | inputs) = −∫₀ᵀ dt P_iso(t | inputs) log₂ P_iso(t | inputs)   (3.7)
                  = −(1/T) ∫₀ᵀ dt (r_iso(t)/r̄_iso) log₂[(1/T) · r_iso(t)/r̄_iso]   (3.8)
                  = log₂(T r̄_iso Δt) bits,   (3.9)

where again we use the fact that for a deterministic system, the time-dependent rate must be either zero or the maximum allowed by our time resolution Δt. To compute the information carried by a single spike, we need to compare this entropy with the total entropy possible when we do not know the inputs.

It is tempting to think that without knowledge of the inputs, an isolated spike is equally likely to occur anywhere in the window of size T, which leads us back to equation 3.3 with r̄ replaced by r̄_iso. In this case, however, we are assuming that the condition for isolation has already been met. Thus, even without observing the inputs, we know that isolated spikes can occur only in windows of time whose total length is T_silence = T · P(silence), where P(silence) is the probability that any moment in time is at least t_silence after the most recent spike. Thus, the total entropy of isolated spike arrival times (given that the condition for silence has been met) is reduced from log₂ T to

S_iso(t | silence) = log₂(T · P(silence)),   (3.10)

and the information that the spike carries beyond what we know from the silence itself is

ΔI_iso spike = S_iso(t | silence) − S_iso(t | inputs)   (3.11)
             = (1/T) ∫₀ᵀ dt (r_iso(t)/r̄_iso) log₂[(r_iso(t)/r̄_iso) · P(silence)]   (3.12)
             = −log₂(r̄_iso Δt) + log₂ P(silence) bits.   (3.13)

This information, which is defined independent of any model for the feature selectivity of the neuron, provides the benchmark against which our reduction of dimensionality will be measured. To make the comparison, however, we need the analog of equation 3.5.
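Equation 3.13 is simple enough to evaluate directly. A numerical sketch with made-up numbers (the rate, resolution, and P(silence) below are purely illustrative, not values from the paper): the silence condition restricts the spike to a fraction P(silence) of the window, removing −log₂ P(silence) bits from the unconditioned estimate.

```python
import numpy as np

# Illustrative evaluation of eq. 3.13 (all parameter values assumed).
dt = 0.2e-3          # time resolution: 0.2 msec, in seconds
rbar_iso = 2.0       # average isolated-spike rate, spikes/sec (assumed)
P_silence = 0.3      # fraction of time satisfying the silence criterion (assumed)

# Information an isolated spike carries beyond the preceding silence:
dI_iso = -np.log2(rbar_iso * dt) + np.log2(P_silence)

# Without conditioning on silence, the spike would carry the larger
# amount -log2(rbar_iso * dt); the difference is -log2(P_silence) bits.
```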

Equation 3.12 provides us with an expression for the information conveyed by isolated spikes in terms of the probability that these spikes occur at particular times; this is analogous to equation 3.2 for single (nonisolated) spikes. If we follow a path analogous to that which leads from equation 3.2 to equation 3.5, we find an expression for the information that an isolated spike provides about the K stimulus dimensions s⃗:

ΔI^s⃗_iso spike = ∫ ds⃗ P(s⃗ | iso spike at t) log₂[P(s⃗ | iso spike at t)/P(s⃗ | silence)] + ⟨log₂ P(silence | s⃗)⟩,   (3.14)

where the prior is now also conditioned on silence: P(s⃗ | silence) is the distribution of s⃗ given that s⃗ is preceded by a silence of at least t_silence. Notice that this silence-conditioned distribution is not knowable a priori, and in particular it is not gaussian; P(s⃗ | silence) must be sampled from data.

The last term in equation 3.14 is the entropy of a binary variable that indicates whether particular moments in time are silent, given knowledge of the stimulus. Again, since the HH model is deterministic, this conditional entropy should be zero if we keep a complete description of the stimulus. In fact, we are not interested in describing those features of the stimulus that lead to silence, and it is not fair (as we will see) to judge the success of dimensionality reduction by looking at the prediction of silence, which necessarily involves multiple dimensions. To make a meaningful comparison, then, we will assume that there is a perfect description of the stimulus conditions leading to silence and focus on the stimulus features that trigger the isolated spike. When we approximate these features by the K-dimensional space s⃗, we capture an amount of information

ΔI^s⃗_iso spike = ∫ ds⃗ P(s⃗ | iso spike at t) log₂[P(s⃗ | iso spike at t)/P(s⃗ | silence)].   (3.15)

This is the information that we can compare with ΔI_iso spike in equation 3.13 to determine the efficiency of our dimensionality reduction.
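In practice, the integral in equation 3.15 is a Kullback-Leibler divergence between two sampled distributions, and it can be sketched as a binned estimate. Everything below is synthetic: the two gaussians stand in for samples of the projections at isolated-spike times and during silences; this is an illustration of the estimator, not the authors' pipeline.

```python
import numpy as np

# Binned KL-divergence sketch of eq. 3.15, with synthetic 1D projections.
rng = np.random.default_rng(1)
s_spike = rng.normal(1.5, 0.5, 50_000)      # stand-in for P(s | iso spike)
s_silence = rng.normal(0.0, 1.0, 500_000)   # stand-in for P(s | silence)

edges = np.linspace(-5, 5, 61)              # 60 bins
p, _ = np.histogram(s_spike, bins=edges)
q, _ = np.histogram(s_silence, bins=edges)
p = p / p.sum()
q = q / q.sum()

keep = (p > 0) & (q > 0)                    # avoid log(0) in sparse bins
dI_s = np.sum(p[keep] * np.log2(p[keep] / q[keep]))   # bits per isolated spike
```

For these particular gaussians, the analytic divergence is about 2.1 bits, and the histogram estimate lands close to it; with real data, the binning and sampling biases would need the more careful treatment discussed in the references above.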

4 Characterizing the Hodgkin-Huxley Neuron

For completeness, we begin with a brief review of the dynamics of the space-clamped HH neuron (Hodgkin & Huxley, 1952). Hodgkin and Huxley modeled the dynamics of the current through a patch of membrane with ion-specific conductances:

C dV/dt = I(t) − ḡ_K n⁴ (V − V_K) − ḡ_Na m³h (V − V_Na) − ḡ_l (V − V_l),   (4.1)

where I(t) is the injected current, K and Na subscripts denote potassium- and sodium-related variables, respectively, and l (for "leakage") terms include all other ion conductances with slower dynamics. C is the membrane capacitance, V_K and V_Na are ion-specific reversal potentials, and V_l is defined such that the total voltage V is exactly zero when the membrane is at rest. ḡ_K, ḡ_Na, and ḡ_l are empirically determined maximal conductances for the different ion species, and the gating variables n, m, and h (on the interval [0, 1]) have their own voltage-dependent dynamics:

dn/dt = (0.1 − 0.01V)(1 − n)/[exp(1 − 0.1V) − 1] − 0.125 n exp(−V/80)
dm/dt = (2.5 − 0.1V)(1 − m)/[exp(2.5 − 0.1V) − 1] − 4 m exp(−V/18)
dh/dt = 0.07 (1 − h) exp(−0.05V) − h/[exp(3 − 0.1V) + 1].   (4.2)

We have used the original values for these parameters, except for changing the signs of the voltages to correspond to the modern sign convention: C = 1 μF/cm², ḡ_K = 36 mS/cm², ḡ_Na = 120 mS/cm², ḡ_l = 0.3 mS/cm², V_K = −12 mV, V_Na = +115 mV, V_l = +10.613 mV. We have taken our system to be a π × 30² μm² patch of membrane. We solve these equations numerically using fourth-order Runge-Kutta integration.

The system is driven with a gaussian random noise current I(t), generated by smoothing a gaussian random number stream with an exponential filter to generate a correlation time τ. It is convenient to choose τ to be longer than the time steps of numerical integration, since this guarantees that all functions are smooth on the scale of single time steps. Here we will always use τ = 0.2 msec, a value that is both less than the timescale over which we discretize the stimulus for analysis and far less than the neuron's capacitative smoothing timescale RC ~ 3 msec. The current has a standard deviation σ, but since the correlation time is short, the relevant parameter usually is the spectral density S = σ²τ; we also add a DC offset I₀. In the following, we will consider two parameter regimes: I₀ = 0, and I₀ at a finite value, which leads to more periodic firing.
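Such a stimulus can be sketched as a discrete exponential (Ornstein-Uhlenbeck-style) filter applied to a gaussian stream. The unit bookkeeping below is our assumption (S quoted in nA²·sec, time steps in msec); the recursion itself just enforces std σ and correlation time τ.

```python
import numpy as np

# Correlated gaussian noise current with correlation time tau and
# spectral density S = sigma^2 * tau, plus a DC offset I0 (here 0).
rng = np.random.default_rng(2)

dt = 0.05                            # integration step, msec
tau = 0.2                            # correlation time, msec
S = 6.5e-4                           # spectral density, assumed nA^2 * sec
sigma = np.sqrt(S / (tau * 1e-3))    # nA, converting tau to seconds
I0 = 0.0

n = 200_000
a = np.exp(-dt / tau)                # one-step decay of the exponential filter
I = np.empty(n)
I[0] = 0.0
for i in range(1, n):
    # Discrete OU update: stationary std is sigma, autocorrelation exp(-t/tau)
    I[i] = a * I[i - 1] + sigma * np.sqrt(1 - a * a) * rng.standard_normal()
I += I0
```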

The integration step size is fixed at 0.05 msec. The key numerical experiments were repeated at a step size of 0.01 msec, with identical results. The time of a spike is defined as the moment of maximum voltage, for voltages exceeding a threshold (see Figure 1), estimated to subsample precision by quadratic interpolation. As spikes are both very stereotyped and very large compared to subspiking fluctuations, the precise value of this threshold is unimportant; we have used +20 mV.
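The simulation described above can be sketched as follows. This is our own minimal implementation, not the authors' code: the resting gating values are approximate, the quadratic peak interpolation is omitted, and the nA-to-current-density conversion assumes the π × 30² μm² patch stated in the text.

```python
import numpy as np

# Space-clamped HH patch (modern sign convention, rest at V = 0),
# fixed-step RK4, spikes as local voltage maxima above +20 mV.
# Units: mV, msec, uF/cm^2, mS/cm^2, uA/cm^2.
C_m = 1.0
g_K, g_Na, g_l = 36.0, 120.0, 0.3
V_K, V_Na, V_l = -12.0, 115.0, 10.613
AREA = np.pi * 30.0**2 * 1e-8          # pi x 30^2 um^2 patch, in cm^2

def deriv(y, J):
    """Right-hand side of eqs. 4.1-4.2; J is current density in uA/cm^2."""
    V, n, m, h = y
    a_n = 0.01 * (10.0 - V) / (np.exp((10.0 - V) / 10.0) - 1.0)
    b_n = 0.125 * np.exp(-V / 80.0)
    a_m = 0.1 * (25.0 - V) / (np.exp((25.0 - V) / 10.0) - 1.0)
    b_m = 4.0 * np.exp(-V / 18.0)
    a_h = 0.07 * np.exp(-V / 20.0)
    b_h = 1.0 / (np.exp((30.0 - V) / 10.0) + 1.0)
    dV = (J - g_K * n**4 * (V - V_K)
            - g_Na * m**3 * h * (V - V_Na) - g_l * (V - V_l)) / C_m
    return np.array([dV, a_n * (1 - n) - b_n * n,
                     a_m * (1 - m) - b_m * m, a_h * (1 - h) - b_h * h])

def run(I_nA, dt=0.05):
    """Integrate the patch driven by the current trace I_nA (in nA)."""
    y = np.array([0.0, 0.3177, 0.0529, 0.5961])   # approximate resting state
    V_trace = np.empty(len(I_nA))
    for i, I in enumerate(I_nA):
        J = I * 1e-3 / AREA                       # nA -> uA/cm^2
        k1 = deriv(y, J)
        k2 = deriv(y + 0.5 * dt * k1, J)
        k3 = deriv(y + 0.5 * dt * k2, J)
        k4 = deriv(y + dt * k3, J)
        y = y + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        V_trace[i] = y[0]
    return V_trace

def spike_times(V, dt=0.05, thresh=20.0):
    """Spike time = local maximum of V above threshold, as in the text."""
    peak = (V[1:-1] > thresh) & (V[1:-1] >= V[:-2]) & (V[1:-1] > V[2:])
    return (np.nonzero(peak)[0] + 1) * dt
```

Driving `run` with a constant suprathreshold current reproduces the repetitive firing behind Figure 2; with zero input the patch sits quietly at rest.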

4.1 Qualitative Description of Spiking. The first step in our analysis is to use reverse correlation, equation 2.4, to determine the average stimulus feature preceding a spike, the STA. In Figure 1 (top), we display the STA in a regime where the spectral density of the input current is 6.5 × 10⁻⁴ nA² sec. The spike-triggered averages of the gating terms n⁴ (proportion of open potassium channels) and m³h (proportion of open sodium channels) and the membrane voltage V are plotted in Figure 1 (middle and bottom). The error bars mark the standard deviation of the trajectories of these variables.

As expected, the voltage and gating variables follow highly stereotyped trajectories during the ~5 msec surrounding a spike. First, the rapid opening of the sodium channels causes a sharp membrane depolarization (or rise in V); the slower potassium channels then open and repolarize the membrane, leaving it at a slightly lower potential than rest. The potassium channels close gradually, but meanwhile the membrane remains hyperpolarized and, due to its increased permeability to potassium ions, at lower resistance. These effects make it difficult to induce a second spike during this ~15 msec "refractory period." Away from spikes, the resting levels and fluctuations of the voltage and gating variables are quite small. The larger values evident in Figure 1 (middle and bottom) by ±15 msec are due to the summed contributions of nearby spikes.

The spike-triggered average current has a largely transient form, so that spikes are on average preceded by an upward swing in current. On the other hand, there is no obvious bottleneck in the current trajectories, so that the current variance is almost constant throughout the spike. This is qualitatively consistent with the idea of dimensionality reduction: if the neuron ignores most of the dimensions along which the current can vary, then the variance, which is shared almost equally among all dimensions for this near white noise, can change by only a small amount.

Figure 1: Spike-triggered averages, with standard deviations, for (top) the input current I, (middle) the fraction of open K⁺ and Na⁺ channels, and (bottom) the membrane voltage V, for the parameter regime I₀ = 0 and S = 6.5 × 10⁻⁴ nA² sec.


4.2 Interspike Interaction. Although the STA has the form of a differentiating kernel, suggesting that the neuron detects edge-like events in the current versus time, there must be a DC component to the cell's response. We recall that for constant inputs, the HH model undergoes a bifurcation to constant frequency spiking, where the frequency is a function of the value of the input above onset. Correspondingly, the STA does not sum precisely to zero; one might think of it as having a small integrating component that allows the system to spike under DC stimulation, albeit only above a threshold.

The system's tendency to periodic spiking under DC current input also is felt under dynamic stimulus conditions and can be thought of as a strong interaction between successive spikes. We illustrate this by considering a different parameter regime, with a small DC current and some added noise (I₀ = 0.11 nA and S = 0.8 × 10⁻⁴ nA² sec). Note that the DC component puts the neuron in the metastable region of its f-I curve (see Figure 2). In this regime, the neuron tends to fire quasi-regular trains of spikes intermittently, as shown in Figure 3. We will refer to these quasi-regular spike sequences as "bursts" (note that this term is often used to refer to compound spikes in neurons with additional channels; such events do not occur in the HH model).

Spikes can be classified into three types: those initiating a spike burst, those within a burst, and those ending a burst. The minimum length of the silence between bursts is taken in this case to be 70 msec. Taking these three categories of spike as different "symbols" (de Ruyter van Steveninck & Bialek, 1988), we can determine the average stimulus for each. These are shown in Figure 4, with the spike at t = 0.

Figure 2: Firing rate of the HH neuron as a function of injected DC current. The empty circles at moderate currents denote the metastable region, where the neuron may be either spiking or silent.

Figure 3: Segment of a typical spike train in a "bursting" regime.

Figure 4: Spike-triggered averages derived from spikes leading ("on"), inside ("burst"), and ending ("off") a burst. The parameters of this bursting regime are I₀ = 0.11 nA and S = 0.8 × 10⁻⁴ nA² sec. Note that the burst-ending spike average is by construction identical to that of any other within-burst spike for t < 0.

In this regime, the initial spike of a burst is preceded by a rapid oscillation in the current. Spikes within a burst are affected much less by the current; the feature immediately preceding such spikes is similar in shape to a single "wavelength" of the leading spike feature but is of much smaller amplitude and is temporally compressed into the interspike interval. Hence, although it is clear that the timing of a spike within a burst is determined largely by the timing of the previous spike, the current plays some role in affecting the precise placement. This also demonstrates that the shape of the STA is not the same for all spikes; it depends strongly and nontrivially on the time to the previous spike, and this is related to the observation that subtly different patterns of two or three spikes correspond to very different average stimuli (de Ruyter van Steveninck & Bialek, 1988). For a reader of the spike code, a spike within a burst conveys a different message about the input than the spike at the onset of the burst. Finally, the feature ending a burst has a very similar form to the onset feature but reversed in time. Thus, to a good approximation, the absence of a spike at the end of a burst can be read as the opposite of the onset of the burst.
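The three-symbol bookkeeping can be sketched as follows; the spike times, the 70 msec gap, and the toy stimulus in the usage below are placeholders, and the helper names are ours.

```python
import numpy as np

def classify_spikes(spike_times, t_gap=70.0):
    """Split spikes into burst-onset, within-burst, and burst-ending,
    according to the interspike intervals before and after each spike."""
    onset, within, ending = [], [], []
    for i, t in enumerate(spike_times):
        prev_gap = t - spike_times[i - 1] if i > 0 else np.inf
        next_gap = spike_times[i + 1] - t if i + 1 < len(spike_times) else np.inf
        if prev_gap >= t_gap:
            onset.append(t)       # "on" symbol: preceded by silence
        elif next_gap >= t_gap:
            ending.append(t)      # "off" symbol: followed by silence
        else:
            within.append(t)      # "burst" symbol
    return onset, within, ending

def conditional_sta(stim, dt, times, window=30.0):
    """Average stimulus in the `window` msec preceding each listed spike."""
    n = int(window / dt)
    segs = [stim[int(t / dt) - n : int(t / dt)] for t in times if t / dt >= n]
    return np.mean(segs, axis=0)
```

Running `conditional_sta` separately on the three lists gives the three conditional averages of Figure 4 (for real data; here any stimulus array will do).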

In summary, this regime of the HH neuron is similar to a "flip-flop," or 1-bit memory. Like its electronic analog, the neuron's memory is preserved by a feedback loop, here implemented by the interspike interaction. Large fluctuations in the input current at a certain frequency "flip" or "flop" the neuron between its silent and spiking states. However, while the neuron is spiking, further details of the input signal are transmitted by precise spike timing within a burst. If we calculate the spike-triggered average of all spikes for this regime, without regard to their position within a burst, then as shown in Figure 5, the relatively well-localized leading spike oscillation of Figure 4 is replaced by a long-lived oscillating function resulting from the spike periodicity. This is shown explicitly by comparing the overall STA with the spike autocorrelation, also shown in Figure 5. This same effect is seen in the STA of the burst spikes, which in fact dominates the overall average. Prediction of spike timing using such an STA would be computationally difficult due to its extension in time, but more seriously, unsuccessful, as most of the function is an artifact of the spike history rather than the effect of the stimulus.

Figure 5: Overall spike-triggered average in the bursty regime, showing the ringing due to the tendency to periodic firing. Plotted in gray is the spike autocorrelation, showing the same oscillations.

While the effects of spike interaction are interesting and should be included in a complete model for spike generation, we wish here to consider only the current's role in initiating spikes. Therefore, as we have argued elsewhere, we limit ourselves initially to the cases in which interspike interaction plays no role (Aguera y Arcas et al., 2001; Aguera y Arcas & Fairhall, 2003). These "isolated" spikes can be defined as spikes preceded by a silent period t_silence long enough to ensure decoupling from the timing of the previous spike. A reasonable choice for t_silence can be inferred directly from the interspike interval distribution P(Δt), illustrated in Figure 6. For the HH model, as in simpler models and many real neurons (Brenner, Agam, Bialek, & de Ruyter van Steveninck, 1998), the form of P(Δt) has three noteworthy features: a refractory "hole," during which another spike is unlikely to occur; a strong mode at the preferred firing frequency; and an exponentially decaying, or Poisson, tail. The details of all three of these features are functions of the parameters of the stimulus (Tiesinga, Jose, & Sejnowski, 2000), and certain regimes may be dominated by only one or two features. The emergence of Poisson statistics in the tail of the distribution implies that these events are independent, so we can infer that the system has lost memory of the previous spike. We will therefore take isolated spikes to be those preceded by a silent interval Δt ≥ t_silence, where t_silence is well into the Poisson regime. The burst-onset spikes of Figure 4 are isolated spikes by this definition.

Figure 6: Hodgkin-Huxley interspike interval histogram for the parameters I₀ = 0 and S = 6.5 × 10⁻⁴ nA² sec, showing a peak at a preferred firing frequency and the long Poisson tail. The total number of spikes is N = 5.18 × 10⁶. The plot to the right is a closeup in linear scale.
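Once t_silence is chosen from the ISI distribution, the selection itself is a one-liner; a sketch with placeholder spike times and threshold (the real analysis would set t_silence from the Poisson tail):

```python
import numpy as np

def isolated_spikes(spike_times, t_silence=60.0):
    """Spikes preceded by an interspike interval of at least t_silence.
    The first spike is excluded, since its preceding interval is unknown."""
    spike_times = np.asarray(spike_times, dtype=float)
    isi = np.diff(spike_times)
    return spike_times[1:][isi >= t_silence]
```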

Note that the bursty behavior evident in Figure 3 is characteristic of "type II" neurons, which begin firing at a well-defined frequency under DC stimulus; "type I" neurons, by contrast, can fire at arbitrarily low frequency under constant input. Nonetheless, the analysis that follows is equally applicable to either type of neuron: spikes still interact at close range and become independent at sufficient separation. In what follows, we will be probing the HH neuron with current noise that remains below the DC threshold for periodic firing.

5 Isolated Spike Analysis

Focusing now on isolated spikes, we proceed to a second-order analysis of the current fluctuations around the isolated spike-triggered average (see Figure 7). We consider the response of the HH neuron to currents I(t) with mean I₀ = 0 and spectral density S = 6.5 × 10⁻⁴ nA² sec. Isolated spikes in this regime are defined by t_silence = 60 msec.

Figure 7: Spike-triggered average stimulus for isolated spikes.

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, roughly corresponding to sampling at the timescale sufficiently long to capture the relevant features. Thus, we begin in a 200-dimensional space. We emphasize that the theorem that connects eigenvalues of the matrix ΔC to the number of relevant dimensions is valid only for truly gaussian distributions of inputs, and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble, namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide to our search for lower-dimensional descriptions, but we should proceed with caution.

The "raw" isolated spike-triggered covariance C_iso spike and the corresponding covariance difference ΔC, equation 2.7, are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to ±15 msec. In Figure 9, we show the spectrum of eigenvalues of ΔC, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply ΔC by C_prior⁻¹. This has the effect of giving us eigenvalues scaled in units of the input standard deviation along each dimension. Because the correlation time is short, C_prior is nearly diagonal.

Figure 8: The isolated spike-triggered covariance C_iso (left) and covariance difference ΔC (right), for times −30 < t < 5 msec. The plots are in units of nA².

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance after accumulating 80,000 spikes.

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero up to the noise floor determined by sampling decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is 1/√n_spikes. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Aguera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended and silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Aguera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be −60 ≤ t ≤ −40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/n_spikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out the test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion of other, weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is close to time translationally invariant.

Figure 11: For the leading 64 modes, fraction of the mode energy over the interval −40 < t < −30 msec, as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.
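The energy-fraction diagnostic itself is a one-line projection; a sketch with two synthetic eigenvectors (the window, sampling grid, and mode shapes below are illustrative, not taken from the paper):

```python
import numpy as np

def silence_energy_fraction(mode, t, t_lo=-60.0, t_hi=-40.0):
    """Fraction of an eigenvector's energy lying in the silent window.
    mode: eigenvector sampled at times t (msec, spike at t = 0)."""
    mode = np.asarray(mode, dtype=float)
    w = (t >= t_lo) & (t <= t_hi)
    return np.sum(mode[w] ** 2) / np.sum(mode ** 2)

t = np.linspace(-60.0, 5.0, 651)
spike_mode = np.exp(-((t + 3.0) ** 2) / 8.0)    # localized near the spike
silence_mode = np.sin(2 * np.pi * t / 30.0)     # extended, Fourier-like

f_spike = silence_energy_fraction(spike_mode, t)       # near zero
f_silence = silence_energy_fraction(silence_mode, t)   # order unity
```

For real covariance modes, one would track `f_spike` as a function of sample size and look for the 1/n_spikes decay described above.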

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s₁ and s₂ (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crosses threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions, s₁ and s₂. By construction, the distribution of these signals over the whole experiment, P(s₁, s₂), is gaussian. The appropriate prior for the isolation condition, P(s₁, s₂ | silence), differs only subtly from the gaussian prior. On the other hand, for each spike, we obtain a sample from the distribution P(s₁, s₂ | iso spike at t₀), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC which we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold crossing interpretation.

Figure 14: 10⁴ spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation on the prior gaussian distribution. The circles, from the inside out, enclose all but 10⁻¹, 10⁻², ..., 10⁻⁸ of the prior.


Combining equations 2.2 and 2.3, for isolated spikes we have

g(s1, s2) = P(s1, s2 | iso spike at t0) / P(s1, s2 | silence),    (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
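In practice, a ratio of densities like equation 5.1 is estimated from samples by binning both ensembles on a common grid. The sketch below uses synthetic stand-ins (a standard gaussian "silence" ensemble and a displaced, narrower "spike-conditional" ensemble — purely illustrative, not the actual HH distributions) to show the mechanics:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for the two ensembles in eq. 5.1: the silence/prior
# distribution is a standard 2D gaussian; the "spike-conditional" samples
# are displaced along s1 (illustrative only, not the HH ensembles).
prior = rng.standard_normal((100_000, 2))
spike = rng.standard_normal((10_000, 2)) * 0.5 + np.array([2.0, 0.0])

edges = np.linspace(-4, 4, 33)
H_spike, _, _ = np.histogram2d(spike[:, 0], spike[:, 1],
                               bins=[edges, edges], density=True)
H_prior, _, _ = np.histogram2d(prior[:, 0], prior[:, 1],
                               bins=[edges, edges], density=True)

# Input-output relation g(s1, s2), bin by bin; bins never visited by the
# prior are left undefined rather than divided by zero.
with np.errstate(divide="ignore", invalid="ignore"):
    g = np.where(H_prior > 0, H_spike / H_prior, np.nan)

# g is largest where the spike-conditional density dominates the prior.
i, j = np.unravel_index(np.argmax(H_spike), H_spike.shape)
print(g[i, j])
```

With real data, the fuzziness discussed in the text shows up as bins where g is neither clearly zero nor clearly large.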

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I_{s1,s2}^{iso spike} = I^{iso spike}; more generally, one finds I_{s1,s2}^{iso spike} ≤ I^{iso spike}, and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec,

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).


reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.
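The information carried by a projection can be estimated directly from histograms of the projection under the spike-conditional and prior ensembles, following the per-spike information formula of Brenner, Strong, et al. (2000). The sketch below implements this for a 1D projection and checks it against a case with a known closed form (gaussian prior and gaussian spike-conditional distribution — an illustrative stand-in, not HH data):

```python
import numpy as np

def projection_info_bits(proj_spike, proj_prior, nbins=40):
    """Information (bits per spike) that a 1D projection carries about
    spiking: I = sum_s P(s|spike) log2[P(s|spike)/P(s)]
    (per-spike information formula of Brenner et al., 2000)."""
    lo = min(proj_spike.min(), proj_prior.min())
    hi = max(proj_spike.max(), proj_prior.max())
    edges = np.linspace(lo, hi, nbins + 1)
    p_spike, _ = np.histogram(proj_spike, bins=edges, density=True)
    p_prior, _ = np.histogram(proj_prior, bins=edges, density=True)
    w = np.diff(edges)                      # bin widths
    mask = (p_spike > 0) & (p_prior > 0)    # skip empty bins
    return np.sum(w[mask] * p_spike[mask]
                  * np.log2(p_spike[mask] / p_prior[mask]))

# Check against a closed form: prior N(0,1), spike-conditional N(mu, sigma^2)
# gives I = [mu^2 + sigma^2 - 1 - ln(sigma^2)] / (2 ln 2) bits.
rng = np.random.default_rng(2)
mu, sigma = 2.0, 0.5
est = projection_info_bits(rng.normal(mu, sigma, 50_000),
                           rng.standard_normal(500_000))
exact = (mu**2 + sigma**2 - 1 - np.log(sigma**2)) / (2 * np.log(2))
print(est, exact)
```

The information fraction plotted in Figure 15 is the ratio of such a projection-based estimate to the direct estimate of the full information in spike timing.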

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at ≈ 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time-resolution limitation, and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ∼3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing?

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The next two spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that, along the direction of the STA, we allow the second, orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data set: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ∼90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshikawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input


currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work of Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ∼200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace using data sets of fewer than 10^4 spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one that is sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore in general involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Aguera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Aguera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Aguera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibres. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshikawa, Z. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



whether we think of inputs as arriving at the synapses or being driven by sensory signals outside the brain – is vast; individual neurons are sensitive only to some restricted set of features in this vast space. The most general way to formalize this intuition is to say that we can compress (in the information-theoretic sense) our description of the inputs without losing any information about the neural output (Tishby, Pereira, & Bialek, 1999). We might hope that this selective compression of the input data has a simple geometric description, so that the relevant bits about the input correspond to coordinates along some restricted set of relevant dimensions in the space of inputs. If this is the case, feature selectivity should be formalized as a reduction of dimensionality (de Ruyter van Steveninck & Bialek, 1988), and this is the approach we follow here. Closely related work on the use of dimensionality reduction to analyze neural feature selectivity has been described in recent work (Bialek & de Ruyter van Steveninck, 2003; Sharpee, Rust, & Bialek, in press).

Here we develop the idea of dimensionality reduction as a tool for analysis of neural computation and apply these tools to the HH model. While our initial goal was to test new analysis methods in the context of a presumably simple and well-understood model, we have found that the HH neuron performs a computation of surprising richness. Preliminary accounts of these results have already appeared (Aguera y Arcas, 1998; Aguera y Arcas, Bialek, & Fairhall, 2001).

2 Dimensionality Reduction

Neurons take input signals at their synapses and give as output sequences of spikes. To characterize a neuron completely is to identify the mapping between neuronal input and the spike train the neuron produces in response. In the absence of any simplifying assumptions, this requires probing the system with every possible input. Most often, these inputs are spikes from other neurons; each neuron typically has of order N ∼ 10^3 presynaptic connections. If the system operates at 1 msec resolution and the time window of relevant inputs is 40 msec, then we can think of a single neuron as having an input described by a ∼ 4 × 10^4 bit word (the presence or absence of a spike in each 1 msec bin for each presynaptic cell), which is then mapped to a one (spike) or zero (no spike). More realistically, if average spike rates are ∼ 10 s^-1, the input words can be compressed by a factor of 10. In this picture, a neuron computes a Boolean function over roughly 4000 variables. Clearly, one cannot sample every one of the ∼ 2^4000 inputs to identify the neural computation. Progress requires making some simplifying assumption about the function computed by the neuron, so that we can vastly reduce the space of possibilities over which to search. We use the idea of dimensionality reduction in this spirit, as a simplifying assumption that allows us to make progress but that also must be tested directly.
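The counting argument above can be checked directly; the numbers are exactly those quoted in the text:

```python
# Back-of-envelope sizes from the text: ~10^3 presynaptic cells at 1 msec
# resolution over a 40 msec window of relevant inputs.
n_pre, window_bins = 1_000, 40
raw_bits = n_pre * window_bins            # 4 x 10^4 bit input words
compressed_bits = raw_bits // 10          # ~10 s^-1 rates: factor-of-10 compression
n_inputs = 2 ** compressed_bits           # distinct inputs to the Boolean function
print(raw_bits, compressed_bits, len(str(n_inputs)))
```

Even after compression, 2^4000 is a number with over a thousand decimal digits, which is the sense in which exhaustive sampling is hopeless.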

1718 B Aguera y Arcas A Fairhall and W Bialek

The ideas of feature selectivity and dimensionality reduction have a long history in neurobiology. The idea of receptive fields, as formulated by Hartline, Kuffler, and Barlow for the visual system, gave a picture of neurons as having a template against which images would be correlated (Hartline, 1940; Kuffler, 1953; Barlow, 1953). If we think of images as vectors in a high-dimensional space, with coordinates determined by the intensities of each pixel, then the simplest receptive field models describe the neuron as sensitive to only one direction, or projection, in this high-dimensional space. This picture of projection followed by thresholding, or some other nonlinearity, to determine the probability of spike generation was formalized in the linear perceptron (Rosenblatt, 1958, 1962). In subsequent work, Barlow, Hill, and Levick (1964) characterized neurons in which the receptive field has subregions in space and time such that summation is at least approximately linear in each subregion, but these summed signals interact nonlinearly, for example, to generate direction selectivity and motion sensitivity. We can think of Hubel and Wiesel's description of complex and hypercomplex cells (Hubel & Wiesel, 1962) again as a picture of approximately linear summation within subregions followed by nonlinear operations on these multiple summed signals. More formally, the proper combination of linear summation and nonlinear or logical operations may provide a useful bridge from receptive field properties to proper geometric primitives in visual computation (Iverson & Zucker, 1995). In the same way that a single receptive field or perceptron model has one relevant dimension in the space of visual stimuli, these more complex cells have as many relevant dimensions as there are independent subregions of the receptive field. Although this number is larger than one, it still is much smaller than the full dimensionality of the possible spatiotemporal variations in visual inputs.

The idea that neurons in the auditory system might be described by a filter followed by a nonlinear transformation to determine the probability of spike generation was the inspiration for de Boer's development (de Boer & Kuyper, 1968) of triggered, or reverse, correlation. Modern uses of reverse correlation to characterize the filtering or receptive field properties of a neuron often emphasize that this approach provides a "linear approximation" to the input-output properties of the cell, but the original idea was almost the opposite: neurons clearly are nonlinear devices, but this is separate from the question of whether the probability of generating a spike is determined by a simple projection of the sensory input onto a single filter or template. In fact, as explained by Rieke, Warland, Bialek, and de Ruyter van Steveninck (1997), linearity is seldom a good approximation for the neural input-output relation, but if there is one relevant dimension, then (provided that input signals are chosen with suitable statistics) the reverse correlation method is guaranteed to find this one special direction in the space of inputs to which the neuron is sensitive. While the reverse correlation method is guaranteed to find the one relevant dimension if it exists, the method does not include any way of testing for other relevant dimensions or, more generally, for measuring the dimensionality of the relevant subspace.

The idea of characterizing neural responses directly as the reduction of dimensionality emerged from studies (de Ruyter van Steveninck & Bialek, 1988) of a motion-sensitive neuron in the fly visual system. In particular, this work suggested that it is possible to estimate the dimensionality of the relevant subspace rather than just assuming that it is small (or equal to one). More recent work on the fly visual system has exploited the idea of dimensionality reduction to probe both the structure and adaptation of the neural code (Brenner, Bialek, & de Ruyter van Steveninck, 2000; Fairhall, Lewen, Bialek, & de Ruyter van Steveninck, 2001) and the nature of the computation that extracts the motion signal from the spatiotemporal array of photoreceptor inputs (Bialek & de Ruyter van Steveninck, 2003). Here we review the ideas of dimensionality reduction from previous work; extensions of these ideas begin in section 3.

In the spirit of neural network models, we will simplify away the spatial structure of neurons and consider time-dependent currents I(t) injected into a point-like neuron. While this misses much of the complexity of real cells, we will find that even this system is highly nontrivial. If the input is an injected current, then the neuron maps the history of this current, I(t < t₀), into the presence or absence of a spike at time t₀. More generally, we might imagine that the cell (or our description) is noisy, so that there is a probability of spiking, P[spike at t₀ | I(t < t₀)], that depends on the current history. The dependence on the history of the current means that the input signal still is high dimensional, even without spatial dependence. Working at time resolution Δt and assuming that currents in a window of size T are relevant to the decision to spike, the input space is of dimension D = T/Δt, where D is often of order 100.

The idea of dimensionality reduction is that the probability of spike generation is sensitive only to some limited number of dimensions K within the D-dimensional space of inputs. We begin our analysis by searching for linear subspaces, that is, a set of signals s₁, s₂, …, s_K that can be constructed by filtering the current,

s_μ = ∫₀^∞ dt f_μ(t) I(t₀ − t),    (2.1)

so that the probability of spiking depends on only this small set of signals:

P[spike at t₀ | I(t < t₀)] = P[spike at t₀] g(s₁, s₂, …, s_K),    (2.2)

where the inclusion of the average probability of spiking, P[spike at t₀], leaves g dimensionless. If we think of the current I(t₀ − T < t < t₀) as a D-dimensional vector, with one dimension for each discrete sample at spacing Δt, then the filtered signals s_i are linear projections of this vector. In this formulation, characterizing the computation done by a neuron involves three steps:

1. Estimate the number of relevant stimulus dimensions K, with the hope that there will be many fewer than the original dimensionality D.

2. Identify a set of filters that project into this relevant subspace.

3. Characterize the nonlinear function g(s⃗).

The classical perceptron-like cell of neural network theory would have only one relevant dimension, given by the vector of weights, and a simple form for g, typically a sigmoid.
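To make the structure of equations 2.1 and 2.2 concrete, here is a minimal numerical sketch in Python with NumPy. The filter shapes, the sigmoid g, and the mean spike probability are illustrative assumptions, not quantities taken from the HH analysis that follows:

```python
import numpy as np

rng = np.random.default_rng(0)

dt = 1.0     # sampling interval of the current history (arbitrary units)
D = 100      # dimensionality of the discretized input history
K = 2        # number of relevant dimensions

# Two hypothetical filters f_mu(t) defining the relevant subspace
t = np.arange(D) * dt
f1 = np.exp(-t / 5.0) * np.sin(t / 2.0)
f2 = np.exp(-t / 5.0) * np.cos(t / 2.0)
filters = np.stack([f1, f2])        # K x D

def p_spike(history, p_mean=0.05):
    """Equations 2.1-2.2: linear projections s_mu = sum_t f_mu(t) I(t0 - t),
    followed by a nonlinear decision function g(s1, s2)."""
    s = filters @ history                               # the K projections
    g = 1.0 / (1.0 + np.exp(-(s[0] + 0.5 * s[1])))      # an arbitrary sigmoid g
    return p_mean * g                                   # P[spike] = P[spike] * g

history = rng.normal(size=D)   # a white noise current history I(t0 - t)
p = p_spike(history)
```

The point of the sketch is only the two-stage structure: a D-dimensional history is collapsed to K numbers, and the spike probability depends on the history solely through those K numbers.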

Rather than trying to look directly at the distribution of spikes given stimuli, we follow de Ruyter van Steveninck and Bialek (1988) and consider the distribution of signals conditional on the response, P[I(t < t₀) | spike at t₀], also called the response conditional ensemble (RCE); these are related by Bayes' rule:

P[spike at t₀ | I(t < t₀)] / P[spike at t₀] = P[I(t < t₀) | spike at t₀] / P[I(t < t₀)].    (2.3)

We can now compute various moments of the RCE. The first moment is the spike-triggered average stimulus (STA),

STA(τ) = ∫ [dI] P[I(t < t₀) | spike at t₀] I(t₀ − τ),    (2.4)

which is the object that one computes in reverse correlation (de Boer & Kuyper, 1968; Rieke et al., 1997). If we choose the distribution of input stimuli P[I(t < t₀)] to be gaussian white noise, then for a perceptron-like neuron sensitive to only one direction in stimulus space, it can be shown that the STA, or first moment of the RCE, is proportional to the vector or filter f(τ) that defines this direction (Rieke et al., 1997).

Although it is a theorem that the STA is proportional to the relevant filter f(τ), in principle it is possible that the proportionality constant is zero, most plausibly if the neuron's response has some symmetry, such as phase invariance in the response of high-frequency auditory neurons. It also is worth noting that what is really important in this analysis is the gaussian distribution of the stimuli, not the "whiteness" of the spectrum. For nonwhite but gaussian inputs, the STA measures the relevant filter blurred by the correlation function of the inputs, and hence the true filter can be recovered (at least in principle) by deconvolution. For nongaussian signals and nonlinear neurons, there is no corresponding guarantee that the selectivity of the neuron can be separated from correlations in the stimulus (Sharpee et al., in press).
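As an illustration of this theorem, the following sketch generates spikes from a hypothetical one-dimensional threshold model driven by gaussian white noise and recovers the filter direction from the spike-triggered average; the filter shape, threshold, and sample sizes are arbitrary choices for demonstration:

```python
import numpy as np

rng = np.random.default_rng(1)

D = 50                          # samples of stimulus history kept before a spike
n_samples = 200_000
f = np.exp(-np.arange(D) / 10.0)          # hypothetical "true" filter
f /= np.linalg.norm(f)

I = rng.normal(size=n_samples)            # gaussian white noise current

# One-dimensional model neuron: spike when the projection exceeds a threshold
proj = np.convolve(I, f, mode="full")[:n_samples]  # proj[t] = sum_k f[k] I[t-k]
spike_times = np.where(proj > 2.0)[0]
spike_times = spike_times[spike_times >= D]        # keep full histories only

# Spike-triggered average, equation 2.4: average history preceding each spike,
# stored with the most recent sample first, so sta[tau] = <I(t0 - tau)>
windows = np.stack([I[t - D + 1 : t + 1][::-1] for t in spike_times])
sta = windows.mean(axis=0)
```

For this gaussian input ensemble, `sta` is proportional to `f` up to sampling noise, exactly as the reverse correlation theorem promises.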

To obtain more than one relevant direction (or to reveal relevant directions when symmetries cause the STA to vanish), we proceed to second order and compute the covariance matrix of fluctuations around the spike-triggered average,

C_spike(τ, τ′) = ∫ [dI] P[I(t < t₀) | spike at t₀] I(t₀ − τ) I(t₀ − τ′)
                 − STA(τ) STA(τ′).    (2.5)

In the same way that we compare the spike-triggered average to some constant average level of the signal in the whole experiment, we compare the covariance matrix C_spike with the covariance of the signal averaged over the whole experiment,

C_prior(τ, τ′) = ∫ [dI] P[I(t < t₀)] I(t₀ − τ) I(t₀ − τ′),    (2.6)

to construct the change in the covariance matrix,

ΔC = C_spike − C_prior.    (2.7)

With time resolution Δt in a window of duration T, as above, all of these covariances are D × D matrices. In the same way that the spike-triggered average has the clearest interpretation when we choose inputs from a gaussian distribution, ΔC also has the clearest interpretation in this case. Specifically, if inputs are drawn from a gaussian distribution, then it can be shown that (Bialek & de Ruyter van Steveninck, 2003):

1. If the neuron is sensitive to a limited set of K input dimensions, as in equation 2.2, then ΔC will have only K nonzero eigenvalues.¹ In this way, we can measure directly the dimensionality K of the relevant subspace.

2. If the distribution of inputs is both gaussian and white, then the eigenvectors associated with the nonzero eigenvalues span the same space as that spanned by the filters {f_μ(τ)}.

3. For nonwhite (correlated) but still gaussian inputs, the eigenvectors span the space of the filters {f_μ(τ)} blurred by convolution with the correlation function of the inputs.

Thus, the analysis of ΔC for neurons responding to gaussian inputs should allow us to identify the subspace of inputs of relevance and to test specifically the hypothesis that this subspace is of low dimension.

¹ As with the STA, it is in principle possible that symmetries or accidental features of the function g(s⃗) would cause some of the K eigenvalues to vanish, but this is very unlikely.


Several points are worth noting. First, except in special cases, the eigenvectors of ΔC and the filters {f_μ(τ)} are not the principal components of the RCE, and hence this analysis of ΔC is not a principal component analysis. Second, the nonzero eigenvalues of ΔC can be either positive or negative, depending on whether the variance of inputs along that particular direction is larger or smaller in the neighborhood of a spike. Third, although the eigenvectors span the relevant subspace, these eigenvectors do not form a preferred coordinate system within this subspace. Finally, we emphasize that dimensionality reduction (identification of the relevant subspace) is only the first step in our analysis of the computation done by a neuron.
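The eigenvalue analysis of ΔC can be sketched numerically: a hypothetical model neuron spikes according to a nonlinear function of two projections of gaussian white noise, and exactly two eigenvalues of ΔC stand out from the sampling noise floor. All model details below (filters, the decision rule, sample sizes) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
D, n = 40, 400_000
t = np.arange(D)
f1 = np.exp(-t / 8.0)
f1 /= np.linalg.norm(f1)
f2 = np.exp(-t / 8.0) * np.sin(t / 4.0)
f2 -= (f2 @ f1) * f1          # orthogonalize the second filter against the first
f2 /= np.linalg.norm(f2)

I = rng.normal(size=n)        # gaussian white noise, so C_prior is the identity
s1 = np.convolve(I, f1, mode="full")[:n]
s2 = np.convolve(I, f2, mode="full")[:n]
# spike when a nonlinear condition g(s1, s2) on the two projections is met
spikes = np.where((s1 > 1.0) & (np.abs(s2) < 0.3))[0]
spikes = spikes[spikes >= D]

# stimulus windows preceding each spike, most recent sample first
W = np.stack([I[k - D + 1 : k + 1][::-1] for k in spikes])
C_spike = np.cov(W.T)         # equation 2.5: covariance about the STA
dC = C_spike - np.eye(D)      # equation 2.7, with C_prior = identity

eigval, eigvec = np.linalg.eigh(dC)
order = np.argsort(np.abs(eigval))[::-1]   # sort by eigenvalue magnitude
```

Here both leading eigenvalues are negative, because the decision rule reduces the stimulus variance along both filters near a spike; the associated eigenvectors span (approximately) the space of f1 and f2, as in point 2 above.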

3 Measuring the Success of Dimensionality Reduction

The claim that certain stimulus features are most relevant is in effect a model for the neuron, so the next question is how to measure the effectiveness or accuracy of this model. Several different ideas have been suggested in the literature as ways of testing models based on linear receptive fields in the visual system (Stanley, Lei, & Dan, 1999; Keat, Reinagel, Reid, & Meister, 2001) or linear spectrotemporal receptive fields in the auditory system (Theunissen, Sen, & Doupe, 2000). These methods have in common that they introduce a metric to measure performance, for example, mean square error in predicting the firing rate as averaged over some window of time. Ideally, we would like to have a performance measure that avoids any arbitrariness in the choice of metric, and such metric-free measures are provided uniquely by information theory (Shannon, 1948; Cover & Thomas, 1991).

Observing the arrival time t₀ of a single spike provides a certain amount of information about the input signals. Since information is mutual, we can also say that knowing the input signal trajectory I(t < t₀) provides information about the arrival time of the spike. If "details are irrelevant," then we should be able to discard these details from our description of the stimulus and yet preserve the mutual information between the stimulus and spike arrival times (for an abstract discussion of such selective compression, see Tishby et al., 1999). In constructing our low-dimensional model, we represent the complete (D-dimensional) stimulus I(t < t₀) by a smaller number (K < D) of dimensions s⃗ = (s₁, s₂, …, s_K).

The mutual information I[I(t < t₀); t₀] is a property of the neuron itself, while the mutual information I[s⃗; t₀] characterizes how much our reduced description of the stimulus can tell us about when spikes will occur. Necessarily, our reduction of dimensionality causes a loss of information, so that

I[s⃗; t₀] ≤ I[I(t < t₀); t₀],    (3.1)

but if our reduced description really captures the computation done by the neuron, then the two information measures will be very close. In particular, if the neuron were described exactly by a lower-dimensional model (as for a linear perceptron or for an integrate-and-fire neuron; Agüera y Arcas & Fairhall, 2003), then the two information measures would be equal. More generally, the ratio I[s⃗; t₀]/I[I(t < t₀); t₀] quantifies the efficiency of the low-dimensional model, measuring the fraction of information about spike arrival times that our K dimensions capture from the full signal I(t < t₀).

As shown by Brenner, Strong, Koberle, Bialek, and de Ruyter van Steveninck (2000), the arrival time of a single spike provides an information

I[I(t < t₀); t₀] ≡ I_one spike = (1/T) ∫₀ᵀ dt (r(t)/r̄) log₂ [r(t)/r̄],    (3.2)

where r(t) is the time-dependent spike rate, r̄ is the average spike rate, and the time integral implements an average ⟨···⟩ over time. In principle, information should be calculated as an average over the distribution of stimuli, but the ergodicity of the stimulus justifies replacing this ensemble average with a time average. For a deterministic system like the HH equations, the spike rate is a singular function of time given the inputs: spikes occur at definite times with no randomness or irreproducibility. If we observe these responses with a time resolution Δt, then for Δt sufficiently small, the rate r(t) at any time t either is zero or corresponds to a single spike occurring in one bin of size Δt, that is, r = 1/Δt. Thus, the information carried by a single spike is

I_one spike = −log₂(r̄ Δt).    (3.3)
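Equation 3.3 can be checked numerically against the general formula, equation 3.2, for a synthetic deterministic response in which r(t) is either 0 or 1/Δt in each bin; the bin size, duration, and spike count below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)
dt = 1e-4                    # time resolution, sec (0.1 msec)
T = 10.0                     # observation window, sec
n_bins = int(T / dt)

# Deterministic response: r(t) is 0 or 1/dt in each bin
r = np.zeros(n_bins)
r[rng.choice(n_bins, size=500, replace=False)] = 1.0 / dt
rbar = r.mean()              # average spike rate

# Equation 3.2 evaluated as a time average over bins
p = r / rbar
nz = p > 0
I_direct = np.sum(p[nz] * np.log2(p[nz])) / n_bins

# Equation 3.3: closed form for a deterministic (0 or 1/dt) rate
I_closed = -np.log2(rbar * dt)
```

The two expressions agree to machine precision, since for a deterministic response the integrand in equation 3.2 takes only the two values 0 and (1/(r̄Δt)) log₂(1/(r̄Δt)).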

On the other hand, if the probability of spiking really depends on only the stimulus dimensions s₁, s₂, …, s_K, we can substitute

r(t)/r̄ → P(s⃗ | spike at t) / P(s⃗).    (3.4)

Replacing the time averages in equation 3.2 with ensemble averages, we find

I[s⃗; t₀] ≡ I^s⃗_one spike = ∫ dᴷs P(s⃗ | spike at t) log₂ [P(s⃗ | spike at t) / P(s⃗)]    (3.5)

(for details of these arguments, see Brenner, Strong, et al., 2000). This allows us to compare the information captured by the K-dimensional reduced model with the true information carried by single spikes in the spike train.
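In practice, the distributions in equation 3.5 are estimated by histogramming the sampled projections. The following sketch does this for a hypothetical one-dimensional model (K = 1); the spiking nonlinearity, bin edges, and sample sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500_000
s = rng.normal(size=n)                     # prior P(s): gaussian projections
# hypothetical spiking nonlinearity g(s): saturating, centered at s = 1.5
p_spike = 0.5 / (1.0 + np.exp(-4.0 * (s - 1.5)))
spike = rng.random(n) < p_spike

edges = np.linspace(-5, 5, 61)
P_prior, _ = np.histogram(s, bins=edges, density=True)
P_cond, _ = np.histogram(s[spike], bins=edges, density=True)
w = np.diff(edges)
ok = (P_cond > 0) & (P_prior > 0)
# Equation 3.5: information one spike carries about the projection s
I_s = np.sum(w[ok] * P_cond[ok] * np.log2(P_cond[ok] / P_prior[ok]))
```

Because equation 3.5 is a Kullback-Leibler divergence between the spike-conditional and prior distributions, the estimate is nonnegative, and it grows as the nonlinearity g becomes more sharply selective.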

For reasons that we will discuss in the following section, and as was pointed out in Agüera y Arcas et al. (2001) and Agüera y Arcas and Fairhall (2003), we will be considering isolated spikes: those separated from previous spikes by a period of silence. This has important consequences for our analysis. Most significantly, as we will be considering spikes that occur on a background of silence, the relevant stimulus ensemble, conditioned on the silence, is no longer gaussian. Further, we will need to refine our information estimate.

The derivation of equation 3.2 makes clear that a similar formula must determine the information carried by the occurrence time of any event, not just single spikes: we can define an event rate in place of the spike rate and then calculate the information carried by these events (Brenner, Strong, et al., 2000). In the case here, we wish to compute the information obtained by observing an isolated spike, or equivalently, by the event silence+spike. This is straightforward: we replace the spike rate by the rate of isolated spikes, and equation 3.2 will give us the information carried by the arrival time of a single isolated spike. The problem is that this information includes both the information carried by the occurrence of the spike and the information conveyed in the condition that there were no spikes in the preceding t_silence msec (for an early discussion of the information carried by silence, see de Ruyter van Steveninck & Bialek, 1988). We would like to separate these contributions, since our idea of dimensionality reduction applies only to the triggering of a spike, not to the temporally extended condition of nonspiking.

To separate the information carried by the isolated spike itself, we have to ask how much information we gain by seeing an isolated spike, given that the condition for isolation has already been met. As discussed by Brenner, Strong, et al. (2000), we can compute this information by thinking about the distribution of times at which the isolated spike can occur. Given that we know the input stimulus, the distribution of times at which a single isolated spike will be observed is proportional to r_iso(t), the time-dependent rate, or peristimulus time histogram, for isolated spikes. With proper normalization, we have

P_iso(t | inputs) = (1/T) · (1/r̄_iso) · r_iso(t),    (3.6)

where T is the duration of the (long) window in which we can look for the spike, and r̄_iso is the average rate of isolated spikes. This distribution has an entropy

S_iso(t | inputs) = −∫₀ᵀ dt P_iso(t | inputs) log₂ P_iso(t | inputs)    (3.7)
                  = −(1/T) ∫₀ᵀ dt (r_iso(t)/r̄_iso) log₂ [(1/T) · (r_iso(t)/r̄_iso)]    (3.8)
                  = log₂(T r̄_iso Δt) bits,    (3.9)

where again we use the fact that for a deterministic system, the time-dependent rate must be either zero or the maximum allowed by our time resolution, Δt. To compute the information carried by a single spike, we need to compare this entropy with the total entropy possible when we do not know the inputs.

It is tempting to think that without knowledge of the inputs, an isolated spike is equally likely to occur anywhere in the window of size T, which leads us back to equation 3.3 with r̄ replaced by r̄_iso. In this case, however, we are assuming that the condition for isolation has already been met. Thus, even without observing the inputs, we know that isolated spikes can occur only in windows of time whose total length is T_silence = T · P(silence), where P(silence) is the probability that any moment in time is at least t_silence after the most recent spike. Thus, the total entropy of isolated spike arrival times (given that the condition for silence has been met) is reduced from log₂ T to

S_iso(t | silence) = log₂(T · P(silence)),    (3.10)

and the information that the spike carries beyond what we know from the silence itself is

ΔI_iso spike = S_iso(t | silence) − S_iso(t | inputs)    (3.11)
             = (1/T) ∫₀ᵀ dt (r_iso(t)/r̄_iso) log₂ [(r_iso(t)/r̄_iso) · P(silence)]    (3.12)
             = −log₂(r̄_iso Δt) + log₂ P(silence) bits.    (3.13)

This information, which is defined independent of any model for the feature selectivity of the neuron, provides the benchmark against which our reduction of dimensionality will be measured. To make the comparison, however, we need the analog of equation 3.5.
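Equation 3.13 can be checked on a synthetic spike train. The sketch below uses a Poisson train, for which P(silence) ≈ exp(−r̄ t_silence); the rate, duration, and bin size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)
dt = 1e-3                    # bin size, sec
T = 2000.0                   # total duration, sec
n_bins = int(T / dt)
rate = 20.0                  # Poisson firing rate, spikes/s
t_sil = 0.060                # silence criterion, 60 msec

spike_bins = np.where(rng.random(n_bins) < rate * dt)[0]
isi = np.diff(spike_bins) * dt
iso = spike_bins[1:][isi >= t_sil]        # spikes preceded by >= t_sil of silence

# P(silence): fraction of time lying at least t_sil after the most recent spike
last = np.full(n_bins, -10**9)
last[spike_bins] = spike_bins
last = np.maximum.accumulate(last)        # index of most recent spike at each bin
P_sil = (((np.arange(n_bins) - last) * dt) >= t_sil).mean()

r_iso = len(iso) / T                      # average rate of isolated spikes
dI_iso = -np.log2(r_iso * dt) + np.log2(P_sil)    # equation 3.13
```

The correction term log₂ P(silence) is negative, so conditioning on the preceding silence properly reduces the information assigned to the spike itself, exactly as in the entropy difference of equation 3.11.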

Equation 3.12 provides us with an expression for the information conveyed by isolated spikes in terms of the probability that these spikes occur at particular times; this is analogous to equation 3.2 for single (nonisolated) spikes. If we follow a path analogous to that which leads from equation 3.2 to equation 3.5, we find an expression for the information that an isolated spike provides about the K stimulus dimensions s⃗:

ΔI^s⃗_iso spike = ∫ ds⃗ P(s⃗ | iso spike at t) log₂ [P(s⃗ | iso spike at t) / P(s⃗ | silence)]
                 + ⟨log₂ P(silence | s⃗)⟩,    (3.14)

where the prior is now also conditioned on silence: P(s⃗ | silence) is the distribution of s⃗ given that s⃗ is preceded by a silence of at least t_silence. Notice that this silence-conditioned distribution is not knowable a priori, and in particular, it is not gaussian; P(s⃗ | silence) must be sampled from data.

The last term in equation 3.14 is the entropy of a binary variable that indicates whether particular moments in time are silent, given knowledge of the stimulus. Again, since the HH model is deterministic, this conditional entropy should be zero if we keep a complete description of the stimulus. In fact, we are not interested in describing those features of the stimulus that lead to silence, and it is not fair (as we will see) to judge the success of dimensionality reduction by looking at the prediction of silence, which necessarily involves multiple dimensions. To make a meaningful comparison, then, we will assume that there is a perfect description of the stimulus conditions leading to silence and focus on the stimulus features that trigger the isolated spike. When we approximate these features by the K-dimensional space s⃗, we capture an amount of information

ΔI^s⃗_iso spike = ∫ ds⃗ P(s⃗ | iso spike at t) log₂ [P(s⃗ | iso spike at t) / P(s⃗ | silence)].    (3.15)

This is the information that we can compare with ΔI_iso spike in equation 3.13 to determine the efficiency of our dimensionality reduction.

4 Characterizing the Hodgkin-Huxley Neuron

For completeness, we begin with a brief review of the dynamics of the space-clamped HH neuron (Hodgkin & Huxley, 1952). Hodgkin and Huxley modeled the dynamics of the current through a patch of membrane with ion-specific conductances,

C dV/dt = I(t) − ḡ_K n⁴ (V − V_K) − ḡ_Na m³h (V − V_Na) − ḡ_l (V − V_l),    (4.1)

where I(t) is the injected current, K and Na subscripts denote potassium- and sodium-related variables, respectively, and l (for "leakage") terms include all other ion conductances with slower dynamics. C is the membrane capacitance, V_K and V_Na are ion-specific reversal potentials, and V_l is defined such that the total voltage V is exactly zero when the membrane is at rest. ḡ_K, ḡ_Na, and ḡ_l are empirically determined maximal conductances for the different ion species, and the gating variables n, m, and h (on the interval [0, 1]) have their own voltage-dependent dynamics:

dn/dt = 0.01 (10 − V)(1 − n) / [e^{(10−V)/10} − 1] − 0.125 e^{−V/80} n,
dm/dt = 0.1 (25 − V)(1 − m) / [e^{(25−V)/10} − 1] − 4 e^{−V/18} m,
dh/dt = 0.07 e^{−V/20} (1 − h) − h / [e^{(30−V)/10} + 1].    (4.2)

We have used the original values for these parameters, except for changing the signs of the voltages to correspond to the modern sign convention: C = 1 μF/cm², ḡ_K = 36 mS/cm², ḡ_Na = 120 mS/cm², ḡ_l = 0.3 mS/cm², V_K = −12 mV, V_Na = +115 mV, V_l = +10.613 mV. We have taken our system to be a π × 30² μm² patch of membrane. We solve these equations numerically using fourth-order Runge-Kutta integration.
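A sketch of this integration in Python: the gating kinetics below are the standard Hodgkin-Huxley rate functions written in the sign convention used here (V = 0 at rest, depolarization positive), and the code is a minimal illustration, not the authors' implementation:

```python
import numpy as np

# Standard HH gating kinetics, modern sign convention (V = 0 at rest)
def alpha_n(V): return 0.01 * (10.0 - V) / (np.exp((10.0 - V) / 10.0) - 1.0)
def beta_n(V):  return 0.125 * np.exp(-V / 80.0)
def alpha_m(V): return 0.1 * (25.0 - V) / (np.exp((25.0 - V) / 10.0) - 1.0)
def beta_m(V):  return 4.0 * np.exp(-V / 18.0)
def alpha_h(V): return 0.07 * np.exp(-V / 20.0)
def beta_h(V):  return 1.0 / (np.exp((30.0 - V) / 10.0) + 1.0)

C_m, gK, gNa, gl = 1.0, 36.0, 120.0, 0.3     # uF/cm^2 and mS/cm^2
VK, VNa, Vl = -12.0, 115.0, 10.613           # mV
area = np.pi * 30.0**2 * 1e-8                # pi x 30^2 um^2 patch, in cm^2
nA_to_density = 1e-3 / area                  # injected nA -> uA/cm^2

def derivs(state, I_density):
    """Right-hand side of equations 4.1-4.2 (time in msec)."""
    V, n, m, h = state
    dV = (I_density - gK * n**4 * (V - VK)
                    - gNa * m**3 * h * (V - VNa)
                    - gl * (V - Vl)) / C_m
    dn = alpha_n(V) * (1.0 - n) - beta_n(V) * n
    dm = alpha_m(V) * (1.0 - m) - beta_m(V) * m
    dh = alpha_h(V) * (1.0 - h) - beta_h(V) * h
    return np.array([dV, dn, dm, dh])

def rk4_step(state, I_density, dt):
    """One fourth-order Runge-Kutta step of size dt (msec)."""
    k1 = derivs(state, I_density)
    k2 = derivs(state + 0.5 * dt * k1, I_density)
    k3 = derivs(state + 0.5 * dt * k2, I_density)
    k4 = derivs(state + dt * k3, I_density)
    return state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Resting state: V = 0 with gating variables at their steady-state values
rest = np.array([0.0,
                 alpha_n(0.0) / (alpha_n(0.0) + beta_n(0.0)),
                 alpha_m(0.0) / (alpha_m(0.0) + beta_m(0.0)),
                 alpha_h(0.0) / (alpha_h(0.0) + beta_h(0.0))])
```

With these parameters, the rest state is stable in the absence of input, and a suprathreshold DC current density drives an action potential of roughly 100 mV amplitude.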

The system is driven with a gaussian random noise current I(t), generated by smoothing a gaussian random number stream with an exponential filter to generate a correlation time τ. It is convenient to choose τ to be longer than the time steps of numerical integration, since this guarantees that all functions are smooth on the scale of single time steps. Here we will always use τ = 0.2 msec, a value that is both less than the timescale over which we discretize the stimulus for analysis and far less than the neuron's capacitative smoothing timescale RC ≈ 3 msec. I(t) has a standard deviation σ, but since the correlation time is short, the relevant parameter usually is the spectral density S = σ²τ; we also add a DC offset I₀. In the following, we will consider two parameter regimes: I₀ = 0, and I₀ a finite value, which leads to more periodic firing.

The integration step size is fixed at 0.05 msec. The key numerical experiments were repeated at a step size of 0.01 msec, with identical results. The time of a spike is defined as the moment of maximum voltage, for voltages exceeding a threshold (see Figure 1), estimated to subsample precision by quadratic interpolation. As spikes are both very stereotyped and very large compared to subspiking fluctuations, the precise value of this threshold is unimportant; we have used +20 mV.
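The stimulus generation and spike-time extraction described above can be sketched as follows; the discrete exponential filter reproduces the stated correlation time τ, and the parabolic-peak formula implements the quadratic interpolation around the voltage maximum (function names and structure are our own):

```python
import numpy as np

def filtered_noise(n, dt, tau, sigma, rng):
    """Gaussian noise with correlation time tau: exponential smoothing of a
    white gaussian stream, scaled to stationary standard deviation sigma."""
    a = np.exp(-dt / tau)
    b = sigma * np.sqrt(1.0 - a * a)
    xi = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = sigma * xi[0]
    for k in range(1, n):
        x[k] = a * x[k - 1] + b * xi[k]
    return x

def spike_times(V, dt, thresh=20.0):
    """Spike times: local maxima of V above threshold, refined to subsample
    precision by fitting a parabola through the three samples at the peak."""
    out = []
    for i in range(1, len(V) - 1):
        if V[i] > thresh and V[i] >= V[i - 1] and V[i] > V[i + 1]:
            denom = V[i - 1] - 2.0 * V[i] + V[i + 1]
            offset = 0.0 if denom == 0.0 else 0.5 * (V[i - 1] - V[i + 1]) / denom
            out.append((i + offset) * dt)
    return np.array(out)
```

For a smooth voltage peak, the parabolic fit recovers the true maximum to a small fraction of the integration step, which is what makes the precise threshold value unimportant.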

4.1 Qualitative Description of Spiking. The first step in our analysis is to use reverse correlation, equation 2.4, to determine the average stimulus feature preceding a spike, the STA. In Figure 1 (top), we display the STA in a regime where the spectral density of the input current is 6.5 × 10⁻⁴ nA² sec. The spike-triggered averages of the gating terms n⁴ (proportion of open potassium channels) and m³h (proportion of open sodium channels) and the membrane voltage V are plotted in Figure 1 (middle and bottom). The error bars mark the standard deviation of the trajectories of these variables.

As expected, the voltage and gating variables follow highly stereotyped trajectories during the ≈ 5 msec surrounding a spike. First, the rapid opening of the sodium channels causes a sharp membrane depolarization (or rise in V); the slower potassium channels then open and repolarize the membrane, leaving it at a slightly lower potential than rest. The potassium channels close gradually, but meanwhile the membrane remains hyperpolarized and, due to its increased permeability to potassium ions, at lower resistance. These effects make it difficult to induce a second spike during this ≈ 15 msec "refractory period." Away from spikes, the resting levels and fluctuations of the voltage and gating variables are quite small. The larger values evident in Figure 1 (middle and bottom) by ±15 msec are due to the summed contributions of nearby spikes.

The spike-triggered average current has a largely transient form, so that spikes are on average preceded by an upward swing in current. On the other hand, there is no obvious bottleneck in the current trajectories, so that the current variance is almost constant throughout the spike. This is qualitatively consistent with the idea of dimensionality reduction: if the neuron ignores most of the dimensions along which the current can vary, then the variance, which is shared almost equally among all dimensions for this near white noise, can change by only a small amount.

Figure 1: Spike-triggered averages with standard deviations for (top) the input current I, (middle) the fraction of open K⁺ and Na⁺ channels, and (bottom) the membrane voltage V, for the parameter regime I₀ = 0 and S = 6.50 × 10⁻⁴ nA² sec.


4.2 Interspike Interaction. Although the STA has the form of a differentiating kernel, suggesting that the neuron detects edge-like events in the current versus time, there must be a DC component to the cell's response. We recall that for constant inputs, the HH model undergoes a bifurcation to constant frequency spiking, where the frequency is a function of the value of the input above onset. Correspondingly, the STA does not sum precisely to zero; one might think of it as having a small integrating component that allows the system to spike under DC stimulation, albeit only above a threshold.

The system's tendency to periodic spiking under DC current input also is felt under dynamic stimulus conditions and can be thought of as a strong interaction between successive spikes. We illustrate this by considering a different parameter regime, with a small DC current and some added noise (I₀ = 0.11 nA and S = 0.8 × 10⁻⁴ nA² sec). Note that the DC component puts the neuron in the metastable region of its f−I curve (see Figure 2). In this regime, the neuron tends to fire quasi-regular trains of spikes intermittently, as shown in Figure 3. We will refer to these quasi-regular spike sequences as "bursts" (note that this term is often used to refer to compound spikes in neurons with additional channels; such events do not occur in the HH model).

Spikes can be classified into three types: those initiating a spike burst, those within a burst, and those ending a burst. The minimum length of the silence between bursts is taken in this case to be 70 msec. Taking these three categories of spike as different "symbols" (de Ruyter van Steveninck & Bialek, 1988), we can determine the average stimulus for each. These are shown in Figure 4, with the spike at t = 0.

Figure 2: Firing rate of the HH neuron as a function of injected DC current. The empty circles at moderate currents denote the metastable region, where the neuron may be either spiking or silent.

Figure 3: Segment of a typical spike train in a "bursting" regime.

Figure 4: Spike-triggered averages derived from spikes leading ("on"), inside ("burst"), and ending ("off") a burst. The parameters of this bursting regime are I₀ = 0.11 nA and S = 0.8 × 10⁻⁴ nA² sec. Note that the burst-ending spike average is by construction identical to that of any other within-burst spike for t < 0.

In this regime, the initial spike of a burst is preceded by a rapid oscillation in the current. Spikes within a burst are affected much less by the current; the feature immediately preceding such spikes is similar in shape to a single "wavelength" of the leading spike feature, but is of much smaller amplitude and is temporally compressed into the interspike interval. Hence, although it is clear that the timing of a spike within a burst is determined largely by the timing of the previous spike, the current plays some role in affecting the precise placement. This also demonstrates that the shape of the STA is not the same for all spikes; it depends strongly and nontrivially on the time to the previous spike, and this is related to the observation that subtly different patterns of two or three spikes correspond to very different average stimuli (de Ruyter van Steveninck & Bialek, 1988). For a reader of the spike code, a spike within a burst conveys a different message about the input than the spike at the onset of the burst. Finally, the feature ending a burst has a very similar form to the onset feature, but reversed in time. Thus, to a good approximation, the absence of a spike at the end of a burst can be read as the opposite of the onset of the burst.

In summary, this regime of the HH neuron is similar to a "flip-flop," or 1-bit memory. Like its electronic analog, the neuron's memory is preserved by a feedback loop, here implemented by the interspike interaction. Large fluctuations in the input current at a certain frequency "flip" or "flop" the neuron between its silent and spiking states. However, while the neuron is spiking, further details of the input signal are transmitted by precise spike timing within a burst. If we calculate the spike-triggered average of all spikes for this regime, without regard to their position within a burst, then, as shown in Figure 5, the relatively well-localized leading spike oscillation of Figure 4 is replaced by a long-lived oscillating function resulting from the spike periodicity. This is shown explicitly by comparing the overall STA with the spike autocorrelation, also shown in Figure 5. This same effect is seen in the STA of the burst spikes, which in fact dominates the overall average. Prediction of spike timing using such an STA would be computationally difficult due to its extension in time, but, more seriously, unsuccessful, as most of the function is an artifact of the spike history rather than the effect of the stimulus.

Figure 5: Overall spike-triggered average in the bursty regime, showing the ringing due to the tendency to periodic firing. Plotted in gray is the spike autocorrelation, showing the same oscillations.

While the effects of spike interaction are interesting and should be included in a complete model for spike generation, we wish here to consider only the current's role in initiating spikes. Therefore, as we have argued elsewhere, we limit ourselves initially to the cases in which interspike interaction plays no role (Agüera y Arcas et al., 2001; Agüera y Arcas & Fairhall, 2003). These "isolated" spikes can be defined as spikes preceded by a silent period t_silence long enough to ensure decoupling from the timing of the previous spike. A reasonable choice for t_silence can be inferred directly from the interspike interval distribution P(Δt), illustrated in Figure 6. For the HH model, as in simpler models and many real neurons (Brenner, Agam, Bialek, & de Ruyter van Steveninck, 1998), the form of P(Δt) has three noteworthy features: a refractory "hole" during which another spike is unlikely to occur, a strong mode at the preferred firing frequency, and an exponentially decaying, or Poisson, tail. The details of all three of these features are functions of the parameters of the stimulus (Tiesinga, Jose, & Sejnowski, 2000), and certain regimes may be dominated by only one or two features. The emergence of Poisson statistics in the tail of the distribution implies that these events are independent, so we can infer that the system has lost memory of the previous spike. We will therefore take isolated spikes to be those preceded by a silent interval Δt ≥ t_silence, where t_silence is well into the Poisson regime. The burst-onset spikes of Figure 4 are isolated spikes by this definition.

Figure 6: Hodgkin-Huxley interspike interval histogram for the parameters I₀ = 0 and S = 6.5 × 10⁻⁴ nA² sec, showing a peak at a preferred firing frequency and the long Poisson tail. The total number of spikes is N = 5.18 × 10⁶. The plot to the right is a closeup in linear scale.

Note that the bursty behavior evident in Figure 3 is characteristic of "type II" neurons, which begin firing at a well-defined frequency under DC stimulus; "type I" neurons, by contrast, can fire at arbitrarily low frequency under constant input. Nonetheless, the analysis that follows is equally applicable to either type of neuron: spikes still interact at close range and become independent at sufficient separation. In what follows, we will be probing the HH neuron with current noise that remains below the DC threshold for periodic firing.

5 Isolated Spike Analysis

Focusing now on isolated spikes, we proceed to a second-order analysis of the current fluctuations around the isolated spike-triggered average (see Figure 7). We consider the response of the HH neuron to currents I(t) with mean I₀ = 0 and spectral density S = 6.5 × 10⁻⁴ nA² sec. Isolated spikes in this regime are defined by t_silence = 60 msec.

Computation in a Single Neuron 1733

Figure 7: Spike-triggered average stimulus for isolated spikes.

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, corresponding to sampling at a timescale sufficiently long to capture the relevant features. Thus, we begin in a 200-dimensional space. We emphasize that the theorem that connects eigenvalues of the matrix ΔC to the number of relevant dimensions is valid only for truly gaussian distributions of inputs, and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble, namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide to our search for lower-dimensional descriptions, but we should proceed with caution.
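The covariance computation just described can be sketched in a few lines. The white-noise stimulus, spike indices, and sample sizes below are surrogate stand-ins rather than output of the actual HH simulation, so no eigenvalue should stand out here:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 200                                    # samples per stimulus history
stim = rng.normal(size=100_000)            # surrogate white-noise current
spike_idx = rng.integers(D, stim.size, size=5_000)   # surrogate spike bins

# stimulus histories preceding each spike, one per row
S = np.stack([stim[i - D:i] for i in spike_idx])
sta = S.mean(axis=0)                       # spike-triggered average

# prior ensemble: histories ending at random times
prior_idx = rng.integers(D, stim.size, size=5_000)
P = np.stack([stim[i - D:i] for i in prior_idx])

# covariance difference delta_C = C_spike - C_prior (the text additionally
# rescales by the inverse prior covariance; omitted here since this
# surrogate prior is already white)
delta_C = np.cov(S, rowvar=False) - np.cov(P, rowvar=False)

# eigenvalues of large magnitude mark candidate relevant dimensions
w, v = np.linalg.eigh(delta_C)
order = np.argsort(-np.abs(w))
print(w[order][:5])                        # leading eigenvalues by magnitude
```

With real spike-triggered data, the rows of `S` would be the current histories preceding each isolated spike, and the leading eigenvectors of `delta_C` would be the candidate features.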

The "raw" isolated spike-triggered covariance C_iso spike and the corresponding covariance difference ΔC, equation 2.7, are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to ±15 msec. In Figure 9, we show the spectrum of eigenvalues of ΔC, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply ΔC by C⁻¹_prior. This has the effect of giving us eigenvalues scaled in units of the input standard deviation


Figure 8: The isolated spike-triggered covariance C_iso (left) and covariance difference ΔC (right), for times −30 < t < 5 msec. The plots are in units of nA².

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance after accumulating 80,000 spikes.

along each dimension. Because the correlation time is short, C_prior is nearly diagonal.

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero up to the noise floor determined by sampling decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The diagonal indicates the 1/√n_spikes scaling expected of sampling noise. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Agüera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended and silence associated and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Agüera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be −60 ≤ t ≤ −40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/n_spikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out the test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion


Figure 11: For the leading 64 modes, fraction of the mode energy over the interval −40 < t < −30 msec, as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.

of other weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is close to time translationally invariant.
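The silence-energy diagnostic is a one-line statistic per mode. The sketch below applies it to two synthetic modes (an illustrative localized Gaussian and an extended sinusoid, not actual eigenvectors of ΔC), assuming 200 samples spanning the 100 msec before the spike:

```python
import numpy as np

def silence_energy_fraction(mode, t, t_lo=-60.0, t_hi=-40.0):
    """Fraction of a mode's energy falling in the silent window [t_lo, t_hi].

    `mode` would be an eigenvector of delta_C; `t` gives the time (ms,
    relative to the spike) of each of its components.
    """
    mode = mode / np.linalg.norm(mode)
    in_window = (t >= t_lo) & (t <= t_hi)
    return float(np.sum(mode[in_window] ** 2))

t = np.linspace(-100.0, 0.0, 200)           # 200 samples spanning 100 ms

# a localized mode near the spike vs. an extended, Fourier-like mode
local = np.exp(-0.5 * ((t + 5.0) / 2.0) ** 2)
extended = np.sin(2 * np.pi * t / 50.0)

print(silence_energy_fraction(local, t))     # near zero: spike associated
print(silence_energy_fraction(extended, t))  # order of the window fraction: silence mode
```

Tracking this fraction as the sample grows distinguishes the two classes: for spike-associated modes it falls off like 1/n_spikes, while for silence modes it stays of order unity.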

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s₁ and s₂ (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crosses threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions s₁ and s₂. By construction, the distribution of these signals over the whole experiment, P(s₁, s₂), is gaussian. The appropriate prior for the isolation condition, P(s₁, s₂ | silence), differs only subtly from the gaussian prior. On the other hand, for each spike, we obtain a sample from the distribution P(s₁, s₂ | iso spike at t₀), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC which we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold crossing interpretation.

Figure 14: 10⁴ spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation on the prior gaussian distribution. The circles, from the inside out, enclose all but 10⁻¹, 10⁻², ..., 10⁻⁸ of the prior.


Combining equations 2.2 and 2.3 for isolated spikes, we have

g(s₁, s₂) = P(s₁, s₂ | iso spike at t₀) / P(s₁, s₂ | silence),    (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
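The ratio in equation 5.1 can be estimated directly by binning the projections and dividing the two histograms. In the sketch below, surrogate gaussian samples stand in for the real (s₁, s₂) projections, and a plain gaussian prior stands in for P(s₁, s₂ | silence), which the text notes differs from it only subtly:

```python
import numpy as np

rng = np.random.default_rng(1)

# surrogate projections: prior is standard gaussian; the spike-conditional
# cloud is shifted and compressed (a stand-in for the real samples)
prior = rng.normal(size=(50_000, 2))
spike = rng.normal(loc=[1.5, 0.5], scale=0.5, size=(5_000, 2))

edges = [np.linspace(-4.0, 4.0, 33)] * 2
P_spike, _, _ = np.histogram2d(spike[:, 0], spike[:, 1], bins=edges, density=True)
P_prior, _, _ = np.histogram2d(prior[:, 0], prior[:, 1], bins=edges, density=True)

# g(s1, s2) = P(s1, s2 | spike) / P(s1, s2 | silence), defined only in
# bins where the prior has been sampled
with np.errstate(divide="ignore", invalid="ignore"):
    g = np.where(P_prior > 0, P_spike / P_prior, np.nan)

print(np.nanmax(g))   # g >> 1 where spiking is strongly enhanced
```

For a deterministic model like HH, with perfect time resolution and a complete feature set, g would concentrate on a singular set; spread in the estimated g reflects finite time resolution, finite sampling, or dimensions missing from the description.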

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I^{s₁,s₂}_{iso spike} = I_{iso spike}; more generally, one finds I^{s₁,s₂}_{iso spike} ≤ I_{iso spike}, and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.
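The information captured by a projection can be estimated as the Kullback-Leibler divergence between the spike-conditional and prior distributions of that projection. The sketch below does this for a one-dimensional projection; the helper name and the gaussian surrogate data are ours, not the paper's:

```python
import numpy as np

def info_per_spike(spike_proj, prior_proj, bins=32):
    """Bits per spike captured by a 1D projection: the KL divergence
    between P(s | spike) and the prior P(s), estimated from histograms.

    A minimal sketch; careful estimates must correct for binning bias.
    """
    lo = min(spike_proj.min(), prior_proj.min())
    hi = max(spike_proj.max(), prior_proj.max())
    edges = np.linspace(lo, hi, bins + 1)
    p_spike, _ = np.histogram(spike_proj, bins=edges, density=True)
    p_prior, _ = np.histogram(prior_proj, bins=edges, density=True)
    dx = edges[1] - edges[0]
    ok = (p_spike > 0) & (p_prior > 0)
    return float(np.sum(p_spike[ok] * np.log2(p_spike[ok] / p_prior[ok])) * dx)

rng = np.random.default_rng(2)
prior = rng.normal(size=100_000)                     # surrogate prior projections
spike = rng.normal(loc=2.0, scale=0.5, size=10_000)  # surrogate spike-conditional
print(info_per_spike(spike, prior))                  # a few bits per spike
```

Comparing this quantity to the full information carried by isolated spike times gives the captured fraction plotted in Figure 15; the 2D case is the same computation on the joint distribution over (s₁, s₂).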

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec,

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).


reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at approximately 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ~3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing?

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that along the direction of the STA, we allow the second orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
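The tiling construction described above reduces to a handful of averaging steps. The sketch below runs it on surrogate gaussian spike-triggered histories (real histories from the HH simulation would show genuine curvature, which this surrogate data will not):

```python
import numpy as np

rng = np.random.default_rng(3)
D, n_spikes, n_bins = 200, 8_000, 8

# surrogate spike-triggered stimulus histories, one per row
S = rng.normal(size=(n_spikes, D))
sta = S.mean(axis=0)
sta /= np.linalg.norm(sta)

# sort histories by their projection onto the STA and tile the surface
# with an equal number of spikes per bin
proj = S @ sta
tiles = np.array_split(np.argsort(proj), n_bins)

second_dirs = []
for idx in tiles:
    cond_sta = S[idx].mean(axis=0)          # conditional average in the tile
    cond_sta -= (cond_sta @ sta) * sta      # component orthogonal to the STA
    second_dirs.append(cond_sta / np.linalg.norm(cond_sta))

# singular values of the family of second directions estimate the
# embedding dimension of the curved feature subspace
sv = np.linalg.svd(np.array(second_dirs), compute_uv=False)
print(sv / sv.max())
```

Because the construction uses only first-order statistics (conditional averages), it is far less data-hungry than resolving additional covariance modes from the same number of spikes.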

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ~90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling ("twist") model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshizawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input


currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work by Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ~200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations, such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace, using data sets of fewer than 10⁴ spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore, in general, involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Agüera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Agüera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Agüera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibers. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2070.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.

Computation in a Single Neuron 1749

Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



The ideas of feature selectivity and dimensionality reduction have a long history in neurobiology. The idea of receptive fields, as formulated by Hartline, Kuffler, and Barlow for the visual system, gave a picture of neurons as having a template against which images would be correlated (Hartline, 1940; Kuffler, 1953; Barlow, 1953). If we think of images as vectors in a high-dimensional space, with coordinates determined by the intensities of each pixel, then the simplest receptive field models describe the neuron as sensitive to only one direction, or projection, in this high-dimensional space. This picture of projection followed by thresholding, or some other nonlinearity, to determine the probability of spike generation was formalized in the linear perceptron (Rosenblatt, 1958, 1962). In subsequent work, Barlow, Hill, and Levick (1964) characterized neurons in which the receptive field has subregions in space and time, such that summation is at least approximately linear in each subregion, but these summed signals interact nonlinearly, for example, to generate direction selectivity and motion sensitivity. We can think of Hubel and Wiesel's description of complex and hypercomplex cells (Hubel & Wiesel, 1962) again as a picture of approximately linear summation within subregions followed by nonlinear operations on these multiple summed signals. More formally, the proper combination of linear summation and nonlinear or logical operations may provide a useful bridge from receptive field properties to proper geometric primitives in visual computation (Iverson & Zucker, 1995). In the same way that a single receptive field or perceptron model has one relevant dimension in the space of visual stimuli, these more complex cells have as many relevant dimensions as there are independent subregions of the receptive field. Although this number is larger than one, it still is much smaller than the full dimensionality of the possible spatiotemporal variations in visual inputs.

The idea that neurons in the auditory system might be described by a filter followed by a nonlinear transformation to determine the probability of spike generation was the inspiration for de Boer's development (de Boer & Kuyper, 1968) of triggered or reverse correlation. Modern uses of reverse correlation to characterize the filtering or receptive field properties of a neuron often emphasize that this approach provides a "linear approximation" to the input-output properties of the cell, but the original idea was almost the opposite: neurons clearly are nonlinear devices, but this is separate from the question of whether the probability of generating a spike is determined by a simple projection of the sensory input onto a single filter or template. In fact, as explained by Rieke, Warland, Bialek, and de Ruyter van Steveninck (1997), linearity is seldom a good approximation for the neural input-output relation, but if there is one relevant dimension, then (provided that input signals are chosen with suitable statistics) the reverse correlation method is guaranteed to find this one special direction in the space of inputs to which the neuron is sensitive. While the reverse correlation method is guaranteed to find the one relevant dimension if it exists, the method does not include


any way of testing for other relevant dimensions or, more generally, for measuring the dimensionality of the relevant subspace.

The idea of characterizing neural responses directly as the reduction of dimensionality emerged from studies (de Ruyter van Steveninck & Bialek, 1988) of a motion-sensitive neuron in the fly visual system. In particular, this work suggested that it is possible to estimate the dimensionality of the relevant subspace, rather than just assuming that it is small (or equal to one). More recent work on the fly visual system has exploited the idea of dimensionality reduction to probe both the structure and adaptation of the neural code (Brenner, Bialek, & de Ruyter van Steveninck, 2000; Fairhall, Lewen, Bialek, & de Ruyter van Steveninck, 2001) and the nature of the computation that extracts the motion signal from the spatiotemporal array of photoreceptor inputs (Bialek & de Ruyter van Steveninck, 2003). Here we review the ideas of dimensionality reduction from previous work; extensions of these ideas begin in section 3.

In the spirit of neural network models, we will simplify away the spatial structure of neurons and consider time-dependent currents I(t) injected into a point-like neuron. While this misses much of the complexity of real cells, we will find that even this system is highly nontrivial. If the input is an injected current, then the neuron maps the history of this current, I(t < t_0), into the presence or absence of a spike at time t_0. More generally, we might imagine that the cell (or our description) is noisy, so that there is a probability of spiking P[spike at t_0 | I(t < t_0)] that depends on the current history. The dependence on the history of the current means that the input signal still is high dimensional, even without spatial dependence. Working at time resolution \Delta t and assuming that currents in a window of size T are relevant to the decision to spike, the input space is of dimension D = T/\Delta t, where D is often of order 100.

The idea of dimensionality reduction is that the probability of spike generation is sensitive only to some limited number of dimensions K within the D-dimensional space of inputs. We begin our analysis by searching for linear subspaces, that is, a set of signals s_1, s_2, ..., s_K that can be constructed by filtering the current,

s_\mu = \int_0^\infty dt\, f_\mu(t)\, I(t_0 - t),   (2.1)

so that the probability of spiking depends on only this small set of signals,

P[spike at t_0 | I(t < t_0)] = P[spike at t_0]\, g(s_1, s_2, ..., s_K),   (2.2)

where the inclusion of the average probability of spiking, P[spike at t_0], leaves g dimensionless. If we think of the current I(t_0 - T < t < t_0) as a D-dimensional vector, with one dimension for each discrete sample at spacing \Delta t, then the filtered signals s_i are linear projections of this vector. In


this formulation, characterizing the computation done by a neuron involves three steps:

1. Estimate the number of relevant stimulus dimensions K, with the hope that it will be much smaller than the original dimensionality D.

2. Identify a set of filters that project into this relevant subspace.

3. Characterize the nonlinear function g(\vec{s}).

The classical perceptron-like cell of neural network theory would have only one relevant dimension, given by the vector of weights, and a simple form for g, typically a sigmoid.

Rather than trying to look directly at the distribution of spikes given stimuli, we follow de Ruyter van Steveninck and Bialek (1988) and consider the distribution of signals conditional on the response, P[I(t < t_0) | spike at t_0], also called the response-conditional ensemble (RCE); these are related by Bayes' rule,

P[spike at t_0 | I(t < t_0)] / P[spike at t_0] = P[I(t < t_0) | spike at t_0] / P[I(t < t_0)].   (2.3)

We can now compute various moments of the RCE. The first moment is the spike-triggered average stimulus (STA),

STA(\tau) = \int [dI]\, P[I(t < t_0) | spike at t_0]\, I(t_0 - \tau),   (2.4)

which is the object that one computes in reverse correlation (de Boer & Kuyper, 1968; Rieke et al., 1997). If we choose the distribution of input stimuli P[I(t < t_0)] to be gaussian white noise, then for a perceptron-like neuron sensitive to only one direction in stimulus space, it can be shown that the STA, or first moment of the RCE, is proportional to the vector or filter f(\tau) that defines this direction (Rieke et al., 1997).

Although it is a theorem that the STA is proportional to the relevant filter f(\tau), in principle it is possible that the proportionality constant is zero, most plausibly if the neuron's response has some symmetry, such as phase invariance in the response of high-frequency auditory neurons. It also is worth noting that what is really important in this analysis is the gaussian distribution of the stimuli, not the "whiteness" of the spectrum. For nonwhite but gaussian inputs, the STA measures the relevant filter blurred by the correlation function of the inputs, and hence the true filter can be recovered (at least in principle) by deconvolution. For nongaussian signals and nonlinear neurons, there is no corresponding guarantee that the selectivity of the neuron can be separated from correlations in the stimulus (Sharpee et al., in press).
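The guarantee for one relevant dimension can be checked numerically on a toy model (a single hypothetical filter plus a hard threshold, not the HH neuron): driven with gaussian white noise, the spike-triggered average of equation 2.4 recovers the filter up to a proportionality constant:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 100_000, 30
I = rng.normal(size=N)                   # gaussian white noise, unit variance

t = np.arange(D)
f = np.exp(-t / 5.0) * np.sin(t / 2.0)   # hypothetical filter, unit norm
f /= np.linalg.norm(f)

# s(t0) = sum_tau f(tau) I(t0 - tau); spike when s exceeds a threshold
s = np.array([np.dot(f[::-1], I[i - D:i]) for i in range(D, N)])
spike_idx = np.nonzero(s > 2.0)[0] + D

# STA (eq. 2.4): average current history preceding each spike
sta = np.mean([I[i - D:i] for i in spike_idx], axis=0)[::-1]
sta /= np.linalg.norm(sta)

# for gaussian stimuli and one relevant dimension, the STA is
# proportional to f (Rieke et al., 1997); check the unit-vector overlap
overlap = float(np.dot(sta, f))
```

With ~2,000 spikes, the overlap between the normalized STA and the true filter is close to 1; with a phase-invariant (e.g., energy) nonlinearity instead of a threshold, the same STA would average to zero, motivating the covariance analysis below.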

To obtain more than one relevant direction (or to reveal relevant directions when symmetries cause the STA to vanish), we proceed to second


order and compute the covariance matrix of fluctuations around the spike-triggered average,

C_spike(\tau, \tau') = \int [dI]\, P[I(t < t_0) | spike at t_0]\, I(t_0 - \tau)\, I(t_0 - \tau') - STA(\tau)\, STA(\tau').   (2.5)

In the same way that we compare the spike-triggered average to some constant average level of the signal in the whole experiment, we compare the covariance matrix C_spike with the covariance of the signal averaged over the whole experiment,

C_prior(\tau, \tau') = \int [dI]\, P[I(t < t_0)]\, I(t_0 - \tau)\, I(t_0 - \tau'),   (2.6)

to construct the change in the covariance matrix,

\Delta C = C_spike - C_prior.   (2.7)

With time resolution \Delta t in a window of duration T, as above, all of these covariances are D \times D matrices. In the same way that the spike-triggered average has the clearest interpretation when we choose inputs from a gaussian distribution, \Delta C also has the clearest interpretation in this case. Specifically, if inputs are drawn from a gaussian distribution, then it can be shown that (Bialek & de Ruyter van Steveninck, 2003):

1. If the neuron is sensitive to a limited set of K input dimensions, as in equation 2.2, then \Delta C will have only K nonzero eigenvalues.^1 In this way, we can measure directly the dimensionality K of the relevant subspace.

2. If the distribution of inputs is both gaussian and white, then the eigenvectors associated with the nonzero eigenvalues span the same space as that spanned by the filters {f_\mu(\tau)}.

3. For nonwhite (correlated) but still gaussian inputs, the eigenvectors span the space of the filters {f_\mu(\tau)} blurred by convolution with the correlation function of the inputs.

Thus, the analysis of \Delta C for neurons responding to gaussian inputs should allow us to identify the relevant subspace of inputs and to test specifically the hypothesis that this subspace is of low dimension.

^1 As with the STA, it is in principle possible that symmetries or accidental features of the function g(\vec{s}) would cause some of the K eigenvalues to vanish, but this is very unlikely.


Several points are worth noting. First, except in special cases, the eigenvectors of \Delta C and the filters {f_\mu(\tau)} are not the principal components of the RCE, and hence this analysis of \Delta C is not a principal component analysis. Second, the nonzero eigenvalues of \Delta C can be either positive or negative, depending on whether the variance of inputs along that particular direction is larger or smaller in the neighborhood of a spike. Third, although the eigenvectors span the relevant subspace, these eigenvectors do not form a preferred coordinate system within this subspace. Finally, we emphasize that dimensionality reduction, the identification of the relevant subspace, is only the first step in our analysis of the computation done by a neuron.
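A minimal numerical sketch of the covariance analysis, equations 2.5 through 2.7: the invented two-filter "energy" model below has a symmetry that makes the STA vanish, yet the eigenvalue spectrum of \Delta C still reveals K = 2 relevant dimensions. All filters and thresholds are illustrative assumptions:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(2)
N, D = 200_000, 20
I = rng.normal(size=N)                   # gaussian white noise input

t = np.arange(D, dtype=float)
f1 = np.exp(-t / 4.0)
f1 /= np.linalg.norm(f1)
f2 = np.exp(-t / 4.0) * np.sin(t / 1.5)
f2 -= f1 * np.dot(f1, f2)                # orthogonalize the second filter
f2 /= np.linalg.norm(f2)

hist = sliding_window_view(I, D)         # row i = I[i : i + D]
s1 = hist @ f1[::-1]                     # projections onto the two filters
s2 = hist @ f2[::-1]
spike = (s1**2 + s2**2) > 6.0            # phase-invariant "energy" criterion

C_prior = np.cov(hist, rowvar=False)             # eq. 2.6
C_spike = np.cov(hist[spike], rowvar=False)      # eq. 2.5 (mean subtracted)
dC = C_spike - C_prior                           # eq. 2.7

eig = np.linalg.eigvalsh(dC)
n_relevant = int(np.sum(np.abs(eig) > 1.0))      # should find K = 2
```

Here both large eigenvalues are positive (the variance along the two filters is larger given a spike); a suppressive dimension would appear as a negative eigenvalue instead.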

3 Measuring the Success of Dimensionality Reduction

The claim that certain stimulus features are most relevant is in effect a model for the neuron, so the next question is how to measure the effectiveness or accuracy of this model. Several different ideas have been suggested in the literature as ways of testing models based on linear receptive fields in the visual system (Stanley, Lei, & Dan, 1999; Keat, Reinagel, Reid, & Meister, 2001) or linear spectrotemporal receptive fields in the auditory system (Theunissen, Sen, & Doupe, 2000). These methods have in common that they introduce a metric to measure performance, for example, mean square error in predicting the firing rate as averaged over some window of time. Ideally, we would like to have a performance measure that avoids any arbitrariness in the choice of metric, and such metric-free measures are provided uniquely by information theory (Shannon, 1948; Cover & Thomas, 1991).

Observing the arrival time t_0 of a single spike provides a certain amount of information about the input signals. Since information is mutual, we can also say that knowing the input signal trajectory I(t < t_0) provides information about the arrival time of the spike. If "details are irrelevant," then we should be able to discard these details from our description of the stimulus and yet preserve the mutual information between the stimulus and spike arrival times (for an abstract discussion of such selective compression, see Tishby et al., 1999). In constructing our low-dimensional model, we represent the complete (D-dimensional) stimulus I(t < t_0) by a smaller number (K < D) of dimensions \vec{s} = (s_1, s_2, ..., s_K).

The mutual information I[I(t < t_0); t_0] is a property of the neuron itself, while the mutual information I[\vec{s}; t_0] characterizes how much our reduced description of the stimulus can tell us about when spikes will occur. Necessarily, our reduction of dimensionality causes a loss of information, so that

I[\vec{s}; t_0] \le I[I(t < t_0); t_0],   (3.1)

but if our reduced description really captures the computation done by the neuron, then the two information measures will be very close. In particular,


if the neuron were described exactly by a lower-dimensional model, as for a linear perceptron or for an integrate-and-fire neuron (Aguera y Arcas & Fairhall, 2003), then the two information measures would be equal. More generally, the ratio I[\vec{s}; t_0] / I[I(t < t_0); t_0] quantifies the efficiency of the low-dimensional model, measuring the fraction of information about spike arrival times that our K dimensions capture from the full signal I(t < t_0).

As shown by Brenner, Strong, Koberle, Bialek, and de Ruyter van Steveninck (2000), the arrival time of a single spike provides an information

I[I(t < t_0); t_0] \equiv I_{one spike} = \frac{1}{T} \int_0^T dt\, \frac{r(t)}{\bar{r}} \log_2 \left[ \frac{r(t)}{\bar{r}} \right],   (3.2)

where r(t) is the time-dependent spike rate, \bar{r} is the average spike rate, and \langle \cdots \rangle denotes an average over time. In principle, information should be calculated as an average over the distribution of stimuli, but the ergodicity of the stimulus justifies replacing this ensemble average with a time average. For a deterministic system like the HH equations, the spike rate is a singular function of time given the inputs: spikes occur at definite times, with no randomness or irreproducibility. If we observe these responses with a time resolution \Delta t, then for \Delta t sufficiently small, the rate r(t) at any time t either is zero or corresponds to a single spike occurring in one bin of size \Delta t, that is, r = 1/\Delta t. Thus, the information carried by a single spike is

I_{one spike} = -\log_2(\bar{r}\, \Delta t).   (3.3)

On the other hand, if the probability of spiking really depends on only the stimulus dimensions s_1, s_2, ..., s_K, we can substitute

\frac{r(t)}{\bar{r}} \rightarrow \frac{P(\vec{s} | spike at t)}{P(\vec{s})}.   (3.4)

Replacing the time averages in equation 3.2 with ensemble averages, we find

I[\vec{s}; t_0] \equiv I^{\vec{s}}_{one spike} = \int d^K s\, P(\vec{s} | spike at t) \log_2 \left[ \frac{P(\vec{s} | spike at t)}{P(\vec{s})} \right]   (3.5)

(for details of these arguments, see Brenner, Strong, et al., 2000). This allows us to compare the information captured by the K-dimensional reduced model with the true information carried by single spikes in the spike train.
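For a deterministic response, the reduction of equation 3.2 to equation 3.3 can be verified numerically; the toy spike train below is an arbitrary illustration, with r(t) equal to 0 or 1/\Delta t in every bin:

```python
import numpy as np

rng = np.random.default_rng(3)
dt = 0.1                       # msec time resolution
nbins = 10_000                 # T = nbins * dt = 1000 msec
spike_bins = rng.choice(nbins, size=50, replace=False)

r = np.zeros(nbins)
r[spike_bins] = 1.0 / dt       # deterministic rate: either 0 or 1/dt
rbar = r.mean()

ratio = r / rbar
# eq. 3.2: time average of (r/rbar) log2(r/rbar), with 0 log 0 = 0
I_direct = float(np.mean(np.where(ratio > 0.0,
                                  ratio * np.log2(np.maximum(ratio, 1e-300)),
                                  0.0)))
# eq. 3.3: the closed form for a deterministic neuron
I_formula = float(-np.log2(rbar * dt))
```

Both routes give the same number of bits per spike, and the value grows as the time resolution \Delta t is refined, as expected for arbitrarily precise spike timing.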

For reasons that we will discuss in the following section, and as was pointed out in Aguera y Arcas et al. (2001) and Aguera y Arcas and Fairhall (2003), we will be considering isolated spikes: those separated from previous spikes by a period of silence. This has important consequences for our analysis. Most significantly, as we will be considering spikes that occur


on a background of silence, the relevant stimulus ensemble, conditioned on the silence, is no longer gaussian. Further, we will need to refine our information estimate.

The derivation of equation 3.2 makes clear that a similar formula must determine the information carried by the occurrence time of any event, not just single spikes: we can define an event rate in place of the spike rate and then calculate the information carried by these events (Brenner, Strong, et al., 2000). In the case here, we wish to compute the information obtained by observing an isolated spike or, equivalently, the event silence+spike. This is straightforward: we replace the spike rate by the rate of isolated spikes, and equation 3.2 will give us the information carried by the arrival time of a single isolated spike. The problem is that this information includes both the information carried by the occurrence of the spike and the information conveyed in the condition that there were no spikes in the preceding t_silence msec (for an early discussion of the information carried by silence, see de Ruyter van Steveninck & Bialek, 1988). We would like to separate these contributions, since our idea of dimensionality reduction applies only to the triggering of a spike, not to the temporally extended condition of nonspiking.

To separate the information carried by the isolated spike itself, we have to ask how much information we gain by seeing an isolated spike, given that the condition for isolation has already been met. As discussed by Brenner, Strong, et al. (2000), we can compute this information by thinking about the distribution of times at which the isolated spike can occur. Given that we know the input stimulus, the distribution of times at which a single isolated spike will be observed is proportional to r_iso(t), the time-dependent rate or peristimulus time histogram for isolated spikes. With proper normalization, we have

P_iso(t | inputs) = \frac{1}{T} \cdot \frac{1}{\bar{r}_{iso}}\, r_{iso}(t),   (3.6)

where T is the duration of the (long) window in which we can look for the spike, and \bar{r}_{iso} is the average rate of isolated spikes. This distribution has an entropy

S_iso(t | inputs) = -\int_0^T dt\, P_iso(t | inputs) \log_2 P_iso(t | inputs)   (3.7)

= -\frac{1}{T} \int_0^T dt\, \frac{r_{iso}(t)}{\bar{r}_{iso}} \log_2 \left[ \frac{1}{T} \cdot \frac{r_{iso}(t)}{\bar{r}_{iso}} \right]   (3.8)

= \log_2(T \bar{r}_{iso}\, \Delta t) bits,   (3.9)

where again we use the fact that for a deterministic system, the time-dependent rate must be either zero or the maximum allowed by our time


resolution, \Delta t. To compute the information carried by a single spike, we need to compare this entropy with the total entropy possible when we do not know the inputs.

It is tempting to think that without knowledge of the inputs, an isolated spike is equally likely to occur anywhere in the window of size T, which leads us back to equation 3.3 with \bar{r} replaced by \bar{r}_{iso}. In this case, however, we are assuming that the condition for isolation has already been met. Thus, even without observing the inputs, we know that isolated spikes can occur only in windows of time whose total length is T_silence = T \cdot P(silence), where P(silence) is the probability that any moment in time is at least t_silence after the most recent spike. Thus, the total entropy of isolated spike arrival times (given that the condition for silence has been met) is reduced from \log_2 T to

S_iso(t | silence) = \log_2(T \cdot P(silence)),   (3.10)

and the information that the spike carries beyond what we know from the silence itself is

\Delta I_{iso spike} = S_iso(t | silence) - S_iso(t | inputs)   (3.11)

= \frac{1}{T} \int_0^T dt\, \frac{r_{iso}(t)}{\bar{r}_{iso}} \log_2 \left[ \frac{r_{iso}(t)}{\bar{r}_{iso}} \cdot P(silence) \right]   (3.12)

= -\log_2(\bar{r}_{iso}\, \Delta t) + \log_2 P(silence) bits.   (3.13)

This information, which is defined independent of any model for the feature selectivity of the neuron, provides the benchmark against which our reduction of dimensionality will be measured. To make the comparison, however, we need the analog of equation 3.5.
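Equation 3.13 can be evaluated directly from a list of spike times once the isolation criterion is fixed. The sketch below uses an invented spike train, and it assumes the recording begins in silence so that the first spike counts as isolated; both choices are ours, not prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(4)
dt = 0.1                      # msec time resolution
n_bins = 100_000              # T = 10,000 msec
T = n_bins * dt
t_sil = 20.0                  # required preceding silence, msec

# toy deterministic spike train: 200 spike times on the dt grid
spikes = np.sort(rng.choice(n_bins, size=200, replace=False)) * dt
isi = np.diff(spikes)
iso = np.concatenate(([True], isi >= t_sil))   # isolated-spike flags
rbar_iso = iso.sum() / T                       # average isolated-spike rate

# P(silence): fraction of time at least t_sil after the most recent spike
silent_time = (spikes[0]
               + np.sum(np.maximum(0.0, isi - t_sil))
               + max(0.0, T - spikes[-1] - t_sil))
P_silence = silent_time / T

dI_iso = -np.log2(rbar_iso * dt) + np.log2(P_silence)   # eq. 3.13, bits
```

The log2 P(silence) term is negative, so an isolated spike always carries less information than equation 3.3 would naively suggest: part of the event "silence+spike" is already given away by the silence.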

Equation 3.12 provides us with an expression for the information conveyed by isolated spikes in terms of the probability that these spikes occur at particular times; this is analogous to equation 3.2 for single (nonisolated) spikes. If we follow a path analogous to that which leads from equation 3.2 to equation 3.5, we find an expression for the information that an isolated spike provides about the K stimulus dimensions \vec{s}:

\Delta I^{\vec{s}}_{iso spike} = \int d\vec{s}\, P(\vec{s} | iso spike at t) \log_2 \left[ \frac{P(\vec{s} | iso spike at t)}{P(\vec{s} | silence)} \right] + \langle \log_2 P(silence | \vec{s}) \rangle,   (3.14)

where the prior is now also conditioned on silence: P(\vec{s} | silence) is the distribution of \vec{s} given that \vec{s} is preceded by a silence of at least t_silence. Notice that this silence-conditioned distribution is not knowable a priori, and in particular, it is not gaussian; P(\vec{s} | silence) must be sampled from data.

The last term in equation 3.14 is the entropy of a binary variable that indicates whether particular moments in time are silent, given knowledge


of the stimulus. Again, since the HH model is deterministic, this conditional entropy should be zero if we keep a complete description of the stimulus. In fact, we are not interested in describing those features of the stimulus that lead to silence, and it is not fair (as we will see) to judge the success of dimensionality reduction by looking at the prediction of silence, which necessarily involves multiple dimensions. To make a meaningful comparison, then, we will assume that there is a perfect description of the stimulus conditions leading to silence and focus on the stimulus features that trigger the isolated spike. When we approximate these features by the K-dimensional space \vec{s}, we capture an amount of information

\Delta I^{\vec{s}}_{iso spike} = \int d\vec{s}\, P(\vec{s} | iso spike at t) \log_2 \left[ \frac{P(\vec{s} | iso spike at t)}{P(\vec{s} | silence)} \right].   (3.15)

This is the information that we can compare with \Delta I_{iso spike} in equation 3.13 to determine the efficiency of our dimensionality reduction.

4 Characterizing the Hodgkin-Huxley Neuron

For completeness, we begin with a brief review of the dynamics of the space-clamped HH neuron (Hodgkin & Huxley, 1952). Hodgkin and Huxley modeled the dynamics of the current through a patch of membrane with ion-specific conductances,

C \frac{dV}{dt} = I(t) - \bar{g}_K n^4 (V - V_K) - \bar{g}_{Na} m^3 h (V - V_{Na}) - \bar{g}_l (V - V_l),   (4.1)

where I(t) is injected current, K and Na subscripts denote potassium- and sodium-related variables, respectively, and the l (for "leakage") terms include all other ion conductances with slower dynamics. C is the membrane capacitance, V_K and V_Na are ion-specific reversal potentials, and V_l is defined such that the total voltage V is exactly zero when the membrane is at rest. \bar{g}_K, \bar{g}_{Na}, and \bar{g}_l are empirically determined maximal conductances for the different ion species, and the gating variables n, m, and h (on the interval [0, 1]) have their own voltage-dependent dynamics,

dn/dt = \alpha_n(V)(1 - n) - \beta_n(V)\, n,
dm/dt = \alpha_m(V)(1 - m) - \beta_m(V)\, m,   (4.2)
dh/dt = \alpha_h(V)(1 - h) - \beta_h(V)\, h,

with the original Hodgkin-Huxley rate functions, written here in the modern sign convention (voltages in mV, rates in msec^{-1}):

\alpha_n = 0.01(10 - V)/[\exp((10 - V)/10) - 1],   \beta_n = 0.125 \exp(-V/80),
\alpha_m = 0.1(25 - V)/[\exp((25 - V)/10) - 1],   \beta_m = 4 \exp(-V/18),
\alpha_h = 0.07 \exp(-V/20),   \beta_h = 1/[\exp((30 - V)/10) + 1].

We have used the original values for these parameters, except for changing the signs of the voltages to correspond to the modern sign convention: C = 1 \mu F/cm^2, \bar{g}_K = 36 mS/cm^2, \bar{g}_{Na} = 120 mS/cm^2, \bar{g}_l = 0.3 mS/cm^2, V_K = -12 mV, V_Na = +115 mV, V_l = +10.613 mV. We have taken our system to be


a \pi \times 30^2 \mu m^2 patch of membrane. We solve these equations numerically using fourth-order Runge-Kutta integration.

The system is driven with a gaussian random noise current I(t), generated by smoothing a gaussian random number stream with an exponential filter to give a correlation time \tau. It is convenient to choose \tau to be longer than the time steps of numerical integration, since this guarantees that all functions are smooth on the scale of single time steps. Here we will always use \tau = 0.2 msec, a value that is both less than the timescale over which we discretize the stimulus for analysis and far less than the neuron's capacitative smoothing timescale RC \approx 3 msec. I(t) has a standard deviation \sigma, but since the correlation time is short, the relevant parameter usually is the spectral density S = \sigma^2 \tau; we also add a DC offset I_0. In the following, we will consider two parameter regimes: I_0 = 0, and I_0 a finite value, which leads to more periodic firing.
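A noise current of this kind can be generated as a first-order autoregressive (exponentially filtered) gaussian process. The normalization below is one reasonable choice, fixing the stationary standard deviation at \sigma, which the text does not pin down explicitly:

```python
import numpy as np

def noise_current(n_steps, dt, tau, sigma, I0=0.0, seed=0):
    """Gaussian noise with exponential correlation time tau (msec),
    stationary std sigma, and DC offset I0; spectral density S = sigma^2 tau."""
    rng = np.random.default_rng(seed)
    a = np.exp(-dt / tau)                     # one-step decay factor
    # scale the white input so the stationary variance equals sigma^2
    w = rng.normal(scale=sigma * np.sqrt(1.0 - a**2), size=n_steps)
    I = np.empty(n_steps)
    x = 0.0
    for i in range(n_steps):
        x = a * x + w[i]
        I[i] = I0 + x
    return I

# the regime used in the text: tau = 0.2 msec, integration step 0.05 msec
I = noise_current(200_000, dt=0.05, tau=0.2, sigma=0.1)
```

After one correlation time the autocorrelation has fallen by a factor of e, so with dt = 0.05 msec the current is smooth across single integration steps, as the text requires.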

The integration step size is fixed at 0.05 msec. The key numerical experiments were repeated at a step size of 0.01 msec, with identical results. The time of a spike is defined as the moment of maximum voltage, for voltages exceeding a threshold (see Figure 1), estimated to subsample precision by quadratic interpolation. As spikes are both very stereotyped and very large compared to subspiking fluctuations, the precise value of this threshold is unimportant; we have used +20 mV.
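The numerical experiment can be sketched compactly: RK4 integration of equations 4.1 and 4.2 (assuming the standard 1952 rate functions in the modern sign convention) and spike timing by quadratic interpolation around voltage maxima above +20 mV. The resting gating values and the DC drive used in the demonstration are illustrative choices, not taken from the text:

```python
import numpy as np

C_M, G_K, G_NA, G_L = 1.0, 36.0, 120.0, 0.3      # uF/cm^2, mS/cm^2
V_K, V_NA, V_L = -12.0, 115.0, 10.613            # mV
AREA = np.pi * 30.0**2 * 1e-8                    # patch area in cm^2

def gating_rates(V):
    an = 0.01 * (10.0 - V) / (np.exp((10.0 - V) / 10.0) - 1.0)
    bn = 0.125 * np.exp(-V / 80.0)
    am = 0.1 * (25.0 - V) / (np.exp((25.0 - V) / 10.0) - 1.0)
    bm = 4.0 * np.exp(-V / 18.0)
    ah = 0.07 * np.exp(-V / 20.0)
    bh = 1.0 / (np.exp((30.0 - V) / 10.0) + 1.0)
    return an, bn, am, bm, ah, bh

def deriv(y, I_nA):
    V, n, m, h = y
    an, bn, am, bm, ah, bh = gating_rates(V)
    I_density = 1e-3 * I_nA / AREA               # nA -> uA/cm^2
    dV = (I_density - G_K * n**4 * (V - V_K)
          - G_NA * m**3 * h * (V - V_NA) - G_L * (V - V_L)) / C_M
    return np.array([dV, an*(1-n) - bn*n, am*(1-m) - bm*m, ah*(1-h) - bh*h])

def run(I, dt=0.05):
    """Fourth-order Runge-Kutta; I gives the injected current (nA) per step."""
    y = np.array([0.0, 0.3177, 0.0529, 0.5961])  # resting state at V = 0
    V_trace = np.empty(len(I))
    for i, I_nA in enumerate(I):
        k1 = deriv(y, I_nA)
        k2 = deriv(y + 0.5 * dt * k1, I_nA)
        k3 = deriv(y + 0.5 * dt * k2, I_nA)
        k4 = deriv(y + dt * k3, I_nA)
        y = y + (dt / 6.0) * (k1 + 2*k2 + 2*k3 + k4)
        V_trace[i] = y[0]
    return V_trace

def spike_times(V_trace, dt=0.05, thresh=20.0):
    """Voltage maxima above threshold, refined by quadratic interpolation."""
    times = []
    for i in range(1, len(V_trace) - 1):
        a, b, c = V_trace[i-1], V_trace[i], V_trace[i+1]
        if b > thresh and b >= a and b > c:
            times.append((i + 0.5 * (a - c) / (a - 2*b + c)) * dt)
    return np.array(times)

# demonstration: a DC drive strong enough for repetitive firing (500 msec)
I_dc = np.full(10_000, 15.0 * AREA * 1e3)        # 15 uA/cm^2, expressed in nA
V_trace = run(I_dc)
st = spike_times(V_trace)
```

In a real experiment, the DC array would be replaced by the exponentially filtered noise described above; the spike times returned here have subsample precision because the parabola through the three points around each maximum is solved for its vertex.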

4.1 Qualitative Description of Spiking. The first step in our analysis is to use reverse correlation, equation 2.4, to determine the average stimulus feature preceding a spike, the STA. In Figure 1 (top), we display the STA in a regime where the spectral density of the input current is 6.5 \times 10^{-4} nA^2 sec. The spike-triggered averages of the gating terms n^4 (proportion of open potassium channels) and m^3 h (proportion of open sodium channels) and the membrane voltage V are plotted in Figure 1 (middle and bottom). The error bars mark the standard deviation of the trajectories of these variables.

As expected, the voltage and gating variables follow highly stereotyped trajectories during the ~5 msec surrounding a spike. First, the rapid opening of the sodium channels causes a sharp membrane depolarization (or rise in V); the slower potassium channels then open and repolarize the membrane, leaving it at a slightly lower potential than rest. The potassium channels close gradually, but meanwhile the membrane remains hyperpolarized and, due to its increased permeability to potassium ions, at lower resistance. These effects make it difficult to induce a second spike during this ~15 msec "refractory period." Away from spikes, the resting levels and fluctuations of the voltage and gating variables are quite small. The larger values evident in Figure 1 (middle and bottom) by \pm 15 msec are due to the summed contributions of nearby spikes.

The spike-triggered average current has a largely transient form, so that spikes are on average preceded by an upward swing in current. On the


Figure 1: Spike-triggered averages, with standard deviations, for (top) the input current I, (middle) the fraction of open K+ and Na+ channels, and (bottom) the membrane voltage V, for the parameter regime I_0 = 0 and S = 6.5 \times 10^{-4} nA^2 sec.

other hand, there is no obvious bottleneck in the current trajectories, so that the current variance is almost constant throughout the spike. This is qualitatively consistent with the idea of dimensionality reduction: if the neuron ignores most of the dimensions along which the current can vary, then the variance, which is shared almost equally among all dimensions for this nearly white noise, can change by only a small amount.


4.2 Interspike Interaction. Although the STA has the form of a differentiating kernel, suggesting that the neuron detects edge-like events in the current versus time, there must be a DC component to the cell's response. We recall that for constant inputs, the HH model undergoes a bifurcation to constant frequency spiking, where the frequency is a function of the value of the input above onset. Correspondingly, the STA does not sum precisely to zero; one might think of it as having a small integrating component that allows the system to spike under DC stimulation, albeit only above a threshold.

The system's tendency to periodic spiking under DC current input also is felt under dynamic stimulus conditions and can be thought of as a strong interaction between successive spikes. We illustrate this by considering a different parameter regime, with a small DC current and some added noise (I_0 = 0.11 nA and S = 0.8 \times 10^{-4} nA^2 sec). Note that the DC component puts the neuron in the metastable region of its f-I curve (see Figure 2). In this regime, the neuron tends to fire quasi-regular trains of spikes intermittently, as shown in Figure 3. We will refer to these quasi-regular spike sequences as "bursts" (note that this term is often used to refer to compound spikes in neurons with additional channels; such events do not occur in the HH model).

Spikes can be classified into three types: those initiating a spike burst, those within a burst, and those ending a burst. The minimum length of

Figure 2: Firing rate of the HH neuron as a function of injected DC current. The empty circles at moderate currents denote the metastable region, where the neuron may be either spiking or silent.


Figure 3: Segment of a typical spike train in a "bursting" regime.

Figure 4: Spike-triggered averages derived from spikes leading ("on"), inside ("burst"), and ending ("off") a burst. The parameters of this bursting regime are I_0 = 0.11 nA and S = 0.8 \times 10^{-4} nA^2 sec. Note that the burst-ending spike average is, by construction, identical to that of any other within-burst spike for t < 0.

the silence between bursts is taken in this case to be 70 msec. Taking these three categories of spike as different "symbols" (de Ruyter van Steveninck & Bialek, 1988), we can determine the average stimulus for each. These are shown in Figure 4, with the spike at t = 0.
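As a concrete illustration, the symbol assignment and the per-class average stimuli described above can be sketched as follows (a minimal sketch in NumPy; the function and variable names are ours, times are in msec, and the 70 msec silence criterion follows the text):

```python
import numpy as np

def classify_spikes(spike_times, t_silence=70.0):
    """Label each spike as burst-onset ("on"), within-burst ("burst"),
    or burst-ending ("off"), from the interspike intervals around it."""
    labels = []
    for i, t in enumerate(spike_times):
        isi_before = t - spike_times[i - 1] if i > 0 else np.inf
        isi_after = spike_times[i + 1] - t if i < len(spike_times) - 1 else np.inf
        if isi_before >= t_silence:
            labels.append("on")       # first spike of a burst
        elif isi_after >= t_silence:
            labels.append("off")      # last spike of a burst
        else:
            labels.append("burst")    # interior spike
    return labels

def conditional_sta(current, dt, spike_times, labels, window=30.0):
    """Average the stimulus preceding spikes of each class separately."""
    n = int(window / dt)
    sta = {lab: np.zeros(n) for lab in ("on", "burst", "off")}
    count = {lab: 0 for lab in sta}
    for t, lab in zip(spike_times, labels):
        i = int(round(t / dt))
        if i >= n:
            sta[lab] += current[i - n:i]
            count[lab] += 1
    return {lab: sta[lab] / max(count[lab], 1) for lab in sta}
```

Each conditional average plays the role of one of the three spike-triggered averages of Figure 4.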

In this regime, the initial spike of a burst is preceded by a rapid oscillation in the current. Spikes within a burst are affected much less by the current; the feature immediately preceding such spikes is similar in shape to a single "wavelength" of the leading spike feature, but is of much smaller amplitude and is temporally compressed into the interspike interval. Hence, although it is clear that the timing of a spike within a burst is determined largely by the timing of the previous spike, the current plays some role in affecting the precise placement. This also demonstrates that the shape of the STA is not the same for all spikes; it depends strongly and nontrivially on the time to the previous spike, and this is related to the observation that subtly different patterns of two or three spikes correspond to very different average stimuli (de Ruyter van Steveninck & Bialek, 1988). For a reader of the spike code, a spike within a burst conveys a different message about the input than the spike at the onset of the burst. Finally, the feature ending a burst has a very similar form to the onset feature, but reversed in time. Thus, to a good approximation, the absence of a spike at the end of a burst can be read as the opposite of the onset of the burst.

In summary, this regime of the HH neuron is similar to a "flip-flop," or 1-bit memory. Like its electronic analog, the neuron's memory is preserved


Figure 5: Overall spike-triggered average in the bursty regime, showing the ringing due to the tendency to periodic firing. Plotted in gray is the spike autocorrelation, showing the same oscillations.

by a feedback loop, here implemented by the interspike interaction. Large fluctuations in the input current at a certain frequency "flip" or "flop" the neuron between its silent and spiking states. However, while the neuron is spiking, further details of the input signal are transmitted by precise spike timing within a burst. If we calculate the spike-triggered average of all spikes for this regime, without regard to their position within a burst, then, as shown in Figure 5, the relatively well-localized leading spike oscillation of Figure 4 is replaced by a long-lived oscillating function resulting from the spike periodicity. This is shown explicitly by comparing the overall STA with the spike autocorrelation, also shown in Figure 5. This same effect is seen in the STA of the burst spikes, which in fact dominates the overall average. Prediction of spike timing using such an STA would be computationally difficult due to its extension in time, but, more seriously, unsuccessful, as most of the function is an artifact of the spike history rather than the effect of the stimulus.

While the effects of spike interaction are interesting and should be included in a complete model for spike generation, we wish here to consider only the current's role in initiating spikes. Therefore, as we have argued elsewhere, we limit ourselves initially to the cases in which interspike interaction plays no role (Aguera y Arcas et al., 2001; Aguera y Arcas & Fairhall, 2003). These "isolated" spikes can be defined as spikes preceded by a silent period t_silence long enough to ensure decoupling from the timing of the previous spike. A reasonable choice for t_silence can be inferred directly from the inter-


Figure 6: Hodgkin-Huxley interspike interval histogram for the parameters I0 = 0 and S = 6.5 × 10⁻⁴ nA² sec, showing a peak at a preferred firing frequency and the long Poisson tail. The total number of spikes is N = 5.18 × 10⁶. The plot to the right is a closeup in linear scale.

spike interval distribution P(Δt), illustrated in Figure 6. For the HH model, as in simpler models and many real neurons (Brenner, Agam, Bialek, & de Ruyter van Steveninck, 1998), the form of P(Δt) has three noteworthy features: a refractory "hole" during which another spike is unlikely to occur, a strong mode at the preferred firing frequency, and an exponentially decaying, or Poisson, tail. The details of all three of these features are functions of the parameters of the stimulus (Tiesinga, Jose, & Sejnowski, 2000), and certain regimes may be dominated by only one or two features. The emergence of Poisson statistics in the tail of the distribution implies that these events are independent, so we can infer that the system has lost memory of the previous spike. We will therefore take isolated spikes to be those preceded by a silent interval Δt ≥ t_silence, where t_silence is well into the Poisson regime. The burst-onset spikes of Figure 4 are isolated spikes by this definition.
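The selection rule just stated can be sketched directly (a minimal sketch; the function name is ours, and t_silence is assumed to have been read off the Poisson tail of the interspike interval histogram):

```python
import numpy as np

def isolated_spike_times(spike_times, t_silence):
    """Keep only spikes preceded by a silent interval of at least
    t_silence, so that (by the Poisson-tail argument) their timing is
    decoupled from that of the previous spike."""
    spike_times = np.asarray(spike_times, dtype=float)
    if len(spike_times) == 0:
        return spike_times
    isi = np.diff(spike_times)
    # the first spike has no predecessor and is treated as isolated here
    keep = np.concatenate(([True], isi >= t_silence))
    return spike_times[keep]
```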

Note that the bursty behavior evident in Figure 3 is characteristic of "type II" neurons, which begin firing at a well-defined frequency under DC stimulus; "type I" neurons, by contrast, can fire at arbitrarily low frequency under constant input. Nonetheless, the analysis that follows is equally applicable to either type of neuron: spikes still interact at close range and become independent at sufficient separation. In what follows, we will be probing the HH neuron with current noise that remains below the DC threshold for periodic firing.

5 Isolated Spike Analysis

Focusing now on isolated spikes, we proceed to a second-order analysis of the current fluctuations around the isolated spike-triggered average (see


Figure 7: Spike-triggered average stimulus for isolated spikes.

Figure 7). We consider the response of the HH neuron to currents I(t) with mean I0 = 0 and spectral density S = 6.5 × 10⁻⁴ nA² sec. Isolated spikes in this regime are defined by t_silence = 60 msec.

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, a window sufficiently long to capture the relevant features. Thus, we begin in a 200-dimensional space. We emphasize that the theorem that connects eigenvalues of the matrix ΔC to the number of relevant dimensions is valid only for truly gaussian distributions of inputs, and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble: namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide to our search for lower-dimensional descriptions, but we should proceed with caution.
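The accumulation step might look as follows (a sketch under our own naming conventions; spike positions are assumed to be given as sample indices, and the 200-sample window follows the text):

```python
import numpy as np

def spike_triggered_stats(stimulus, spike_indices, window=200):
    """Accumulate the mean (STA) and the covariance of the stimulus
    segments preceding each spike, each segment `window` samples long."""
    segs = np.array([stimulus[i - window:i]
                     for i in spike_indices if i >= window])
    sta = segs.mean(axis=0)
    dev = segs - sta                    # fluctuations about the STA
    c_spike = dev.T @ dev / len(segs)
    return sta, c_spike
```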

The "raw" isolated spike-triggered covariance C_iso and the corresponding covariance difference ΔC (equation 2.7) are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to ±15 msec. In Figure 9, we show the spectrum of eigenvalues of ΔC, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply ΔC by C_prior^{-1}. This has the effect of giving us eigenvalues scaled in units of the input standard deviation


Figure 8: The isolated spike-triggered covariance C_iso (left) and covariance difference ΔC (right), for times −30 < t < 5 msec. The plots are in units of nA².

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance, after accumulating 80,000 spikes.

along each dimension. Because the correlation time is short, C_prior is nearly diagonal.
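The scaled spectrum described above can be sketched as follows (a sketch, not the authors' code; C_spike and C_prior are assumed to have been accumulated as above):

```python
import numpy as np

def scaled_spectrum(c_spike, c_prior):
    """Eigen-decompose C_prior^{-1} (C_spike - C_prior), so that the
    eigenvalues are expressed in units of the input standard deviation
    along each dimension; modes are sorted by eigenvalue magnitude."""
    delta_c = c_spike - c_prior
    w, v = np.linalg.eig(np.linalg.solve(c_prior, delta_c))
    order = np.argsort(-np.abs(w.real))
    return w.real[order], v[:, order].real
```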

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero up to the noise floor determined by


Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is 1/√n_spikes. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

sampling decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Aguera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended, silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Aguera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be −60 ≤ t ≤ −40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/n_spikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out the test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion


Figure 11: For the leading 64 modes, fraction of the mode energy over the interval −40 < t < −30 msec, as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.

of other, weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is close to time translationally invariant.
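The energy diagnostic is simple enough to state in a few lines (a sketch with our own names; modes are assumed unit-normalized rows sampled at times t, and the silent window follows the text):

```python
import numpy as np

def silence_energy_fraction(modes, t, t_lo=-60.0, t_hi=-40.0):
    """Fraction of each mode's energy in the silent window.  For a
    spike-associated mode this fraction is pure sampling noise and falls
    off as 1/n_spikes; for a silence mode it stays of order unity."""
    mask = (t >= t_lo) & (t <= t_hi)
    energy = np.sum(modes[:, mask] ** 2, axis=1)
    total = np.sum(modes ** 2, axis=1)
    return energy / total
```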

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s1 and s2 (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crossed threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions s1 and s2. By construction, the distribution of these signals over the whole experiment, P(s1, s2), is gaussian. The appropriate prior for the isolation condition, P(s1, s2 | silence), differs only subtly from the gaussian prior. On the other hand, for each spike, we obtain a sample from the dis-


Figure 14: 10⁴ spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation on the prior gaussian distribution. The circles, from the inside out, enclose all but 10⁻¹, 10⁻², . . . , 10⁻⁸ of the prior.

tribution P(s1, s2 | iso spike at t0), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC which we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold crossing interpretation.


Combining equations 2.2 and 2.3 for isolated spikes, we have

g(s1, s2) = P(s1, s2 | iso spike at t0) / P(s1, s2 | silence),    (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure, but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
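In practice, the ratio in equation 5.1 can be estimated from samples of the two distributions by binning the (s1, s2) plane (a sketch; the histogram-based estimator and all names are ours, and the bin count is arbitrary):

```python
import numpy as np

def estimate_g(s_spike, s_prior, bins=25, lim=4.0):
    """Histogram estimate of g(s1, s2) = P(s1,s2 | spike) / P(s1,s2 | silence).
    Rows of s_spike are spike-conditional projections; rows of s_prior are
    samples from the (silence-conditioned) prior."""
    edges = np.linspace(-lim, lim, bins + 1)
    h_spike, _, _ = np.histogram2d(s_spike[:, 0], s_spike[:, 1],
                                   bins=(edges, edges), density=True)
    h_prior, _, _ = np.histogram2d(s_prior[:, 0], s_prior[:, 1],
                                   bins=(edges, edges), density=True)
    g = np.zeros_like(h_spike)
    ok = h_prior > 0          # leave g = 0 where the prior is unsampled
    g[ok] = h_spike[ok] / h_prior[ok]
    return g, edges
```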

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation, according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I(s1, s2; iso spike) = I(iso spike); more generally, one finds I(s1, s2; iso spike) ≤ I(iso spike), and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.
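For a discretized projection, the information per spike reduces to a Kullback-Leibler divergence between the spike-conditional and prior distributions; a minimal sketch (our names, binned distributions assumed already estimated):

```python
import numpy as np

def info_per_spike(p_spike, p_prior):
    """Information (bits per spike) carried by a discretized projection:
    I = sum_s P(s | spike) log2[ P(s | spike) / P(s) ].
    Comparing this to the full information in spike timing gives the
    fraction captured by the low-dimensional description."""
    p_spike = np.asarray(p_spike, dtype=float).ravel()
    p_prior = np.asarray(p_prior, dtype=float).ravel()
    p_spike = p_spike / p_spike.sum()
    p_prior = p_prior / p_prior.sum()
    ok = p_spike > 0
    return float(np.sum(p_spike[ok] * np.log2(p_spike[ok] / p_prior[ok])))
```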

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron, over a range of time resolutions. The approximation is best near Δt = 3 msec,

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).


reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at ≈ 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation, and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at approximately 3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing?

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that, along the direction of the STA, we allow the second, orthogonal direction to vary. In principle, this vari-


Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

ation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
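The tiling construction above can be sketched in a few lines (a minimal sketch, assuming the spike-triggered stimulus histories are rows of a matrix and the STA is given; equal-population bins follow the text, and all names are ours):

```python
import numpy as np

def twist_model(histories, sta, n_bins=8):
    """Locally linear tiling ("twisting ribbon"): bin spike-triggered
    histories by their projection onto the STA, and in each bin take the
    conditional average's component orthogonal to the STA as the local
    second basis vector."""
    sta = sta / np.linalg.norm(sta)
    proj = histories @ sta
    # equal-population bins along the STA direction
    edges = np.quantile(proj, np.linspace(0.0, 1.0, n_bins + 1))
    second = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (proj >= lo) & (proj <= hi)
        mean = histories[sel].mean(axis=0)
        orth = mean - (mean @ sta) * sta      # remove the STA component
        second.append(orth / np.linalg.norm(orth))
    return np.array(second)
```

Applying SVD to the returned family of vectors, as in the text, then estimates the embedding dimension of the feature subspace.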

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maxi-


Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

mum of 4.8 bits, recovering approximately 90% of the information at a time resolution of approximately 1 msec.

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshikawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive, graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input


currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work by Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ~200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that, rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations, such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace, using data sets of fewer than 10⁴ spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again, the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one that is sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore, in general, involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5-18). Berlin: Springer-Verlag.

Agüera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Agüera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75-81). Cambridge, MA: MIT Press.

Agüera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789-1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69-88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377-407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435-450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411-5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144-152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307-312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000-4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695-702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531-1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208-240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169-179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379-414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805-1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620-623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787-792.

FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical J., 1, 445-466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147-155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibers. Amer. J. Physiol., 130, 690-699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500-544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106-154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982-996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803-817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381-387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015-1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37-68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317-329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503-1506.

Nagumo, J., Arimoto, S., & Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061-2070.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83-97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769-777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558-2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392-5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386-408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323-2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679-1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379-423, 623-656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036-8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315-2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413-8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368-377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



any way of testing for other relevant dimensions or, more generally, for measuring the dimensionality of the relevant subspace.

The idea of characterizing neural responses directly as the reduction of dimensionality emerged from studies (de Ruyter van Steveninck & Bialek, 1988) of a motion-sensitive neuron in the fly visual system. In particular, this work suggested that it is possible to estimate the dimensionality of the relevant subspace rather than just assuming that it is small (or equal to one). More recent work on the fly visual system has exploited the idea of dimensionality reduction to probe both the structure and adaptation of the neural code (Brenner, Bialek, & de Ruyter van Steveninck, 2000; Fairhall, Lewen, Bialek, & de Ruyter van Steveninck, 2001) and the nature of the computation that extracts the motion signal from the spatiotemporal array of photoreceptor inputs (Bialek & de Ruyter van Steveninck, 2003). Here we review the ideas of dimensionality reduction from previous work; extensions of these ideas begin in section 3.

In the spirit of neural network models, we will simplify away the spatial structure of neurons and consider time-dependent currents I(t) injected into a point-like neuron. While this misses much of the complexity of real cells, we will find that even this system is highly nontrivial. If the input is an injected current, then the neuron maps the history of this current, I(t < t_0), into the presence or absence of a spike at time t_0. More generally, we might imagine that the cell (or our description) is noisy, so that there is a probability of spiking P[spike at t_0 | I(t < t_0)] that depends on the current history. The dependence on the history of the current means that the input signal still is high dimensional, even without spatial dependence. Working at time resolution Δt and assuming that currents in a window of size T are relevant to the decision to spike, the input space is of dimension D = T/Δt, where D is often of order 100.

The idea of dimensionality reduction is that the probability of spike generation is sensitive only to some limited number of dimensions K within the D-dimensional space of inputs. We begin our analysis by searching for linear subspaces, that is, a set of signals s_1, s_2, ..., s_K that can be constructed by filtering the current,

s_\mu = \int_0^\infty dt\, f_\mu(t)\, I(t_0 - t), \qquad (2.1)

so that the probability of spiking depends on only this small set of signals,

P[\text{spike at } t_0 \mid I(t < t_0)] = P[\text{spike at } t_0]\, g(s_1, s_2, \ldots, s_K), \qquad (2.2)

where the inclusion of the average probability of spiking, P[spike at t_0], leaves g dimensionless. If we think of the current I(t_0 - T < t < t_0) as a D-dimensional vector, with one dimension for each discrete sample at spacing Δt, then the filtered signals s_µ are linear projections of this vector. In


this formulation, characterizing the computation done by a neuron involves three steps:

1. Estimate the number of relevant stimulus dimensions K, with the hope that there will be many fewer than the original dimensionality D.

2. Identify a set of filters that project into this relevant subspace.

3. Characterize the nonlinear function g(s⃗).
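The three steps above describe a model class that is easy to state in code. In the sketch below, the filters, the sigmoidal nonlinearity, and all parameter values are hypothetical stand-ins, not quantities estimated in this paper:

```python
import numpy as np

# Dimensionality-reduction model sketch: the spiking probability depends on
# the D-dimensional current history only through K linear projections
# s_mu = f_mu . I, passed through a nonlinearity g (eqs. 2.1-2.2).

rng = np.random.default_rng(0)
D = 100                                   # history dimensions, D = T / dt
K = 2                                     # assumed relevant dimensions

t = np.arange(D)
f1 = np.exp(-t / 20.0)                    # toy "integrating" filter
f2 = np.diff(f1, prepend=0.0)             # toy "differentiating" filter
F = np.stack([f1, f2])                    # K x D filter matrix

def spike_prob(history, F, p_bar=0.05):
    """P[spike | I] = p_bar * g(s_1, ..., s_K) with a toy sigmoidal g."""
    s = F @ history                       # project onto the K filters
    g = 1.0 / (1.0 + np.exp(-s.sum()))    # toy nonlinearity on the subspace
    return p_bar * g

history = rng.normal(size=D)              # white gaussian current history
p = spike_prob(history, F)
assert 0.0 <= p <= 0.05
```

The point of the sketch is only the structure: everything about the stimulus enters through the K numbers s_µ.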

The classical perceptron-like cell of neural network theory would have only one relevant dimension, given by the vector of weights, and a simple form for g, typically a sigmoid.

Rather than trying to look directly at the distribution of spikes given stimuli, we follow de Ruyter van Steveninck and Bialek (1988) and consider the distribution of signals conditional on the response, P[I(t < t_0) | spike at t_0], also called the response-conditional ensemble (RCE); these are related by Bayes' rule:

\frac{P[\text{spike at } t_0 \mid I(t < t_0)]}{P[\text{spike at } t_0]} = \frac{P[I(t < t_0) \mid \text{spike at } t_0]}{P[I(t < t_0)]}. \qquad (2.3)

We can now compute various moments of the RCE. The first moment is the spike-triggered average stimulus (STA),

\mathrm{STA}(\tau) = \int [dI]\, P[I(t < t_0) \mid \text{spike at } t_0]\, I(t_0 - \tau), \qquad (2.4)

which is the object that one computes in reverse correlation (de Boer & Kuyper, 1968; Rieke et al., 1997). If we choose the distribution of input stimuli P[I(t < t_0)] to be gaussian white noise, then for a perceptron-like neuron sensitive to only one direction in stimulus space, it can be shown that the STA, or first moment of the RCE, is proportional to the vector or filter f(τ) that defines this direction (Rieke et al., 1997).
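In discrete time, the averaging in equation 2.4 is a few lines of code. The white-noise stimulus, the toy spike generator, and the window length below are invented for illustration; only the final averaging step is the reverse-correlation computation itself:

```python
import numpy as np

# Reverse-correlation sketch (eq. 2.4): average the stimulus in a window
# preceding each spike. The "neuron" here is a toy threshold model.

rng = np.random.default_rng(1)
N = 200_000
I = rng.normal(size=N)                 # white gaussian current, one per bin

# toy spike generator: threshold a causally smoothed copy of the current
kernel = np.exp(-np.arange(50) / 10.0)
drive = np.convolve(I, kernel, mode="full")[:N]
spike_bins = np.where(drive > 2.5 * drive.std())[0]

D = 100                                # history window: D bins before a spike
spike_bins = spike_bins[spike_bins >= D]
windows = np.stack([I[t0 - D:t0] for t0 in spike_bins])
sta = windows.mean(axis=0)             # STA; index D-1 is the bin just before t0

assert sta.shape == (D,)
```

Because the toy neuron weights recent current positively, the STA comes out largest just before the spike, as it should.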

Although it is a theorem that the STA is proportional to the relevant filter f(τ), in principle it is possible that the proportionality constant is zero, most plausibly if the neuron's response has some symmetry, such as phase invariance in the response of high-frequency auditory neurons. It also is worth noting that what is really important in this analysis is the gaussian distribution of the stimuli, not the "whiteness" of the spectrum. For nonwhite but gaussian inputs, the STA measures the relevant filter blurred by the correlation function of the inputs, and hence the true filter can be recovered (at least in principle) by deconvolution. For nongaussian signals and nonlinear neurons, there is no corresponding guarantee that the selectivity of the neuron can be separated from correlations in the stimulus (Sharpee et al., in press).
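The deconvolution step can be illustrated directly in the frequency domain. In this sketch, the filter shape and stimulus correlation time are invented, and the "blurred STA" is constructed synthetically (filter times power spectrum) rather than measured from spikes:

```python
import numpy as np

# Deconvolution sketch: for correlated gaussian inputs, the STA equals the
# true filter convolved with the stimulus autocorrelation, so dividing by
# the stimulus power spectrum undoes the blur (in principle).

D = 128
t = np.arange(D)
f_true = np.exp(-t / 10.0) - np.exp(-t / 5.0)     # hypothetical true filter

# two-sided exponential autocorrelation and its (positive) power spectrum
corr = np.exp(-np.abs(np.arange(-D, D)) / 4.0)
S_stim = np.fft.fft(np.fft.ifftshift(corr)).real

F_true = np.fft.fft(f_true, n=2 * D)
F_sta = F_true * S_stim                # blurred STA, frequency domain
f_rec = np.fft.ifft(F_sta / S_stim).real[:D]      # deconvolve

assert np.allclose(f_rec, f_true, atol=1e-8)
```

With estimated (noisy) STAs, the division would need regularization at frequencies where the stimulus has little power; the sketch omits that practical issue.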

To obtain more than one relevant direction (or to reveal relevant directions when symmetries cause the STA to vanish), we proceed to second


order and compute the covariance matrix of fluctuations around the spike-triggered average,

C_{\text{spike}}(\tau, \tau') = \int [dI]\, P[I(t < t_0) \mid \text{spike at } t_0]\, I(t_0 - \tau)\, I(t_0 - \tau') - \mathrm{STA}(\tau)\, \mathrm{STA}(\tau'). \qquad (2.5)

In the same way that we compare the spike-triggered average to some constant average level of the signal in the whole experiment, we compare the covariance matrix C_spike with the covariance of the signal averaged over the whole experiment,

C_{\text{prior}}(\tau, \tau') = \int [dI]\, P[I(t < t_0)]\, I(t_0 - \tau)\, I(t_0 - \tau'), \qquad (2.6)

to construct the change in the covariance matrix,

\Delta C = C_{\text{spike}} - C_{\text{prior}}. \qquad (2.7)

With time resolution Δt in a window of duration T as above, all of these covariances are D × D matrices. In the same way that the spike-triggered average has the clearest interpretation when we choose inputs from a gaussian distribution, ΔC also has the clearest interpretation in this case. Specifically, if inputs are drawn from a gaussian distribution, then it can be shown that (Bialek & de Ruyter van Steveninck, 2003):

1. If the neuron is sensitive to a limited set of K input dimensions as in equation 2.2, then ΔC will have only K nonzero eigenvalues.¹ In this way, we can measure directly the dimensionality K of the relevant subspace.

2. If the distribution of inputs is both gaussian and white, then the eigenvectors associated with the nonzero eigenvalues span the same space as that spanned by the filters {f_µ(τ)}.

3. For nonwhite (correlated) but still gaussian inputs, the eigenvectors span the space of the filters {f_µ(τ)} blurred by convolution with the correlation function of the inputs.

Thus, the analysis of ΔC for neurons responding to gaussian inputs should allow us to identify the subspace of inputs of relevance and test specifically the hypothesis that this subspace is of low dimension.
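A small simulation makes the eigenvalue statement concrete. The two "relevant filters" and the nonlinearity below are made up; the nonlinearity is deliberately symmetric, so the STA nearly vanishes while ΔC still reveals both dimensions:

```python
import numpy as np

# Spike-triggered covariance sketch (eqs. 2.5-2.7): with gaussian inputs,
# Delta C = C_spike - C_prior has nonzero eigenvalues only along the
# relevant subspace.

rng = np.random.default_rng(3)
D, n_samples = 40, 200_000
t = np.arange(D)
f1 = np.exp(-((t - 20) / 4.0) ** 2)       # toy filter 1
f2 = (t - 20) * f1                        # toy filter 2, orthogonal to f1
F = np.stack([f / np.linalg.norm(f) for f in (f1, f2)])   # 2 x D

X = rng.normal(size=(n_samples, D))       # white gaussian stimulus histories
s = X @ F.T                               # projections onto the filters
# symmetric nonlinearity: sensitive to the energy in the subspace
p_spike = 1.0 / (1.0 + np.exp(-(s[:, 0] ** 2 + s[:, 1] ** 2 - 4.0)))
spikes = rng.random(n_samples) < p_spike

dC = np.cov(X[spikes].T) - np.cov(X.T)    # Delta C
eigvals = np.linalg.eigvalsh(dC)

n_big = int(np.sum(np.abs(eigvals) > 0.3))   # clearly nonzero modes
assert n_big == 2                            # only K = 2 dimensions survive
```

Here both nonzero eigenvalues come out positive because spiking selects for large variance along the filters; a suppressive dimension would give a negative eigenvalue instead.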

¹ As with the STA, it is in principle possible that symmetries or accidental features of the function g(s⃗) would cause some of the K eigenvalues to vanish, but this is very unlikely.


Several points are worth noting. First, except in special cases, the eigenvectors of ΔC and the filters {f_µ(τ)} are not the principal components of the RCE, and hence this analysis of ΔC is not a principal component analysis. Second, the nonzero eigenvalues of ΔC can be either positive or negative, depending on whether the variance of inputs along that particular direction is larger or smaller in the neighborhood of a spike. Third, although the eigenvectors span the relevant subspace, these eigenvectors do not form a preferred coordinate system within this subspace. Finally, we emphasize that dimensionality reduction (identification of the relevant subspace) is only the first step in our analysis of the computation done by a neuron.

3 Measuring the Success of Dimensionality Reduction

The claim that certain stimulus features are most relevant is in effect a model for the neuron, so the next question is how to measure the effectiveness or accuracy of this model. Several different ideas have been suggested in the literature as ways of testing models based on linear receptive fields in the visual system (Stanley, Lei, & Dan, 1999; Keat, Reinagel, Reid, & Meister, 2001) or linear spectrotemporal receptive fields in the auditory system (Theunissen, Sen, & Doupe, 2000). These methods have in common that they introduce a metric to measure performance, for example, mean square error in predicting the firing rate as averaged over some window of time. Ideally, we would like to have a performance measure that avoids any arbitrariness in the choice of metric, and such metric-free measures are provided uniquely by information theory (Shannon, 1948; Cover & Thomas, 1991).

Observing the arrival time t_0 of a single spike provides a certain amount of information about the input signals. Since information is mutual, we can also say that knowing the input signal trajectory I(t < t_0) provides information about the arrival time of the spike. If "details are irrelevant," then we should be able to discard these details from our description of the stimulus and yet preserve the mutual information between the stimulus and spike arrival times (for an abstract discussion of such selective compression, see Tishby et al., 1999). In constructing our low-dimensional model, we represent the complete (D-dimensional) stimulus I(t < t_0) by a smaller number (K < D) of dimensions s⃗ = (s_1, s_2, ..., s_K).

The mutual information I[I(t < t_0); t_0] is a property of the neuron itself, while the mutual information I[s⃗; t_0] characterizes how much our reduced description of the stimulus can tell us about when spikes will occur. Necessarily, our reduction of dimensionality causes a loss of information, so that

I[\vec{s}; t_0] \le I[I(t < t_0); t_0], \qquad (3.1)

but if our reduced description really captures the computation done by the neuron, then the two information measures will be very close. In particular,


if the neuron were described exactly by a lower-dimensional model, as for a linear perceptron or for an integrate-and-fire neuron (Agüera y Arcas & Fairhall, 2003), then the two information measures would be equal. More generally, the ratio I[s⃗; t_0] / I[I(t < t_0); t_0] quantifies the efficiency of the low-dimensional model, measuring the fraction of information about spike arrival times that our K dimensions capture from the full signal I(t < t_0).

As shown by Brenner, Strong, Koberle, Bialek, and de Ruyter van Steveninck (2000), the arrival time of a single spike provides an information

I[I(t < t_0); t_0] \equiv I_{\text{one spike}} = \frac{1}{T} \int_0^T dt\, \frac{r(t)}{\bar{r}} \log_2 \left[ \frac{r(t)}{\bar{r}} \right], \qquad (3.2)

where r(t) is the time-dependent spike rate, r̄ is the average spike rate, and the integral over t is an average over time. In principle, information should be calculated as an average over the distribution of stimuli, but the ergodicity of the stimulus justifies replacing this ensemble average with a time average. For a deterministic system like the HH equations, the spike rate is a singular function of time given the inputs: spikes occur at definite times with no randomness or irreproducibility. If we observe these responses with a time resolution Δt, then for Δt sufficiently small, the rate r(t) at any time t either is zero or corresponds to a single spike occurring in one bin of size Δt, that is, r = 1/Δt. Thus, the information carried by a single spike is

I_{\text{one spike}} = -\log_2(\bar{r}\, \Delta t). \qquad (3.3)

On the other hand, if the probability of spiking really depends on only the stimulus dimensions s_1, s_2, ..., s_K, we can substitute

\frac{r(t)}{\bar{r}} \rightarrow \frac{P(\vec{s} \mid \text{spike at } t)}{P(\vec{s})}. \qquad (3.4)

Replacing the time averages in equation 3.2 with ensemble averages, we find

I[\vec{s}; t_0] \equiv I^{\vec{s}}_{\text{one spike}} = \int d^K s\, P(\vec{s} \mid \text{spike at } t) \log_2 \left[ \frac{P(\vec{s} \mid \text{spike at } t)}{P(\vec{s})} \right] \qquad (3.5)

(for details of these arguments, see Brenner, Strong, et al., 2000). This allows us to compare the information captured by the K-dimensional reduced model with the true information carried by single spikes in the spike train.
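Equations 3.2 and 3.3 can be checked numerically on a toy deterministic spike train; the bin size, spike count, and spike times below are invented:

```python
import numpy as np

# Single-spike information sketch (eqs. 3.2-3.3): for a deterministic
# (binary) rate, the time average of (r/rbar) log2(r/rbar) equals
# -log2(rbar * dt).

rng = np.random.default_rng(4)
dt = 0.5                                  # time resolution, ms
n_bins = 100_000
spikes = np.zeros(n_bins)
spikes[rng.choice(n_bins, size=500, replace=False)] = 1   # 500 spike bins

r = spikes / dt                           # rate: 0 or 1/dt in each bin
rbar = r.mean()                           # average rate

# time average of (r/rbar) log2(r/rbar), with 0 log 0 taken as 0
with np.errstate(divide="ignore", invalid="ignore"):
    integrand = np.where(r > 0, (r / rbar) * np.log2(r / rbar), 0.0)
I_one_spike = integrand.mean()

assert np.isclose(I_one_spike, -np.log2(rbar * dt))
```

Here both routes give log2(n_bins / 500), about 7.6 bits: knowing the inputs pins the spike to one of the 500 occupied bins out of 100,000.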

For reasons that we will discuss in the following section, and as was pointed out in Agüera y Arcas et al. (2001) and Agüera y Arcas and Fairhall (2003), we will be considering isolated spikes, those separated from previous spikes by a period of silence. This has important consequences for our analysis. Most significantly, as we will be considering spikes that occur


on a background of silence, the relevant stimulus ensemble, conditioned on the silence, is no longer gaussian. Further, we will need to refine our information estimate.

The derivation of equation 3.2 makes clear that a similar formula must determine the information carried by the occurrence time of any event, not just single spikes: we can define an event rate in place of the spike rate and then calculate the information carried by these events (Brenner, Strong, et al., 2000). In the case here, we wish to compute the information obtained by observing an isolated spike or, equivalently, by the event silence+spike. This is straightforward: we replace the spike rate by the rate of isolated spikes, and equation 3.2 will give us the information carried by the arrival time of a single isolated spike. The problem is that this information includes both the information carried by the occurrence of the spike and the information conveyed in the condition that there were no spikes in the preceding t_silence msec (for an early discussion of the information carried by silence, see de Ruyter van Steveninck & Bialek, 1988). We would like to separate these contributions, since our idea of dimensionality reduction applies only to the triggering of a spike, not to the temporally extended condition of nonspiking.
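Operationally, selecting isolated spikes reduces to a one-line criterion on interspike intervals; the spike times and silence requirement in this sketch are made up:

```python
import numpy as np

# Isolated-spike selection sketch: keep spikes preceded by at least
# t_silence of silence, so the preceding spike history cannot matter.

spike_times = np.array([5.0, 30.0, 34.0, 38.0, 120.0, 124.0])  # ms
t_silence = 50.0                                               # ms

# interval to the previous spike; the first spike is skipped here, since
# whether it is isolated depends on the unseen history before the record
isi = np.diff(spike_times)
isolated = spike_times[1:][isi >= t_silence]

assert list(isolated) == [120.0]
```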

To separate the information carried by the isolated spike itself, we have to ask how much information we gain by seeing an isolated spike given that the condition for isolation has already been met. As discussed by Brenner, Strong, et al. (2000), we can compute this information by thinking about the distribution of times at which the isolated spike can occur. Given that we know the input stimulus, the distribution of times at which a single isolated spike will be observed is proportional to r_iso(t), the time-dependent rate or peristimulus time histogram for isolated spikes. With proper normalization, we have

P_{\text{iso}}(t \mid \text{inputs}) = \frac{1}{T} \cdot \frac{1}{\bar{r}_{\text{iso}}}\, r_{\text{iso}}(t), \qquad (3.6)

where T is the duration of the (long) window in which we can look for the spike, and r̄_iso is the average rate of isolated spikes. This distribution has an entropy

S_{\text{iso}}(t \mid \text{inputs}) = -\int_0^T dt\, P_{\text{iso}}(t \mid \text{inputs}) \log_2 P_{\text{iso}}(t \mid \text{inputs}) \qquad (3.7)

= -\frac{1}{T} \int_0^T dt\, \frac{r_{\text{iso}}(t)}{\bar{r}_{\text{iso}}} \log_2 \left[ \frac{1}{T} \cdot \frac{r_{\text{iso}}(t)}{\bar{r}_{\text{iso}}} \right] \qquad (3.8)

= \log_2(T\, \bar{r}_{\text{iso}}\, \Delta t) \ \text{bits}, \qquad (3.9)

where again we use the fact that for a deterministic system, the time-dependent rate must be either zero or the maximum allowed by our time


resolution Δt. To compute the information carried by a single spike, we need to compare this entropy with the total entropy possible when we do not know the inputs.

It is tempting to think that without knowledge of the inputs, an isolated spike is equally likely to occur anywhere in the window of size T, which leads us back to equation 3.3 with r̄ replaced by r̄_iso. In this case, however, we are assuming that the condition for isolation has already been met. Thus, even without observing the inputs, we know that isolated spikes can occur only in windows of time whose total length is T_silence = T · P(silence), where P(silence) is the probability that any moment in time is at least t_silence after the most recent spike. Thus, the total entropy of isolated spike arrival times (given that the condition for silence has been met) is reduced from log_2 T to

S_{\text{iso}}(t \mid \text{silence}) = \log_2(T \cdot P(\text{silence})), \qquad (3.10)

and the information that the spike carries beyond what we know from the silence itself is

\Delta I_{\text{iso spike}} = S_{\text{iso}}(t \mid \text{silence}) - S_{\text{iso}}(t \mid \text{inputs}) \qquad (3.11)

= \frac{1}{T} \int_0^T dt\, \frac{r_{\text{iso}}(t)}{\bar{r}_{\text{iso}}} \log_2 \left[ \frac{r_{\text{iso}}(t)}{\bar{r}_{\text{iso}}} \cdot P(\text{silence}) \right] \qquad (3.12)

= -\log_2(\bar{r}_{\text{iso}}\, \Delta t) + \log_2 P(\text{silence}) \ \text{bits}. \qquad (3.13)

This information, which is defined independent of any model for the feature selectivity of the neuron, provides the benchmark against which our reduction of dimensionality will be measured. To make the comparison, however, we need the analog of equation 3.5.
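The quantities in equation 3.13 are easy to estimate by brute force from a binned spike train; the train, bin size, and silence window in this sketch are invented:

```python
import numpy as np

# Isolated-spike information sketch (eqs. 3.10-3.13): P(silence) and
# rbar_iso are measured directly from a toy binned spike train.

rng = np.random.default_rng(5)
dt = 0.5                                   # ms per bin
n_bins = 200_000
w = 100                                    # required silence: 100 bins = 50 ms

spikes = np.zeros(n_bins, dtype=bool)
spikes[rng.choice(n_bins, size=300, replace=False)] = True

# a bin is "silent-eligible" if no spike fell in the w bins preceding it
counts = np.convolve(spikes, np.ones(w))[:n_bins]   # spikes in the w bins ending at t
silent = np.zeros(n_bins, dtype=bool)
silent[1:] = counts[:-1] == 0              # no spikes in [t-w, t-1]
P_silence = silent.mean()

iso = spikes & silent                      # isolated spikes
rbar_iso = iso.mean() / dt                 # average isolated-spike rate

dI_iso = -np.log2(rbar_iso * dt) + np.log2(P_silence)   # eq. 3.13
assert 0.0 < dI_iso < -np.log2(rbar_iso * dt)
```

The second term is negative: part of the naive single-spike information was already conveyed by the silence condition, and equation 3.13 removes it.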

Equation 3.12 provides us with an expression for the information conveyed by isolated spikes in terms of the probability that these spikes occur at particular times; this is analogous to equation 3.2 for single (nonisolated) spikes. If we follow a path analogous to that which leads from equation 3.2 to equation 3.5, we find an expression for the information that an isolated spike provides about the K stimulus dimensions s⃗:

\Delta I^{\vec{s}}_{\text{iso spike}} = \int d\vec{s}\, P(\vec{s} \mid \text{iso spike at } t) \log_2 \left[ \frac{P(\vec{s} \mid \text{iso spike at } t)}{P(\vec{s} \mid \text{silence})} \right] + \langle \log_2 P(\text{silence} \mid \vec{s}) \rangle, \qquad (3.14)

where the prior is now also conditioned on silence: P(s⃗ | silence) is the distribution of s⃗ given that s⃗ is preceded by a silence of at least t_silence. Notice that this silence-conditioned distribution is not knowable a priori, and in particular, it is not gaussian; P(s⃗ | silence) must be sampled from data.

The last term in equation 3.14 is the entropy of a binary variable that indicates whether particular moments in time are silent given knowledge


of the stimulus. Again, since the HH model is deterministic, this conditional entropy should be zero if we keep a complete description of the stimulus. In fact, we are not interested in describing those features of the stimulus that lead to silence, and it is not fair (as we will see) to judge the success of dimensionality reduction by looking at the prediction of silence, which necessarily involves multiple dimensions. To make a meaningful comparison, then, we will assume that there is a perfect description of the stimulus conditions leading to silence and focus on the stimulus features that trigger the isolated spike. When we approximate these features by the K-dimensional space s⃗, we capture an amount of information

\Delta I^{\vec{s}}_{\text{iso spike}} = \int d\vec{s}\, P(\vec{s} \mid \text{iso spike at } t) \log_2 \left[ \frac{P(\vec{s} \mid \text{iso spike at } t)}{P(\vec{s} \mid \text{silence})} \right]. \qquad (3.15)

This is the information that we can compare with ΔI_iso spike in equation 3.13 to determine the efficiency of our dimensionality reduction.

4 Characterizing the Hodgkin-Huxley Neuron

For completeness, we begin with a brief review of the dynamics of the space-clamped HH neuron (Hodgkin & Huxley, 1952). Hodgkin and Huxley modeled the dynamics of the current through a patch of membrane with ion-specific conductances:

C \frac{dV}{dt} = I(t) - \bar{g}_K n^4 (V - V_K) - \bar{g}_{Na} m^3 h (V - V_{Na}) - \bar{g}_l (V - V_l), \qquad (4.1)

where I(t) is injected current, K and Na subscripts denote potassium- and sodium-related variables, respectively, and l (for "leakage") terms include all other ion conductances with slower dynamics. C is the membrane capacitance, V_K and V_Na are ion-specific reversal potentials, and V_l is defined such that the total voltage V is exactly zero when the membrane is at rest. ḡ_K, ḡ_Na, and ḡ_l are empirically determined maximal conductances for the different ion species, and the gating variables n, m, and h (on the interval [0, 1]) have their own voltage-dependent dynamics:

\frac{dn}{dt} = \frac{0.01 (10 - V)}{\exp[(10 - V)/10] - 1} (1 - n) - 0.125 \exp(-V/80)\, n

\frac{dm}{dt} = \frac{0.1 (25 - V)}{\exp[(25 - V)/10] - 1} (1 - m) - 4 \exp(-V/18)\, m

\frac{dh}{dt} = 0.07 \exp(-V/20) (1 - h) - \frac{h}{\exp[(30 - V)/10] + 1}. \qquad (4.2)

We have used the original values for these parameters, except for changing the signs of the voltages to correspond to the modern sign convention: C = 1 µF/cm², ḡ_K = 36 mS/cm², ḡ_Na = 120 mS/cm², ḡ_l = 0.3 mS/cm², V_K = -12 mV, V_Na = +115 mV, V_l = +10.613 mV. We have taken our system to be


a π × 30² µm² patch of membrane. We solve these equations numerically using fourth-order Runge-Kutta integration.
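A minimal integration sketch of equations 4.1 and 4.2 follows, assuming the standard HH rate functions written in the modern sign convention used here (rest at 0 mV). The DC test current, step size, and run length are chosen only for illustration, not taken from the paper's experiments:

```python
import numpy as np

# Space-clamped HH model (eqs. 4.1-4.2) with fourth-order Runge-Kutta.
# Rate functions are the standard HH kinetics, sign-flipped so that
# depolarization is positive (V_K = -12 mV, V_Na = +115 mV, rest at 0).

C = 1.0                                  # uF/cm^2
g_K, g_Na, g_l = 36.0, 120.0, 0.3        # mS/cm^2
V_K, V_Na, V_l = -12.0, 115.0, 10.613    # mV

def deriv(y, I):
    V, n, m, h = y
    a_n = 0.01 * (10 - V) / (np.exp((10 - V) / 10) - 1)
    b_n = 0.125 * np.exp(-V / 80)
    a_m = 0.1 * (25 - V) / (np.exp((25 - V) / 10) - 1)
    b_m = 4.0 * np.exp(-V / 18)
    a_h = 0.07 * np.exp(-V / 20)
    b_h = 1.0 / (np.exp((30 - V) / 10) + 1)
    dV = (I - g_K * n**4 * (V - V_K)
            - g_Na * m**3 * h * (V - V_Na)
            - g_l * (V - V_l)) / C
    return np.array([dV,
                     a_n * (1 - n) - b_n * n,
                     a_m * (1 - m) - b_m * m,
                     a_h * (1 - h) - b_h * h])

def rk4_step(y, I, dt):
    k1 = deriv(y, I)
    k2 = deriv(y + 0.5 * dt * k1, I)
    k3 = deriv(y + 0.5 * dt * k2, I)
    k4 = deriv(y + dt * k3, I)
    return y + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# resting state, then a suprathreshold DC step of 10 uA/cm^2 for 50 ms
y = np.array([0.0, 0.318, 0.053, 0.596])
dt = 0.01                                # ms
V_trace = []
for _ in range(int(50.0 / dt)):
    y = rk4_step(y, 10.0, dt)
    V_trace.append(y[0])
V_trace = np.array(V_trace)

assert V_trace.max() > 50.0              # the step elicits full spikes
```

The initial gating values are the resting steady states n∞(0) ≈ 0.318, m∞(0) ≈ 0.053, h∞(0) ≈ 0.596 of these kinetics.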

The system is driven with a gaussian random noise current I(t), generated by smoothing a gaussian random number stream with an exponential filter to generate a correlation time τ. It is convenient to choose τ to be longer than the time steps of numerical integration, since this guarantees that all functions are smooth on the scale of single time steps. Here we will always use τ = 0.2 msec, a value that is both less than the timescale over which we discretize the stimulus for analysis and far less than the neuron's capacitative smoothing timescale RC ~ 3 msec. The current has a standard deviation σ, but since the correlation time is short, the relevant parameter usually is the spectral density S = σ²τ; we also add a DC offset I_0. In the following, we will consider two parameter regimes: I_0 = 0, and I_0 a finite value, which leads to more periodic firing.

The integration step size is fixed at 0.05 msec. The key numerical experiments were repeated at a step size of 0.01 msec, with identical results. The time of a spike is defined as the moment of maximum voltage, for voltages exceeding a threshold (see Figure 1), estimated to subsample precision by quadratic interpolation. As spikes are both very stereotyped and very large compared to subspiking fluctuations, the precise value of this threshold is unimportant; we have used +20 mV.
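The stimulus construction and the spike-timing convention can be sketched as follows. The step size dt = 0.05 msec, τ = 0.2 msec, and the +20 mV threshold follow the text; the noise amplitude and the test voltage trace are made up:

```python
import numpy as np

# Exponentially filtered gaussian noise stimulus, plus spike timing by
# quadratic interpolation around the voltage maximum.

rng = np.random.default_rng(6)
dt_int = 0.05                 # integration step, ms (as in the text)
tau = 0.2                     # stimulus correlation time, ms (as in the text)
sigma = 0.05                  # nA; illustrative amplitude only

# AR(1) smoothing x[k] = a x[k-1] + xi[k] gives correlation time tau;
# the trace is then rescaled to standard deviation sigma
a = np.exp(-dt_int / tau)
xi = rng.normal(size=10_000)
x = np.empty_like(xi)
x[0] = xi[0]
for k in range(1, len(xi)):
    x[k] = a * x[k - 1] + xi[k]
I_stim = sigma * x / x.std()

def spike_time(V, dt, threshold=20.0):
    """Subsample peak time via a parabola through the three samples
    around the voltage maximum; returns None if no spike."""
    k = int(np.argmax(V))
    if k == 0 or k == len(V) - 1 or V[k] < threshold:
        return None
    v0, v1, v2 = V[k - 1], V[k], V[k + 1]
    shift = 0.5 * (v0 - v2) / (v0 - 2 * v1 + v2)   # parabola vertex offset
    return (k + shift) * dt

# check: a parabola peaking at t = 0.5 ms is recovered exactly
t = np.arange(0, 1.0, dt_int)
V = 100.0 - 400.0 * (t - 0.5) ** 2
assert np.isclose(spike_time(V, dt_int), 0.5)
```

With spike peaks this stereotyped, the parabola fit recovers timing well below the 0.05 msec sampling grid.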

4.1 Qualitative Description of Spiking. The first step in our analysis is to use reverse correlation, equation 2.4, to determine the average stimulus feature preceding a spike, the STA. In Figure 1 (top), we display the STA in a regime where the spectral density of the input current is 6.5 × 10⁻⁴ nA² msec. The spike-triggered averages of the gating terms n⁴ (proportion of open potassium channels) and m³h (proportion of open sodium channels) and the membrane voltage V are plotted in Figure 1 (middle and bottom). The error bars mark the standard deviation of the trajectories of these variables.

As expected, the voltage and gating variables follow highly stereotyped trajectories during the ~5 msec surrounding a spike. First, the rapid opening of the sodium channels causes a sharp membrane depolarization (or rise in V); the slower potassium channels then open and repolarize the membrane, leaving it at a slightly lower potential than rest. The potassium channels close gradually, but meanwhile, the membrane remains hyperpolarized and, due to its increased permeability to potassium ions, at lower resistance. These effects make it difficult to induce a second spike during this ~15 msec "refractory period." Away from spikes, the resting levels and fluctuations of the voltage and gating variables are quite small. The larger values evident in Figure 1 (middle and bottom) by ±15 msec are due to the summed contributions of nearby spikes.

The spike-triggered average current has a largely transient form, so that spikes are on average preceded by an upward swing in current. On the


Figure 1: Spike-triggered averages with standard deviations for (top) the input current I, (middle) the fraction of open K⁺ and Na⁺ channels, and (bottom) the membrane voltage V, for the parameter regime I_0 = 0 and S = 6.50 × 10⁻⁴ nA² sec.

other hand, there is no obvious bottleneck in the current trajectories, so that the current variance is almost constant throughout the spike. This is qualitatively consistent with the idea of dimensionality reduction: if the neuron ignores most of the dimensions along which the current can vary, then the variance, which is shared almost equally among all dimensions for this near white noise, can change by only a small amount.


4.2 Interspike Interaction. Although the STA has the form of a differentiating kernel, suggesting that the neuron detects edge-like events in the current versus time, there must be a DC component to the cell's response. We recall that for constant inputs, the HH model undergoes a bifurcation to constant frequency spiking, where the frequency is a function of the value of the input above onset. Correspondingly, the STA does not sum precisely to zero; one might think of it as having a small integrating component that allows the system to spike under DC stimulation, albeit only above a threshold.

The system's tendency to periodic spiking under DC current input also is felt under dynamic stimulus conditions and can be thought of as a strong interaction between successive spikes. We illustrate this by considering a different parameter regime, with a small DC current and some added noise (I₀ = 0.11 nA and S = 0.8 × 10⁻⁴ nA² sec). Note that the DC component puts the neuron in the metastable region of its f−I curve (see Figure 2). In this regime, the neuron tends to fire quasi-regular trains of spikes intermittently, as shown in Figure 3. We will refer to these quasi-regular spike sequences as "bursts" (note that this term is often used to refer to compound spikes in neurons with additional channels; such events do not occur in the HH model).

Spikes can be classified into three types: those initiating a spike burst, those within a burst, and those ending a burst.

Figure 2: Firing rate of the HH neuron as a function of injected DC current. The empty circles at moderate currents denote the metastable region, where the neuron may be either spiking or silent.

Figure 3: Segment of a typical spike train in a "bursting" regime.

Figure 4: Spike-triggered averages derived from spikes leading ("on"), inside ("burst"), and ending ("off") a burst. The parameters of this bursting regime are I₀ = 0.11 nA and S = 0.8 × 10⁻⁴ nA² sec. Note that the burst-ending spike average is, by construction, identical to that of any other within-burst spike for t < 0.

The minimum length of the silence between bursts is taken in this case to be 70 msec. Taking these three categories of spike as different "symbols" (de Ruyter van Steveninck & Bialek, 1988), we can determine the average stimulus for each. These are shown in Figure 4, with the spike at t = 0.

In this regime, the initial spike of a burst is preceded by a rapid oscillation in the current. Spikes within a burst are affected much less by the current; the feature immediately preceding such spikes is similar in shape to a single "wavelength" of the leading spike feature, but is of much smaller amplitude and is temporally compressed into the interspike interval. Hence, although it is clear that the timing of a spike within a burst is determined largely by the timing of the previous spike, the current plays some role in affecting the precise placement. This also demonstrates that the shape of the STA is not the same for all spikes; it depends strongly and nontrivially on the time to the previous spike, and this is related to the observation that subtly different patterns of two or three spikes correspond to very different average stimuli (de Ruyter van Steveninck & Bialek, 1988). For a reader of the spike code, a spike within a burst conveys a different message about the input than the spike at the onset of the burst. Finally, the feature ending a burst has a very similar form to the onset feature, but reversed in time. Thus, to a good approximation, the absence of a spike at the end of a burst can be read as the opposite of the onset of the burst.
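The three-way labeling of spikes can be implemented with two interspike-interval tests. A sketch, assuming (as in the text) a 70 msec minimum silence between bursts; the exact decision rule and names here are our own simplification, not taken from the paper:

```python
def classify_burst_spikes(spike_times, t_silence=70.0):
    """Label each spike 'on' (burst-leading), 'burst' (within a burst),
    or 'off' (burst-ending), from the gaps to its neighbors.
    Note: a lone isolated spike is labeled 'on' by this rule."""
    inf = float('inf')
    labels = []
    for k, t in enumerate(spike_times):
        prev_gap = t - spike_times[k - 1] if k > 0 else inf
        next_gap = spike_times[k + 1] - t if k + 1 < len(spike_times) else inf
        if prev_gap >= t_silence:
            labels.append('on')
        elif next_gap >= t_silence:
            labels.append('off')
        else:
            labels.append('burst')
    return labels
```

Averaging the stimulus separately over each label then yields the three conditional averages of Figure 4.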

Figure 5: Overall spike-triggered average in the bursty regime, showing the ringing due to the tendency to periodic firing. Plotted in gray is the spike autocorrelation, showing the same oscillations.

In summary, this regime of the HH neuron is similar to a "flip-flop," or 1-bit memory. Like its electronic analog, the neuron's memory is preserved by a feedback loop, here implemented by the interspike interaction. Large fluctuations in the input current at a certain frequency "flip" or "flop" the neuron between its silent and spiking states. However, while the neuron is spiking, further details of the input signal are transmitted by precise spike timing within a burst. If we calculate the spike-triggered average of all spikes for this regime, without regard to their position within a burst, then, as shown in Figure 5, the relatively well-localized leading spike oscillation of Figure 4 is replaced by a long-lived oscillating function resulting from the spike periodicity. This is shown explicitly by comparing the overall STA with the spike autocorrelation, also shown in Figure 5. This same effect is seen in the STA of the burst spikes, which in fact dominates the overall average. Prediction of spike timing using such an STA would be computationally difficult due to its extension in time, but, more seriously, unsuccessful, as most of the function is an artifact of the spike history rather than the effect of the stimulus.

While the effects of spike interaction are interesting and should be included in a complete model for spike generation, we wish here to consider only the current's role in initiating spikes. Therefore, as we have argued elsewhere, we limit ourselves initially to the cases in which interspike interaction plays no role (Agüera y Arcas et al., 2001; Agüera y Arcas & Fairhall, 2003). These "isolated" spikes can be defined as spikes preceded by a silent period t_silence long enough to ensure decoupling from the timing of the previous spike. A reasonable choice for t_silence can be inferred directly from the interspike interval distribution P(Δt), illustrated in Figure 6.

Figure 6: Hodgkin-Huxley interspike interval histogram for the parameters I₀ = 0 and S = 6.5 × 10⁻⁴ nA² sec, showing a peak at a preferred firing frequency and the long Poisson tail. The total number of spikes is N = 5.18 × 10⁶. The plot to the right is a closeup in linear scale.

For the HH model, as in simpler models and many real neurons (Brenner, Agam, Bialek, & de Ruyter van Steveninck, 1998), the form of P(Δt) has three noteworthy features: a refractory "hole," during which another spike is unlikely to occur; a strong mode at the preferred firing frequency; and an exponentially decaying, or Poisson, tail. The details of all three of these features are functions of the parameters of the stimulus (Tiesinga, José, & Sejnowski, 2000), and certain regimes may be dominated by only one or two features. The emergence of Poisson statistics in the tail of the distribution implies that these events are independent, so we can infer that the system has lost memory of the previous spike. We will therefore take isolated spikes to be those preceded by a silent interval Δt ≥ t_silence, where t_silence is well into the Poisson regime. The burst-onset spikes of Figure 4 are isolated spikes by this definition.
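Once t_silence is chosen from the interval distribution, selecting isolated spikes is a one-line filter on the interspike intervals. A sketch (our own helper; t_silence is in the same units as the spike times):

```python
import numpy as np

def isolated_spike_times(spike_times, t_silence):
    """Keep only spikes preceded by a silent interval >= t_silence.
    The first spike has no predecessor and is treated as isolated."""
    spike_times = np.asarray(spike_times, dtype=float)
    gaps = np.diff(spike_times, prepend=-np.inf)
    return spike_times[gaps >= t_silence]
```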

Note that the bursty behavior evident in Figure 3 is characteristic of "type II" neurons, which begin firing at a well-defined frequency under DC stimulus; "type I" neurons, by contrast, can fire at arbitrarily low frequency under constant input. Nonetheless, the analysis that follows is equally applicable to either type of neuron: spikes still interact at close range and become independent at sufficient separation. In what follows, we will be probing the HH neuron with current noise that remains below the DC threshold for periodic firing.

5 Isolated Spike Analysis

Focusing now on isolated spikes, we proceed to a second-order analysis of the current fluctuations around the isolated spike-triggered average (see Figure 7). We consider the response of the HH neuron to currents I(t) with mean I₀ = 0 and spectral density S = 6.5 × 10⁻⁴ nA² sec. Isolated spikes in this regime are defined by t_silence = 60 msec.

Figure 7: Spike-triggered average stimulus for isolated spikes.

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, roughly corresponding to sampling at the timescale sufficiently long to capture the relevant features. Thus, we begin in a 200-dimensional space. We emphasize that the theorem that connects eigenvalues of the matrix ΔC to the number of relevant dimensions is valid only for truly gaussian distributions of inputs, and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble: namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide to our search for lower-dimensional descriptions, but we should proceed with caution.

The "raw" isolated spike-triggered covariance C_iso spike and the corresponding covariance difference ΔC, equation 2.7, are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to ±15 msec. In Figure 9, we show the spectrum of eigenvalues of ΔC, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply ΔC by C_prior⁻¹. This has the effect of giving us eigenvalues scaled in units of the input standard deviation along each dimension. Because the correlation time is short, C_prior is nearly diagonal.

Figure 8: The isolated spike-triggered covariance C_iso (left) and covariance difference ΔC (right), for times −30 < t < 5 msec. The plots are in units of nA².

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance after accumulating 80,000 spikes.
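This covariance analysis reduces to a few lines of linear algebra. A sketch, assuming stimulus segments are stored as rows; note that the product C_prior⁻¹ΔC is not symmetric, so its eigenvalues can pick up small imaginary parts from sampling noise, and we keep only the real parts (the names are ours):

```python
import numpy as np

def covariance_modes(segments_spike, segments_prior):
    """Eigenvalues/eigenvectors of C_prior^{-1} (C_spike - C_prior),
    sorted by eigenvalue magnitude.  Rows of each input array are
    stimulus segments of D samples."""
    c_spike = np.cov(segments_spike, rowvar=False)
    c_prior = np.cov(segments_prior, rowvar=False)
    delta_c = c_spike - c_prior
    m = np.linalg.solve(c_prior, delta_c)   # C_prior^{-1} @ delta_c
    w, v = np.linalg.eig(m)
    order = np.argsort(-np.abs(w))
    return w[order].real, v[:, order].real
```

A mode along which spiking constrains the stimulus shows up as an eigenvalue well separated from the noise floor; its sign indicates reduced or increased variance in that direction.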

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero up to the noise floor determined by sampling decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is 1/√n_spikes. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Agüera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended, silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Agüera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be −60 ≤ t ≤ −40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/n_spikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out the test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion of other weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is close to time translationally invariant.

Figure 11: For the leading 64 modes, fraction of the mode energy over the interval −40 < t < −30 msec as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.
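The energy-fraction diagnostic is simple to state in code. A sketch (hypothetical helper function; the silent-window bounds follow the text):

```python
import numpy as np

def silence_energy_fraction(mode, t_axis, t_lo, t_hi):
    """Fraction of a mode's energy lying in the silent window [t_lo, t_hi].
    Spike-associated modes have only noise energy there (vanishing with
    sample size); silence modes keep an order-unity fraction."""
    mode = mode / np.linalg.norm(mode)
    mask = (t_axis >= t_lo) & (t_axis <= t_hi)
    return float(np.sum(mode[mask] ** 2))
```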

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s₁ and s₂ (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crosses threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions s₁ and s₂. By construction, the distribution of these signals over the whole experiment, P(s₁, s₂), is gaussian. The appropriate prior for the isolation condition, P(s₁, s₂ | silence), differs only subtly from the gaussian prior. On the other hand, for each spike, we obtain a sample from the distribution P(s₁, s₂ | iso spike at t₀), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC which we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased, rather than decreased, variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold crossing interpretation.

Figure 14: 10⁴ spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation on the prior gaussian distribution. The circles, from the inside out, enclose all but 10⁻¹, 10⁻², …, 10⁻⁸ of the prior.


Combining equations 2.2 and 2.3, for isolated spikes we have

g(s₁, s₂) = P(s₁, s₂ | iso spike at t₀) / P(s₁, s₂ | silence),   (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure, but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
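Given samples of the two projections under the spike-conditional and prior ensembles, the ratio in equation 5.1 can be estimated by binning. A rough sketch (our own estimator; a real analysis needs care with sparsely occupied bins):

```python
import numpy as np

def decision_function(proj_spike, proj_prior, bins=20, extent=((-5, 5), (-5, 5))):
    """Estimate g(s1, s2) = P(s1, s2 | spike) / P(s1, s2 | prior) from
    normalized 2D histograms of the projections (N x 2 arrays)."""
    h_spike, xe, ye = np.histogram2d(proj_spike[:, 0], proj_spike[:, 1],
                                     bins=bins, range=extent, density=True)
    h_prior, _, _ = np.histogram2d(proj_prior[:, 0], proj_prior[:, 1],
                                   bins=bins, range=extent, density=True)
    # ratio where the prior has support; zero elsewhere
    g = np.where(h_prior > 0, h_spike / np.maximum(h_prior, 1e-12), 0.0)
    return g, xe, ye
```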

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation, according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I_{s₁,s₂}^{iso spike} = I^{iso spike}; more generally, one finds I_{s₁,s₂}^{iso spike} ≤ I^{iso spike}, and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.
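The information carried by a projection can be computed directly from the two distributions, using the standard single-event information formula (cf. Brenner, Bialek, et al., 2000, cited above): the information per spike is the average of log₂[P(s | spike)/P(s)] over the spike-conditional ensemble. A sketch for one dimension (our own binned estimator, which is biased for small samples):

```python
import numpy as np

def projection_information(proj_spike, proj_prior, bins=30):
    """Information (bits per spike) carried by a 1D stimulus projection:
    I = sum_s ds P(s|spike) log2[ P(s|spike) / P(s) ]."""
    lo = min(proj_spike.min(), proj_prior.min())
    hi = max(proj_spike.max(), proj_prior.max())
    p_spike, edges = np.histogram(proj_spike, bins=bins, range=(lo, hi), density=True)
    p_prior, _ = np.histogram(proj_prior, bins=bins, range=(lo, hi), density=True)
    ds = edges[1] - edges[0]
    mask = (p_spike > 0) & (p_prior > 0)   # skip empty bins
    return float(np.sum(ds * p_spike[mask] * np.log2(p_spike[mask] / p_prior[mask])))
```

The 2D version, applied to the projections onto s₁ and s₂, gives the quantity plotted in Figure 15.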

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec, reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at ≈3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation, and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ≈3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that, along the direction of the STA, we allow the second, orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
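Because the tiling uses only conditional first-order statistics, it can be sketched compactly. Assumptions: spike-triggered segments stored as rows, eight bins with equal spike counts as in the text; the function name is ours:

```python
import numpy as np

def twist_model(segments, sta, n_bins=8):
    """Locally linear tiling: bin spike-triggered segments by their
    projection onto the STA; in each bin, the conditional mean's
    component orthogonal to the STA gives a local second basis vector."""
    sta = sta / np.linalg.norm(sta)
    proj = segments @ sta
    order = np.argsort(proj)
    tiles = []
    for chunk in np.array_split(order, n_bins):   # equal spike counts per bin
        mean = segments[chunk].mean(axis=0)
        ortho = mean - (mean @ sta) * sta         # remove the STA component
        n = np.linalg.norm(ortho)
        tiles.append(ortho / n if n > 0 else ortho)
    return np.array(tiles)
```

A singular value decomposition of the returned tile vectors (together with the STA) then estimates the embedding dimension of the curved subspace, as in Figure 17.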

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ≈90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshikawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work by Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ≈200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description, but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations, such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace, using data sets of fewer than 10⁴ spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one that is sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus, or, conversely, to allow us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.

1746 B Aguera y Arcas A Fairhall and W Bialek

However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore in general involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Aguera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Aguera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Aguera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Fitzhugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibres. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2070.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



In this formulation, characterizing the computation done by a neuron involves three steps:

1. Estimate the number of relevant stimulus dimensions K, with the hope that there will be many fewer than the original dimensionality D.

2. Identify a set of filters that project into this relevant subspace.

3. Characterize the nonlinear function g(s⃗).

The classical perceptron-like cell of neural network theory would have only one relevant dimension, given by the vector of weights, and a simple form for g, typically a sigmoid.

Rather than trying to look directly at the distribution of spikes given stimuli, we follow de Ruyter van Steveninck and Bialek (1988) and consider the distribution of signals conditional on the response, P[I(t < t_0) | spike at t_0], also called the response-conditional ensemble (RCE); these are related by Bayes' rule:

\[
\frac{P[\mathrm{spike\ at\ } t_0 \mid I(t<t_0)]}{P[\mathrm{spike\ at\ } t_0]} = \frac{P[I(t<t_0) \mid \mathrm{spike\ at\ } t_0]}{P[I(t<t_0)]}. \tag{2.3}
\]

We can now compute various moments of the RCE. The first moment is the spike-triggered average stimulus (STA),

\[
\mathrm{STA}(\tau) = \int [dI]\, P[I(t<t_0) \mid \mathrm{spike\ at\ } t_0]\, I(t_0 - \tau), \tag{2.4}
\]

which is the object that one computes in reverse correlation (de Boer & Kuyper, 1968; Rieke et al., 1997). If we choose the distribution of input stimuli P[I(t < t_0)] to be gaussian white noise, then for a perceptron-like neuron sensitive to only one direction in stimulus space, it can be shown that the STA, or first moment of the RCE, is proportional to the vector or filter f(τ) that defines this direction (Rieke et al., 1997).
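Concretely, the STA of equation 2.4 is just an average of stimulus histories over spike times. The sketch below is our own illustration, not code from the original study; the threshold-crossing "neuron" and its 1 msec exponential filter are invented purely for the demonstration.

```python
import numpy as np

def spike_triggered_average(current, spike_indices, window):
    """Equation 2.4: average the stimulus history I(t0 - tau) over all
    spike times t0, for lags tau = 0 ... window-1 samples."""
    sta = np.zeros(window)
    n = 0
    for t0 in spike_indices:
        if t0 >= window:                          # need a full history
            sta += current[t0 - window:t0][::-1]  # most recent lag first
            n += 1
    return sta / max(n, 1)

# toy check with an invented threshold-crossing "neuron"
rng = np.random.default_rng(0)
dt = 0.05                                     # msec per sample
I = rng.normal(0.0, 1.0, 200_000)             # gaussian white noise
kernel = np.exp(-np.arange(60) * dt / 1.0)    # hypothetical 1 msec filter
v = np.convolve(I, kernel, mode="full")[:len(I)]
spikes = np.where((v[1:] > 4.0) & (v[:-1] <= 4.0))[0] + 1
sta = spike_triggered_average(I, spikes, window=100)
# sta now resembles (a time-reversed copy of) the filter that triggered spikes
```

For white noise the recovered STA is proportional to the underlying filter, up to estimation noise that shrinks with the number of spikes.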

Although it is a theorem that the STA is proportional to the relevant filter f(τ), in principle it is possible that the proportionality constant is zero, most plausibly if the neuron's response has some symmetry, such as phase invariance in the response of high-frequency auditory neurons. It also is worth noting that what is really important in this analysis is the gaussian distribution of the stimuli, not the "whiteness" of the spectrum. For nonwhite but gaussian inputs, the STA measures the relevant filter blurred by the correlation function of the inputs, and hence the true filter can be recovered (at least in principle) by deconvolution. For nongaussian signals and nonlinear neurons, there is no corresponding guarantee that the selectivity of the neuron can be separated from correlations in the stimulus (Sharpee et al., in press).

To obtain more than one relevant direction (or to reveal relevant directions when symmetries cause the STA to vanish), we proceed to second


order and compute the covariance matrix of fluctuations around the spike-triggered average,

\[
C_{\mathrm{spike}}(\tau, \tau') = \int [dI]\, P[I(t<t_0) \mid \mathrm{spike\ at\ } t_0]\, I(t_0-\tau)\, I(t_0-\tau') - \mathrm{STA}(\tau)\, \mathrm{STA}(\tau'). \tag{2.5}
\]

In the same way that we compare the spike-triggered average to some constant average level of the signal in the whole experiment, we compare the covariance matrix C_spike with the covariance of the signal averaged over the whole experiment,

\[
C_{\mathrm{prior}}(\tau, \tau') = \int [dI]\, P[I(t<t_0)]\, I(t_0-\tau)\, I(t_0-\tau'), \tag{2.6}
\]

to construct the change in the covariance matrix,

\[
\Delta C = C_{\mathrm{spike}} - C_{\mathrm{prior}}. \tag{2.7}
\]

With time resolution Δt in a window of duration T, as above, all of these covariances are D × D matrices. In the same way that the spike-triggered average has the clearest interpretation when we choose inputs from a gaussian distribution, ΔC also has the clearest interpretation in this case. Specifically, if inputs are drawn from a gaussian distribution, then it can be shown that (Bialek & de Ruyter van Steveninck, 2003):

1. If the neuron is sensitive to a limited set of K input dimensions, as in equation 2.2, then ΔC will have only K nonzero eigenvalues.¹ In this way, we can measure directly the dimensionality K of the relevant subspace.

2. If the distribution of inputs is both gaussian and white, then the eigenvectors associated with the nonzero eigenvalues span the same space as that spanned by the filters {f_μ(τ)}.

3. For nonwhite (correlated) but still gaussian inputs, the eigenvectors span the space of the filters {f_μ(τ)} blurred by convolution with the correlation function of the inputs.

Thus, the analysis of ΔC for neurons responding to gaussian inputs should allow us to identify the relevant subspace of inputs and test specifically the hypothesis that this subspace is of low dimension.

¹ As with the STA, it is in principle possible that symmetries or accidental features of the function g(s⃗) would cause some of the K eigenvalues to vanish, but this is very unlikely.


Several points are worth noting. First, except in special cases, the eigenvectors of ΔC and the filters {f_μ(τ)} are not the principal components of the RCE, and hence this analysis of ΔC is not a principal component analysis. Second, the nonzero eigenvalues of ΔC can be either positive or negative, depending on whether the variance of inputs along that particular direction is larger or smaller in the neighborhood of a spike. Third, although the eigenvectors span the relevant subspace, these eigenvectors do not form a preferred coordinate system within this subspace. Finally, we emphasize that dimensionality reduction (identification of the relevant subspace) is only the first step in our analysis of the computation done by a neuron.
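The covariance analysis of equations 2.5 through 2.7 can be sketched in a few lines. This is our own minimal illustration (the random sampling of prior histories and the toy threshold neuron are implementation choices, not prescriptions from the text); for a zero-mean gaussian stimulus, the far-from-zero eigenvalues of ΔC then flag the relevant dimensions.

```python
import numpy as np

def delta_covariance(current, spike_indices, window, seed=1):
    """Equations 2.5-2.7: Delta C = C_spike - C_prior for stimulus
    histories of length `window`; its nonzero eigenvalues count the
    relevant stimulus dimensions K."""
    hists = np.array([current[t0 - window:t0][::-1]
                      for t0 in spike_indices if t0 >= window])
    sta = hists.mean(axis=0)
    # C_spike with the STA outer product subtracted (equation 2.5)
    c_spike = hists.T @ hists / len(hists) - np.outer(sta, sta)
    # C_prior from histories at random times (stimulus has zero mean)
    rng = np.random.default_rng(seed)
    times = rng.integers(window, len(current), size=len(hists))
    prior = np.array([current[t0 - window:t0][::-1] for t0 in times])
    c_prior = prior.T @ prior / len(prior)
    dC = c_spike - c_prior
    eigvals, eigvecs = np.linalg.eigh(dC)     # symmetric, real spectrum
    return dC, eigvals, eigvecs

# toy data: the same invented threshold-crossing "neuron" as above
rng = np.random.default_rng(0)
I = rng.normal(0.0, 1.0, 200_000)
kernel = np.exp(-np.arange(60) * 0.05 / 1.0)  # hypothetical 1 msec filter
v = np.convolve(I, kernel, mode="full")[:len(I)]
spikes = np.where((v[1:] > 4.0) & (v[:-1] <= 4.0))[0] + 1
dC, eigvals, eigvecs = delta_covariance(I, spikes, window=100)
```

Eigenvectors whose eigenvalues stand well clear of the noisy bulk around zero (positive or negative) span the relevant subspace; the bulk scatter shrinks as the number of spikes grows.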

3 Measuring the Success of Dimensionality Reduction

The claim that certain stimulus features are most relevant is in effect a model for the neuron, so the next question is how to measure the effectiveness or accuracy of this model. Several different ideas have been suggested in the literature as ways of testing models based on linear receptive fields in the visual system (Stanley, Lei, & Dan, 1999; Keat, Reinagel, Reid, & Meister, 2001) or linear spectrotemporal receptive fields in the auditory system (Theunissen, Sen, & Doupe, 2000). These methods have in common that they introduce a metric to measure performance, for example, mean square error in predicting the firing rate as averaged over some window of time. Ideally, we would like to have a performance measure that avoids any arbitrariness in the choice of metric, and such metric-free measures are provided uniquely by information theory (Shannon, 1948; Cover & Thomas, 1991).

Observing the arrival time t_0 of a single spike provides a certain amount of information about the input signals. Since information is mutual, we can also say that knowing the input signal trajectory I(t < t_0) provides information about the arrival time of the spike. If "details are irrelevant," then we should be able to discard these details from our description of the stimulus and yet preserve the mutual information between the stimulus and spike arrival times (for an abstract discussion of such selective compression, see Tishby et al., 1999). In constructing our low-dimensional model, we represent the complete (D-dimensional) stimulus I(t < t_0) by a smaller number (K < D) of dimensions s⃗ = (s_1, s_2, ..., s_K).

The mutual information I[I(t < t_0); t_0] is a property of the neuron itself, while the mutual information I[s⃗; t_0] characterizes how much our reduced description of the stimulus can tell us about when spikes will occur. Necessarily, our reduction of dimensionality causes a loss of information, so that

\[
I[\vec{s}\,; t_0] \le I[I(t<t_0); t_0], \tag{3.1}
\]

but if our reduced description really captures the computation done by the neuron, then the two information measures will be very close. In particular,


if the neuron were described exactly by a lower-dimensional model, as for a linear perceptron or for an integrate-and-fire neuron (Aguera y Arcas & Fairhall, 2003), then the two information measures would be equal. More generally, the ratio I[s⃗; t_0] / I[I(t < t_0); t_0] quantifies the efficiency of the low-dimensional model, measuring the fraction of information about spike arrival times that our K dimensions capture from the full signal I(t < t_0).

As shown by Brenner, Strong, Koberle, Bialek, and de Ruyter van Steveninck (2000), the arrival time of a single spike provides an information

\[
I[I(t<t_0); t_0] \equiv I_{\mathrm{one\ spike}} = \frac{1}{T} \int_0^T dt\, \frac{r(t)}{\bar{r}} \log_2\!\left[\frac{r(t)}{\bar{r}}\right], \tag{3.2}
\]

where r(t) is the time-dependent spike rate, r̄ is the average spike rate, and ⟨···⟩ denotes an average over time. In principle, information should be calculated as an average over the distribution of stimuli, but the ergodicity of the stimulus justifies replacing this ensemble average with a time average. For a deterministic system like the HH equations, the spike rate is a singular function of time given the inputs I(t): spikes occur at definite times with no randomness or irreproducibility. If we observe these responses with a time resolution Δt, then for Δt sufficiently small, the rate r(t) at any time t either is zero or corresponds to a single spike occurring in one bin of size Δt, that is, r = 1/Δt. Thus, the information carried by a single spike is

\[
I_{\mathrm{one\ spike}} = -\log_2(\bar{r}\, \Delta t). \tag{3.3}
\]

On the other hand, if the probability of spiking really depends on only the stimulus dimensions s_1, s_2, ..., s_K, we can substitute

\[
\frac{r(t)}{\bar{r}} \rightarrow \frac{P(\vec{s} \mid \mathrm{spike\ at\ } t)}{P(\vec{s}\,)}. \tag{3.4}
\]

Replacing the time averages in equation 3.2 with ensemble averages, we find

\[
I[\vec{s}\,; t_0] \equiv I^{\vec{s}}_{\mathrm{one\ spike}} = \int d^K s\; P(\vec{s} \mid \mathrm{spike\ at\ } t) \log_2\!\left[\frac{P(\vec{s} \mid \mathrm{spike\ at\ } t)}{P(\vec{s}\,)}\right] \tag{3.5}
\]

(for details of these arguments, see Brenner, Strong, et al., 2000). This allows us to compare the information captured by the K-dimensional reduced model with the true information carried by single spikes in the spike train.
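For intuition, the deterministic limit connecting equations 3.2 and 3.3 is easy to verify numerically. The sketch below is our own check, with an arbitrary 0.1 msec bin and a made-up periodic spike train; it evaluates equation 3.2 from a binned rate and confirms the collapse to −log₂(r̄Δt).

```python
import numpy as np

def info_per_spike(rate):
    """Equation 3.2: (1/T) * integral dt (r/rbar) log2(r/rbar),
    evaluated as an average over time bins (0 log 0 taken as 0)."""
    x = rate / rate.mean()
    return np.where(x > 0, x * np.log2(np.where(x > 0, x, 1.0)), 0.0).mean()

dt = 0.1                         # msec per bin (an arbitrary choice)
rate = np.zeros(10_000)          # a 1-second observation window
rate[::200] = 1.0 / dt           # deterministic: one spike every 20 msec
# in this limit r(t) is either 0 or 1/dt, so equation 3.2 reduces to
# equation 3.3, I = -log2(rbar * dt)
assert np.isclose(info_per_spike(rate), -np.log2(rate.mean() * dt))
```

The same routine applied to a smoothly varying rate gives the general (smaller) value of equation 3.2.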

For reasons that we will discuss in the following section, and as was pointed out in Aguera y Arcas et al. (2001) and Aguera y Arcas and Fairhall (2003), we will be considering isolated spikes, those separated from previous spikes by a period of silence. This has important consequences for our analysis. Most significantly, as we will be considering spikes that occur


on a background of silence, the relevant stimulus ensemble, conditioned on the silence, is no longer gaussian. Further, we will need to refine our information estimate.

The derivation of equation 3.2 makes clear that a similar formula must determine the information carried by the occurrence time of any event, not just single spikes: we can define an event rate in place of the spike rate and then calculate the information carried by these events (Brenner, Strong, et al., 2000). In the case here, we wish to compute the information obtained by observing an isolated spike or, equivalently, by the event silence+spike. This is straightforward: we replace the spike rate by the rate of isolated spikes, and equation 3.2 will give us the information carried by the arrival time of a single isolated spike. The problem is that this information includes both the information carried by the occurrence of the spike and the information conveyed in the condition that there were no spikes in the preceding t_silence msec (for an early discussion of the information carried by silence, see de Ruyter van Steveninck & Bialek, 1988). We would like to separate these contributions, since our idea of dimensionality reduction applies only to the triggering of a spike, not to the temporally extended condition of nonspiking.

To separate the information carried by the isolated spike itself, we have to ask how much information we gain by seeing an isolated spike given that the condition for isolation has already been met. As discussed by Brenner, Strong, et al. (2000), we can compute this information by thinking about the distribution of times at which the isolated spike can occur. Given that we know the input stimulus, the distribution of times at which a single isolated spike will be observed is proportional to r_iso(t), the time-dependent rate or peristimulus time histogram for isolated spikes. With proper normalization, we have

\[
P_{\mathrm{iso}}(t \mid \mathrm{inputs}) = \frac{1}{T} \cdot \frac{1}{\bar{r}_{\mathrm{iso}}}\, r_{\mathrm{iso}}(t), \tag{3.6}
\]

where T is the duration of the (long) window in which we can look for the spike and r̄_iso is the average rate of isolated spikes. This distribution has an entropy

\[
S_{\mathrm{iso}}(t \mid \mathrm{inputs}) = -\int_0^T dt\; P_{\mathrm{iso}}(t \mid \mathrm{inputs}) \log_2 P_{\mathrm{iso}}(t \mid \mathrm{inputs}) \tag{3.7}
\]
\[
= -\frac{1}{T} \int_0^T dt\, \frac{r_{\mathrm{iso}}(t)}{\bar{r}_{\mathrm{iso}}} \log_2\!\left[\frac{1}{T} \cdot \frac{r_{\mathrm{iso}}(t)}{\bar{r}_{\mathrm{iso}}}\right] \tag{3.8}
\]
\[
= \log_2(T \bar{r}_{\mathrm{iso}}\, \Delta t)\ \mathrm{bits}, \tag{3.9}
\]

where again we use the fact that for a deterministic system, the time-dependent rate must be either zero or the maximum allowed by our time


resolution Δt. To compute the information carried by a single spike, we need to compare this entropy with the total entropy possible when we do not know the inputs.

It is tempting to think that without knowledge of the inputs, an isolated spike is equally likely to occur anywhere in the window of size T, which leads us back to equation 3.3 with r̄ replaced by r̄_iso. In this case, however, we are assuming that the condition for isolation has already been met. Thus, even without observing the inputs, we know that isolated spikes can occur only in windows of time whose total length is T_silence = T · P_silence, where P_silence is the probability that any moment in time is at least t_silence after the most recent spike. Thus, the total entropy of isolated spike arrival times (given that the condition for silence has been met) is reduced from log_2 T to

\[
S_{\mathrm{iso}}(t \mid \mathrm{silence}) = \log_2(T \cdot P_{\mathrm{silence}}), \tag{3.10}
\]

and the information that the spike carries beyond what we know from the silence itself is

\[
\Delta I_{\mathrm{iso\ spike}} = S_{\mathrm{iso}}(t \mid \mathrm{silence}) - S_{\mathrm{iso}}(t \mid \mathrm{inputs}) \tag{3.11}
\]
\[
= \frac{1}{T} \int_0^T dt\, \frac{r_{\mathrm{iso}}(t)}{\bar{r}_{\mathrm{iso}}} \log_2\!\left[\frac{r_{\mathrm{iso}}(t)}{\bar{r}_{\mathrm{iso}}} \cdot P_{\mathrm{silence}}\right] \tag{3.12}
\]
\[
= -\log_2(\bar{r}_{\mathrm{iso}}\, \Delta t) + \log_2 P_{\mathrm{silence}}\ \mathrm{bits}. \tag{3.13}
\]

This information, which is defined independent of any model for the feature selectivity of the neuron, provides the benchmark against which our reduction of dimensionality will be measured. To make the comparison, however, we need the analog of equation 3.5.
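In practice, equation 3.13 can be evaluated directly from a list of spike times. The sketch below is our own implementation; the interval bookkeeping used to compute P_silence (merging the "shadow" cast by each spike) is our choice, not a procedure from the original text.

```python
import numpy as np

def isolated_spike_info(spike_times, t_silence, duration, dt):
    """Equation 3.13: Delta I = -log2(rbar_iso * dt) + log2(P_silence),
    for a deterministic neuron observed with time resolution dt.
    Times are in msec; t_silence is the minimum preceding silence."""
    t = np.sort(np.asarray(spike_times, dtype=float))
    # isolated spikes: preceded by at least t_silence of silence
    iso = t[np.diff(t, prepend=-np.inf) >= t_silence]
    rbar_iso = len(iso) / duration
    # P_silence: fraction of time at least t_silence after the last spike;
    # merge the shadow interval (s, s + t_silence) cast by each spike
    shadowed, end = 0.0, 0.0
    for s in t:
        lo, hi = max(s, end), min(s + t_silence, duration)
        if hi > lo:
            shadowed += hi - lo
        end = max(end, s + t_silence)
    p_silence = (duration - shadowed) / duration
    return -np.log2(rbar_iso * dt) + np.log2(p_silence)
```

For example, with spikes at 100, 300, 305, and 600 msec in a 1-second record and t_silence = 70 msec, three spikes count as isolated and P_silence = 0.785.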

Equation 3.12 provides us with an expression for the information conveyed by isolated spikes in terms of the probability that these spikes occur at particular times; this is analogous to equation 3.2 for single (nonisolated) spikes. If we follow a path analogous to that which leads from equation 3.2 to equation 3.5, we find an expression for the information that an isolated spike provides about the K stimulus dimensions s⃗:

\[
\Delta I^{\vec{s}}_{\mathrm{iso\ spike}} = \int d\vec{s}\; P(\vec{s} \mid \mathrm{iso\ spike\ at\ } t) \log_2\!\left[\frac{P(\vec{s} \mid \mathrm{iso\ spike\ at\ } t)}{P(\vec{s} \mid \mathrm{silence})}\right] + \langle \log_2 P(\mathrm{silence} \mid \vec{s}\,) \rangle, \tag{3.14}
\]

where the prior is now also conditioned on silence: P(s⃗ | silence) is the distribution of s⃗ given that s⃗ is preceded by a silence of at least t_silence. Notice that this silence-conditioned distribution is not knowable a priori, and in particular, it is not gaussian; P(s⃗ | silence) must be sampled from data.

The last term in equation 3.14 is the entropy of a binary variable that indicates whether particular moments in time are silent, given knowledge


of the stimulus. Again, since the HH model is deterministic, this conditional entropy should be zero if we keep a complete description of the stimulus. In fact, we are not interested in describing those features of the stimulus that lead to silence, and it is not fair (as we will see) to judge the success of dimensionality reduction by looking at the prediction of silence, which necessarily involves multiple dimensions. To make a meaningful comparison, then, we will assume that there is a perfect description of the stimulus conditions leading to silence and focus on the stimulus features that trigger the isolated spike. When we approximate these features by the K-dimensional space s⃗, we capture an amount of information

\[
\Delta I^{\vec{s}}_{\mathrm{iso\ spike}} = \int d\vec{s}\; P(\vec{s} \mid \mathrm{iso\ spike\ at\ } t) \log_2\!\left[\frac{P(\vec{s} \mid \mathrm{iso\ spike\ at\ } t)}{P(\vec{s} \mid \mathrm{silence})}\right]. \tag{3.15}
\]

This is the information that we can compare with ΔI_iso spike in equation 3.13 to determine the efficiency of our dimensionality reduction.
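Once spike-conditional and silence-conditional samples of the projections are in hand, equation 3.15 is a Kullback-Leibler divergence between two sampled distributions. A minimal plug-in estimator for the one-dimensional case is sketched below (our own sketch: the bin count and the masking of empty bins are implementation choices, and real use would need checks for binning and sampling bias).

```python
import numpy as np

def projected_info(s_spike, s_silence, bins=32):
    """Plug-in estimate of equation 3.15 for a single projection s:
    KL divergence between P(s | iso spike) and P(s | silence), in bits."""
    lo = min(s_spike.min(), s_silence.min())
    hi = max(s_spike.max(), s_silence.max())
    p, _ = np.histogram(s_spike, bins=bins, range=(lo, hi))
    q, _ = np.histogram(s_silence, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    mask = (p > 0) & (q > 0)            # drop empty bins (biases downward)
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# sanity check against a known answer: for two unit-variance gaussians
# whose means differ by 2, the true divergence is 2/ln 2, about 2.9 bits
rng = np.random.default_rng(2)
est = projected_info(rng.normal(2.0, 1.0, 50_000),
                     rng.normal(0.0, 1.0, 50_000))
```

For multidimensional s⃗, the histograms become K-dimensional, which is why keeping K small matters for sampling.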

4 Characterizing the Hodgkin-Huxley Neuron

For completeness, we begin with a brief review of the dynamics of the space-clamped HH neuron (Hodgkin & Huxley, 1952). Hodgkin and Huxley modeled the dynamics of the current through a patch of membrane with ion-specific conductances:

\[
C \frac{dV}{dt} = I(t) - \bar{g}_K n^4 (V - V_K) - \bar{g}_{Na} m^3 h (V - V_{Na}) - \bar{g}_l (V - V_l), \tag{4.1}
\]

where I(t) is injected current, K and Na subscripts denote potassium- and sodium-related variables, respectively, and l (for "leakage") terms include all other ion conductances with slower dynamics. C is the membrane capacitance, V_K and V_Na are ion-specific reversal potentials, and V_l is defined such that the total voltage V is exactly zero when the membrane is at rest. ḡ_K, ḡ_Na, and ḡ_l are empirically determined maximal conductances for the different ion species, and the gating variables n, m, and h (on the interval [0, 1]) have their own voltage-dependent dynamics:

\[
\frac{dx}{dt} = \alpha_x(V)(1 - x) - \beta_x(V)\, x, \qquad x \in \{n, m, h\}, \tag{4.2}
\]
with the standard rate functions, written here in the sign convention of the parameters below:
\[
\alpha_n = \frac{0.1 - 0.01V}{e^{1 - 0.1V} - 1}, \quad \beta_n = 0.125\, e^{-V/80}, \quad
\alpha_m = \frac{2.5 - 0.1V}{e^{2.5 - 0.1V} - 1}, \quad \beta_m = 4\, e^{-V/18}, \quad
\alpha_h = 0.07\, e^{-V/20}, \quad \beta_h = \frac{1}{e^{3 - 0.1V} + 1}.
\]

We have used the original values for these parameters, except for changing the signs of the voltages to correspond to the modern sign convention: C = 1 μF/cm², ḡ_K = 36 mS/cm², ḡ_Na = 120 mS/cm², ḡ_l = 0.3 mS/cm², V_K = −12 mV, V_Na = +115 mV, V_l = +10.613 mV. We have taken our system to be


a π × 30² μm² patch of membrane. We solve these equations numerically using fourth-order Runge-Kutta integration.

The system is driven with a gaussian random noise current I(t), generated by smoothing a gaussian random number stream with an exponential filter to generate a correlation time τ. It is convenient to choose τ to be longer than the time steps of numerical integration, since this guarantees that all functions are smooth on the scale of single time steps. Here we will always use τ = 0.2 msec, a value that is both less than the timescale over which we discretize the stimulus for analysis and far less than the neuron's capacitative smoothing timescale RC ∼ 3 msec. The current has a standard deviation σ, but since the correlation time is short, the relevant parameter usually is the spectral density S = σ²τ; we also add a DC offset I_0. In the following, we will consider two parameter regimes: I_0 = 0, and I_0 a finite value, which leads to more periodic firing.
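Such a stimulus is equivalent to a first-order autoregressive (Ornstein-Uhlenbeck-like) process. A sketch of a generator one could use (our own construction; the seed and the AR(1) scaling are our choices, not details from the original):

```python
import numpy as np

def noise_current(n_steps, dt, tau=0.2, sigma=1.0, i0=0.0, seed=0):
    """Gaussian noise current with exponential correlation time tau
    (msec), standard deviation sigma (nA), and DC offset i0 (nA); the
    relevant spectral density is S = sigma**2 * tau."""
    rng = np.random.default_rng(seed)
    a = np.exp(-dt / tau)                     # one-step decay factor
    drive = rng.normal(size=n_steps) * sigma * np.sqrt(1.0 - a * a)
    I = np.empty(n_steps)
    x = 0.0
    for k in range(n_steps):
        x = a * x + drive[k]                  # exponential smoothing
        I[k] = i0 + x
    return I
```

With dt = 0.05 msec and τ = 0.2 msec, the autocorrelation of the output falls by a factor of e over four integration steps, as intended.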

The integration step size is fixed at 0.05 msec. The key numerical experiments were repeated at a step size of 0.01 msec, with identical results. The time of a spike is defined as the moment of maximum voltage, for voltages exceeding a threshold (see Figure 1), estimated to subsample precision by quadratic interpolation. As spikes are both very stereotyped and very large compared to subspiking fluctuations, the precise value of this threshold is unimportant; we have used +20 mV.
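A compact version of this simulation can be sketched as follows. This is our own reimplementation, not the authors' code: the gating functions are the textbook HH rates in the sign convention of equation 4.2, and the per-patch scaling of capacitance and conductances is our bookkeeping choice.

```python
import numpy as np

# HH rate functions, rest at V = 0 (equation 4.2)
def alpha_n(V): return (0.1 - 0.01 * V) / (np.exp(1.0 - 0.1 * V) - 1.0)
def beta_n(V):  return 0.125 * np.exp(-V / 80.0)
def alpha_m(V): return (2.5 - 0.1 * V) / (np.exp(2.5 - 0.1 * V) - 1.0)
def beta_m(V):  return 4.0 * np.exp(-V / 18.0)
def alpha_h(V): return 0.07 * np.exp(-V / 20.0)
def beta_h(V):  return 1.0 / (np.exp(3.0 - 0.1 * V) + 1.0)

# parameters of equation 4.1, scaled to a pi * 30^2 um^2 patch
AREA = np.pi * 30.0**2 * 1e-8               # cm^2
C = 1.0 * AREA                              # uF
GK, GNA, GL = 36.0 * AREA, 120.0 * AREA, 0.3 * AREA   # mS
VK, VNA, VL = -12.0, 115.0, 10.613          # mV

def derivs(state, I_ext):
    V, n, m, h = state
    dV = (I_ext * 1e-3                      # nA -> uA
          - GK * n**4 * (V - VK) - GNA * m**3 * h * (V - VNA)
          - GL * (V - VL)) / C
    dn = alpha_n(V) * (1 - n) - beta_n(V) * n
    dm = alpha_m(V) * (1 - m) - beta_m(V) * m
    dh = alpha_h(V) * (1 - h) - beta_h(V) * h
    return np.array([dV, dn, dm, dh])

def run_hh(I, dt=0.05, v_thresh=20.0):
    """Fourth-order Runge-Kutta integration; spike times located by
    quadratic interpolation around each voltage maximum above threshold."""
    V0 = 0.0
    state = np.array([V0,
                      alpha_n(V0) / (alpha_n(V0) + beta_n(V0)),
                      alpha_m(V0) / (alpha_m(V0) + beta_m(V0)),
                      alpha_h(V0) / (alpha_h(V0) + beta_h(V0))])
    V_trace, spikes = np.empty(len(I)), []
    for k in range(len(I)):
        k1 = derivs(state, I[k])
        k2 = derivs(state + 0.5 * dt * k1, I[k])
        k3 = derivs(state + 0.5 * dt * k2, I[k])
        k4 = derivs(state + dt * k3, I[k])
        state = state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        V_trace[k] = state[0]
    for k in range(1, len(I) - 1):
        a, b, c = V_trace[k - 1], V_trace[k], V_trace[k + 1]
        if b > v_thresh and b >= a and b > c:
            shift = 0.5 * (a - c) / (a - 2 * b + c)   # parabola vertex
            spikes.append((k + shift) * dt)
    return V_trace, np.array(spikes)

# with a suprathreshold DC current, the model fires tonically
V, spikes = run_hh(np.full(4000, 0.5))      # 0.5 nA for 200 msec
```

Replacing the DC input with the filtered-noise current described above reproduces the stimulus ensemble used throughout this section.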

4.1 Qualitative Description of Spiking. The first step in our analysis is to use reverse correlation, equation 2.4, to determine the average stimulus feature preceding a spike, the STA. In Figure 1 (top), we display the STA in a regime where the spectral density of the input current is 6.5 × 10⁻⁴ nA² sec. The spike-triggered averages of the gating terms n⁴ (proportion of open potassium channels) and m³h (proportion of open sodium channels) and the membrane voltage V are plotted in Figure 1 (middle and bottom). The error bars mark the standard deviation of the trajectories of these variables.

As expected, the voltage and gating variables follow highly stereotyped trajectories during the ∼5 msec surrounding a spike. First, the rapid opening of the sodium channels causes a sharp membrane depolarization (or rise in V); the slower potassium channels then open and repolarize the membrane, leaving it at a slightly lower potential than rest. The potassium channels close gradually, but meanwhile the membrane remains hyperpolarized and, due to its increased permeability to potassium ions, at lower resistance. These effects make it difficult to induce a second spike during this ∼15 msec "refractory period." Away from spikes, the resting levels and fluctuations of the voltage and gating variables are quite small. The larger values evident in Figure 1 (middle and bottom) by ±15 msec are due to the summed contributions of nearby spikes.

The spike-triggered average current has a largely transient form, so that spikes are on average preceded by an upward swing in current. On the


Figure 1: Spike-triggered averages with standard deviations for (top) the input current I, (middle) the fraction of open K⁺ and Na⁺ channels, and (bottom) the membrane voltage V, for the parameter regime I_0 = 0 and S = 6.5 × 10⁻⁴ nA² sec.

other hand, there is no obvious bottleneck in the current trajectories, so that the current variance is almost constant throughout the spike. This is qualitatively consistent with the idea of dimensionality reduction: if the neuron ignores most of the dimensions along which the current can vary, then the variance, which is shared almost equally among all dimensions for this near white noise, can change by only a small amount.


4.2 Interspike Interaction. Although the STA has the form of a differentiating kernel, suggesting that the neuron detects edge-like events in the current versus time, there must be a DC component to the cell's response. We recall that for constant inputs, the HH model undergoes a bifurcation to constant frequency spiking, where the frequency is a function of the value of the input above onset. Correspondingly, the STA does not sum precisely to zero; one might think of it as having a small integrating component that allows the system to spike under DC stimulation, albeit only above a threshold.

The system's tendency to periodic spiking under DC current input also is felt under dynamic stimulus conditions and can be thought of as a strong interaction between successive spikes. We illustrate this by considering a different parameter regime, with a small DC current and some added noise (I_0 = 0.11 nA and S = 0.8 × 10⁻⁴ nA² sec). Note that the DC component puts the neuron in the metastable region of its f-I curve (see Figure 2). In this regime, the neuron tends to fire quasi-regular trains of spikes intermittently, as shown in Figure 3. We will refer to these quasi-regular spike sequences as "bursts" (note that this term is often used to refer to compound spikes in neurons with additional channels; such events do not occur in the HH model).

Spikes can be classified into three types: those initiating a spike burst, those within a burst, and those ending a burst. The minimum length of

Figure 2: Firing rate of the HH neuron as a function of injected DC current. The empty circles at moderate currents denote the metastable region, where the neuron may be either spiking or silent.

1730 B Aguera y Arcas A Fairhall and W Bialek

Figure 3: Segment of a typical spike train in a "bursting" regime.

Figure 4: Spike-triggered averages derived from spikes leading ("on"), inside ("burst"), and ending ("off") a burst. The parameters of this bursting regime are I0 = 0.11 nA and S = 0.8 × 10⁻⁴ nA² sec. Note that the burst-ending spike average is, by construction, identical to that of any other within-burst spike for t < 0.

the silence between bursts is taken in this case to be 70 msec. Taking these three categories of spike as different "symbols" (de Ruyter van Steveninck & Bialek, 1988), we can determine the average stimulus for each. These are shown in Figure 4, with the spike at t = 0.
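The three-way classification depends only on the interspike intervals, so it can be sketched compactly. The following Python/NumPy fragment is our illustration, not the authors' code; the 70 msec inter-burst silence is the value quoted above, and all function names are ours:

```python
import numpy as np

def classify_burst_spikes(spike_times, t_gap=70.0):
    """Label each spike 'on' (leads a burst), 'off' (ends one), or
    'burst' (inside one), using the minimum inter-burst silence t_gap.
    A solitary spike flanked by silence on both sides is labeled 'on'."""
    t = np.asarray(spike_times, dtype=float)
    gap_before = np.diff(t, prepend=-np.inf)   # silence preceding each spike
    gap_after = np.diff(t, append=np.inf)      # silence following each spike
    return np.where(gap_before >= t_gap, "on",
                    np.where(gap_after >= t_gap, "off", "burst"))

def conditional_sta(stimulus, spike_idx, labels, which, window):
    """Average the `window` stimulus samples preceding spikes of one class."""
    segs = [stimulus[i - window:i] for i, lab in zip(spike_idx, labels)
            if lab == which and i >= window]
    return np.mean(segs, axis=0)

# Toy spike train (msec): one burst of three spikes, then a pair.
labels = classify_burst_spikes([0.0, 12.0, 24.0, 200.0, 212.0])
print([str(x) for x in labels])   # ['on', 'burst', 'off', 'on', 'off']
```

Applying `conditional_sta` with `which` set to each label in turn yields the three conditional averages of Figure 4.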

In this regime, the initial spike of a burst is preceded by a rapid oscillation in the current. Spikes within a burst are affected much less by the current; the feature immediately preceding such spikes is similar in shape to a single "wavelength" of the leading spike feature, but is of much smaller amplitude and is temporally compressed into the interspike interval. Hence, although it is clear that the timing of a spike within a burst is determined largely by the timing of the previous spike, the current plays some role in affecting the precise placement. This also demonstrates that the shape of the STA is not the same for all spikes: it depends strongly and nontrivially on the time to the previous spike, and this is related to the observation that subtly different patterns of two or three spikes correspond to very different average stimuli (de Ruyter van Steveninck & Bialek, 1988). For a reader of the spike code, a spike within a burst conveys a different message about the input than the spike at the onset of the burst. Finally, the feature ending a burst has a very similar form to the onset feature, but reversed in time. Thus, to a good approximation, the absence of a spike at the end of a burst can be read as the opposite of the onset of the burst.

In summary, this regime of the HH neuron is similar to a "flip-flop," or 1-bit memory. Like its electronic analog, the neuron's memory is preserved


Figure 5: Overall spike-triggered average in the bursty regime, showing the ringing due to the tendency to periodic firing. Plotted in gray is the spike autocorrelation, showing the same oscillations.

by a feedback loop, here implemented by the interspike interaction. Large fluctuations in the input current at a certain frequency "flip" or "flop" the neuron between its silent and spiking states. However, while the neuron is spiking, further details of the input signal are transmitted by precise spike timing within a burst. If we calculate the spike-triggered average of all spikes for this regime, without regard to their position within a burst, then, as shown in Figure 5, the relatively well-localized leading spike oscillation of Figure 4 is replaced by a long-lived oscillating function resulting from the spike periodicity. This is shown explicitly by comparing the overall STA with the spike autocorrelation, also shown in Figure 5. The same effect is seen in the STA of the burst spikes, which in fact dominates the overall average. Prediction of spike timing using such an STA would be computationally difficult due to its extension in time, but, more seriously, unsuccessful, as most of the function is an artifact of the spike history rather than the effect of the stimulus.

While the effects of spike interaction are interesting and should be included in a complete model for spike generation, we wish here to consider only the current's role in initiating spikes. Therefore, as we have argued elsewhere, we limit ourselves initially to the cases in which interspike interaction plays no role (Aguera y Arcas et al., 2001; Aguera y Arcas & Fairhall, 2003). These "isolated" spikes can be defined as spikes preceded by a silent period t_silence long enough to ensure decoupling from the timing of the previous spike. A reasonable choice for t_silence can be inferred directly from the interspike interval distribution P(Δt), illustrated in Figure 6. For the HH model, as in simpler models and many real neurons (Brenner, Agam, Bialek, & de Ruyter van Steveninck, 1998), the form of P(Δt) has three noteworthy features: a refractory "hole" during which another spike is unlikely to occur, a strong mode at the preferred firing frequency, and an exponentially decaying, or Poisson, tail. The details of all three of these features are functions of the parameters of the stimulus (Tiesinga, Jose, & Sejnowski, 2000), and certain regimes may be dominated by only one or two features. The emergence of Poisson statistics in the tail of the distribution implies that these events are independent, so we can infer that the system has lost memory of the previous spike. We will therefore take isolated spikes to be those preceded by a silent interval Δt ≥ t_silence, where t_silence is well into the Poisson regime. The burst-onset spikes of Figure 4 are isolated spikes by this definition.

Figure 6: Hodgkin-Huxley interspike interval histogram for the parameters I0 = 0 and S = 6.5 × 10⁻⁴ nA² sec, showing a peak at a preferred firing frequency and the long Poisson tail. The total number of spikes is N = 5.18 × 10⁶. The plot to the right is a closeup in linear scale.
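Operationally, the isolation criterion is a one-line filter on the spike train. A minimal sketch (ours, not the authors' code), with times in msec; the 60 msec default matches the value used for the isolated spike analysis in section 5:

```python
import numpy as np

def isolated_spikes(spike_times, t_silence=60.0):
    """Keep spikes whose preceding interspike interval is at least
    t_silence (the first spike has no predecessor and is dropped)."""
    t = np.asarray(spike_times, dtype=float)
    keep = np.concatenate(([False], np.diff(t) >= t_silence))
    return t[keep]

train = [10.0, 18.0, 100.0, 110.0, 118.0, 300.0]
print(isolated_spikes(train))   # [100. 300.]
```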

Note that the bursty behavior evident in Figure 3 is characteristic of "type II" neurons, which begin firing at a well-defined frequency under DC stimulus; "type I" neurons, by contrast, can fire at arbitrarily low frequency under constant input. Nonetheless, the analysis that follows is equally applicable to either type of neuron: spikes still interact at close range and become independent at sufficient separation. In what follows, we will be probing the HH neuron with current noise that remains below the DC threshold for periodic firing.

5 Isolated Spike Analysis

Focusing now on isolated spikes, we proceed to a second-order analysis of the current fluctuations around the isolated spike-triggered average (see


Figure 7: Spike-triggered average stimulus for isolated spikes.

Figure 7). We consider the response of the HH neuron to currents I(t) with mean I0 = 0 and spectral density S = 6.5 × 10⁻⁴ nA² sec. Isolated spikes in this regime are defined by t_silence = 60 msec.

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, roughly corresponding to sampling over a timescale sufficiently long to capture the relevant features. Thus, we begin in a 200-dimensional space. We emphasize that the theorem connecting the eigenvalues of the matrix ΔC to the number of relevant dimensions is valid only for truly gaussian distributions of inputs, and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble: namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide in our search for lower-dimensional descriptions, but we should proceed with caution.

The "raw" isolated spike-triggered covariance C_iso spike and the corresponding covariance difference ΔC, equation 2.7, are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to ±15 msec. In Figure 9, we show the spectrum of eigenvalues of ΔC, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply ΔC by C_prior⁻¹. This has the effect of giving us eigenvalues scaled in units of the input standard deviation


Figure 8: The isolated spike-triggered covariance C_iso (left) and covariance difference ΔC (right) for times −30 < t < 5 msec. The plots are in units of nA².

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance after accumulating 80,000 spikes.

along each dimension. Because the correlation time is short, C_prior is nearly diagonal.
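The construction of the covariance difference and its whitened eigensystem can be sketched in a few lines. This is our illustrative code, with a white gaussian stimulus and placeholder spike indices standing in for the HH simulation; ΔC = C_spike − C_prior follows equation 2.7, and the final step diagonalizes C_prior⁻¹ΔC as described above:

```python
import numpy as np

rng = np.random.default_rng(0)

def sta_and_delta_c(stimulus, spike_idx, window):
    """Spike-triggered average and covariance difference
    Delta-C = C_spike - C_prior over `window`-sample stimulus histories."""
    segs = np.array([stimulus[i - window:i] for i in spike_idx
                     if i >= window])
    sta = segs.mean(axis=0)
    c_spike = np.cov(segs, rowvar=False)
    # Prior statistics from histories ending at randomly chosen times.
    rnd = rng.integers(window, len(stimulus), size=4 * len(segs))
    prior = np.array([stimulus[i - window:i] for i in rnd])
    c_prior = np.cov(prior, rowvar=False)
    return sta, c_spike - c_prior, c_prior

stim = rng.normal(size=20000)               # white gaussian "current"
spikes = np.arange(100, 20000, 97)          # placeholder spike indices
sta, dC, c_prior = sta_and_delta_c(stim, spikes, window=50)

# Eigenvalues in units of the input variance: diagonalize C_prior^-1 dC.
w = np.linalg.eigvals(np.linalg.solve(c_prior, dC))
w = np.real(w[np.argsort(-np.abs(w))])      # rank modes by magnitude
```

Because C_prior⁻¹ΔC is not symmetric, `eigvals` may return numerically complex values; here we keep the real parts and rank modes by magnitude.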

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero up to the noise floor determined by


Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is 1/√n_spikes. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

sampling decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Aguera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended, silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Aguera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be −60 ≤ t ≤ −40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/n_spikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out the test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion


Figure 11: For the leading 64 modes, the fraction of the mode energy over the interval −40 < t < −30 msec, as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.

of other, weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is close to time-translationally invariant.
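The energy-fraction diagnostic reduces to measuring how much of each unit-norm eigenvector's energy falls in the silent window. A toy sketch (ours; the two mode shapes are synthetic stand-ins for the eigenvectors of ΔC):

```python
import numpy as np

def silence_energy_fraction(mode, t_axis, t_lo=-60.0, t_hi=-40.0):
    """Fraction of a mode's energy falling in the silent window.

    For a spike-associated mode this energy is pure noise and shrinks
    like 1/n_spikes; for a silence mode it stays of order unity."""
    v = np.asarray(mode, dtype=float)
    v = v / np.linalg.norm(v)               # work with a unit-norm mode
    mask = (t_axis >= t_lo) & (t_axis <= t_hi)
    return float(np.sum(v[mask] ** 2))

t = np.linspace(-60.0, 5.0, 131)            # time axis in msec
spiky = np.exp(-0.5 * (t / 3.0) ** 2)       # localized near the spike
extended = np.cos(2 * np.pi * t / 20.0)     # Fourier-like silence mode
print(silence_energy_fraction(spiky, t), silence_energy_fraction(extended, t))
```

The localized mode scores essentially zero, while the Fourier-like mode keeps a fraction of order unity, reproducing the separation seen in Figure 11.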

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s1 and s2 (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators; there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crossed threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions s1 and s2. By construction, the distribution of these signals over the whole experiment, P(s1, s2), is gaussian. The appropriate prior for the isolation condition, P(s1, s2 | silence), differs only subtly from the gaussian prior. On the other hand, for each spike, we obtain a sample from the distribution P(s1, s2 | iso spike at t0), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC that we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold-crossing interpretation.

Figure 14: 10⁴ spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation of the prior gaussian distribution. The circles, from the inside out, enclose all but 10⁻¹, 10⁻², ..., 10⁻⁸ of the prior.


Combining equations 2.2 and 2.3 for isolated spikes, we have

g(s1, s2) = P(s1, s2 | iso spike at t0) / P(s1, s2 | silence),    (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure, but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
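Equation 5.1 can be estimated nonparametrically as a ratio of binned densities of the two projections. A sketch with our function names; for simplicity, the prior here is the full gaussian ensemble, which as noted above differs only subtly from the silence-conditioned prior:

```python
import numpy as np

def estimate_g(s_spike, s_prior, bins=16, lim=4.0):
    """g(s1, s2) ~ P(s1, s2 | spike) / P(s1, s2 | silence), estimated
    from samples of the two projections on a bins-by-bins grid."""
    edges = np.linspace(-lim, lim, bins + 1)
    h_spk, _, _ = np.histogram2d(s_spike[:, 0], s_spike[:, 1],
                                 bins=(edges, edges), density=True)
    h_pri, _, _ = np.histogram2d(s_prior[:, 0], s_prior[:, 1],
                                 bins=(edges, edges), density=True)
    safe = np.where(h_pri > 0, h_pri, 1.0)   # avoid division by zero
    return np.where(h_pri > 0, h_spk / safe, 0.0)

# Synthetic check: spike-conditional samples shifted along s1 should give
# g > 1 on the shifted side and g < 1 near the origin.
rng = np.random.default_rng(1)
prior = rng.normal(size=(50000, 2))
spike = rng.normal(size=(5000, 2)) * 0.5 + [2.0, 0.0]
g = estimate_g(spike, prior)
```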

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation, according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I_{s1,s2} = I_{iso spike}; more generally, one finds I_{s1,s2} ≤ I_{iso spike}, and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.
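The information captured by a projection is computed, as in section 3, from the spike-conditional and prior distributions of that projection. A 1D sketch (ours), with synthetic gaussian samples standing in for the projections:

```python
import numpy as np

def projection_info(s_spike, s_prior, bins=32):
    """Bits per spike carried by one stimulus projection:
    I = sum_s P(s | spike) log2[ P(s | spike) / P(s) ]."""
    lo = min(s_spike.min(), s_prior.min())
    hi = max(s_spike.max(), s_prior.max())
    edges = np.linspace(lo, hi, bins + 1)
    p_spk, _ = np.histogram(s_spike, bins=edges)
    p_pri, _ = np.histogram(s_prior, bins=edges)
    p_spk = p_spk / p_spk.sum()             # normalize counts to probabilities
    p_pri = p_pri / p_pri.sum()
    ok = (p_spk > 0) & (p_pri > 0)          # skip empty bins
    return float(np.sum(p_spk[ok] * np.log2(p_spk[ok] / p_pri[ok])))

rng = np.random.default_rng(2)
prior = rng.normal(size=100000)
# A projection whose distribution shifts at spike times is informative;
# one whose distribution matches the prior carries almost nothing.
informative = rng.normal(size=20000) + 2.0
irrelevant = rng.normal(size=20000)
print(projection_info(informative, prior), projection_info(irrelevant, prior))
```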

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec,

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).


reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at ≈ 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation, and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ~3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing?

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that, along the direction of the STA, we allow the second orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
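One pass of the tiling construction, using only first-order statistics as emphasized above, can be sketched as follows (our function names; `segments` stands for the stimulus histories preceding isolated spikes, one row per spike, and here is filled with random data just to exercise the code):

```python
import numpy as np

def twist_directions(segments, sta, n_bins=8):
    """For each of n_bins tiles (equal spike count per tile), return the
    unit vector orthogonal to the STA obtained from the conditional
    average of histories binned by their projection onto the STA."""
    u = sta / np.linalg.norm(sta)
    order = np.argsort(segments @ u)        # sort histories by STA projection
    seconds = []
    for tile in np.array_split(order, n_bins):
        m = segments[tile].mean(axis=0)     # conditional average in this bin
        ortho = m - (m @ u) * u             # Gram-Schmidt against the STA
        seconds.append(ortho / np.linalg.norm(ortho))
    return np.array(seconds)

rng = np.random.default_rng(3)
segs = rng.normal(size=(8000, 40))          # stand-in spike-triggered histories
sta = segs.mean(axis=0)
second = twist_directions(segs, sta, n_bins=8)
```

Applying `np.linalg.svd` to the returned family of second directions then estimates the embedding dimension of the feature subspace, as described in the text.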

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ≈ 90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshikawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction, in the sense pioneered by FitzHugh and by Nagumo et al., has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input

1744 B Aguera y Arcas A Fairhall and W Bialek

currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work by Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ~200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description, but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.

Computation in a Single Neuron 1745

The geometrical picture of neurons as being sensitive to features defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations, such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace using data sets of fewer than 10⁴ spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again, the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus, or, conversely, to allow us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.

1746 B Aguera y Arcas A Fairhall and W Bialek

However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore, in general, involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Agüera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Agüera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Agüera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.

Computation in a Single Neuron 1747

Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysics J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibers. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2070.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



order and compute the covariance matrix of fluctuations around the spike-triggered average:

C_spike(τ, τ′) = ∫ [dI] P[I(t < t₀) | spike at t₀] I(t₀ − τ) I(t₀ − τ′) − STA(τ) STA(τ′).   (2.5)

In the same way that we compare the spike-triggered average to some constant average level of the signal in the whole experiment, we compare the covariance matrix C_spike with the covariance of the signal averaged over the whole experiment,

C_prior(τ, τ′) = ∫ [dI] P[I(t < t₀)] I(t₀ − τ) I(t₀ − τ′),   (2.6)

to construct the change in the covariance matrix

ΔC = C_spike − C_prior.   (2.7)

With time resolution Δt in a window of duration T, as above, all of these covariances are D × D matrices. In the same way that the spike-triggered average has the clearest interpretation when we choose inputs from a gaussian distribution, ΔC also has the clearest interpretation in this case. Specifically, if inputs are drawn from a gaussian distribution, then it can be shown (Bialek & de Ruyter van Steveninck, 2003) that:

1. If the neuron is sensitive to a limited set of K input dimensions, as in equation 2.2, then ΔC will have only K nonzero eigenvalues.¹ In this way, we can measure directly the dimensionality K of the relevant subspace.

2. If the distribution of inputs is both gaussian and white, then the eigenvectors associated with the nonzero eigenvalues span the same space as that spanned by the filters {f_µ(τ)}.

3. For nonwhite (correlated) but still gaussian inputs, the eigenvectors span the space of the filters {f_µ(τ)} blurred by convolution with the correlation function of the inputs.

Thus, the analysis of ΔC for neurons responding to gaussian inputs should allow us to identify the relevant subspace of inputs and to test specifically the hypothesis that this subspace is of low dimension.

¹ As with the STA, it is in principle possible that symmetries or accidental features of the function g(s⃗) would cause some of the K eigenvalues to vanish, but this is very unlikely.
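As an illustration of this covariance analysis (our sketch, not the original code), the following toy example builds a threshold "neuron" with a single known relevant filter f, forms ΔC = C_spike − C_prior as in equations 2.5 through 2.7, and checks that the one significant eigenvector recovers f. All names and parameter values here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 30                       # length of the stimulus history window, in time bins
N = 200_000                  # total number of time bins
I = rng.standard_normal(N)   # white gaussian input current

# toy neuron: fires whenever the projection of the last D bins onto a
# single filter f crosses a threshold (a one-dimensional relevant subspace)
f = np.exp(-np.arange(D)[::-1] / 5.0)
f -= f.mean()
f /= np.linalg.norm(f)
proj = np.convolve(I, f[::-1], mode="valid")     # projection of every D-bin history onto f
spike_bins = np.nonzero(proj > 2.5)[0] + D - 1   # bin index at which each spike occurs

# spike-triggered ensemble and average
ste = np.array([I[t - D + 1 : t + 1] for t in spike_bins])
sta = ste.mean(axis=0)

C_spike = (ste - sta).T @ (ste - sta) / len(ste)  # eq. 2.5 (mean removed)
C_prior = np.eye(D)                               # covariance of the white, unit-variance input
dC = C_spike - C_prior                            # eq. 2.7

# one eigenvalue of dC should dominate, and its eigenvector should align with f
w, v = np.linalg.eigh(dC)
idx = np.argmax(np.abs(w))
overlap = abs(v[:, idx] @ f)
print(f"dominant eigenvalue {w[idx]:.2f}, overlap with filter {overlap:.2f}")
```

The dominant eigenvalue is negative here, since conditioning on a threshold crossing reduces the variance along the relevant direction.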


Several points are worth noting. First, except in special cases, the eigenvectors of ΔC and the filters {f_µ(τ)} are not the principal components of the RCE, and hence this analysis of ΔC is not a principal component analysis. Second, the nonzero eigenvalues of ΔC can be either positive or negative, depending on whether the variance of inputs along that particular direction is larger or smaller in the neighborhood of a spike. Third, although the eigenvectors span the relevant subspace, these eigenvectors do not form a preferred coordinate system within this subspace. Finally, we emphasize that dimensionality reduction, the identification of the relevant subspace, is only the first step in our analysis of the computation done by a neuron.

3 Measuring the Success of Dimensionality Reduction

The claim that certain stimulus features are most relevant is in effect a model for the neuron, so the next question is how to measure the effectiveness or accuracy of this model. Several different ideas have been suggested in the literature as ways of testing models based on linear receptive fields in the visual system (Stanley, Lei, & Dan, 1999; Keat, Reinagel, Reid, & Meister, 2001) or linear spectrotemporal receptive fields in the auditory system (Theunissen, Sen, & Doupe, 2000). These methods have in common that they introduce a metric to measure performance, for example, mean square error in predicting the firing rate as averaged over some window of time. Ideally, we would like to have a performance measure that avoids any arbitrariness in the choice of metric, and such metric-free measures are provided uniquely by information theory (Shannon, 1948; Cover & Thomas, 1991).

Observing the arrival time t₀ of a single spike provides a certain amount of information about the input signals. Since information is mutual, we can also say that knowing the input signal trajectory I(t < t₀) provides information about the arrival time of the spike. If "details are irrelevant," then we should be able to discard these details from our description of the stimulus and yet preserve the mutual information between the stimulus and spike arrival times (for an abstract discussion of such selective compression, see Tishby et al., 1999). In constructing our low-dimensional model, we represent the complete (D-dimensional) stimulus I(t < t₀) by a smaller number (K < D) of dimensions s⃗ = (s₁, s₂, . . . , s_K).

The mutual information I[I(t < t₀); t₀] is a property of the neuron itself, while the mutual information I[s⃗; t₀] characterizes how much our reduced description of the stimulus can tell us about when spikes will occur. Necessarily, our reduction of dimensionality causes a loss of information, so that

I[s⃗; t₀] ≤ I[I(t < t₀); t₀],   (3.1)

but if our reduced description really captures the computation done by the neuron, then the two information measures will be very close. In particular,


if the neuron were described exactly by a lower-dimensional model, as for a linear perceptron or for an integrate-and-fire neuron (Agüera y Arcas & Fairhall, 2003), then the two information measures would be equal. More generally, the ratio I[s⃗; t₀]/I[I(t < t₀); t₀] quantifies the efficiency of the low-dimensional model, measuring the fraction of information about spike arrival times that our K dimensions capture from the full signal I(t < t₀).

As shown by Brenner, Strong, Koberle, Bialek, and de Ruyter van Steveninck (2000), the arrival time of a single spike provides an information

I[I(t < t₀); t₀] ≡ I_one spike = (1/T) ∫₀^T dt (r(t)/r̄) log₂[r(t)/r̄],   (3.2)

where r(t) is the time-dependent spike rate, r̄ is the average spike rate, and ⟨· · ·⟩ denotes an average over time. In principle, information should be calculated as an average over the distribution of stimuli, but the ergodicity of the stimulus justifies replacing this ensemble average with a time average. For a deterministic system like the HH equations, the spike rate is a singular function of time: given the inputs I(t), spikes occur at definite times with no randomness or irreproducibility. If we observe these responses with a time resolution Δt, then for Δt sufficiently small, the rate r(t) at any time t either is zero or corresponds to a single spike occurring in one bin of size Δt, that is, r = 1/Δt. Thus, the information carried by a single spike is

I_one spike = −log₂(r̄ Δt).   (3.3)
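The reduction from equation 3.2 to equation 3.3 can be checked numerically. The sketch below (a surrogate binary spike train of our choosing, not an HH simulation) evaluates the time average directly and compares it with the closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 0.5                      # time resolution (ms)
nbins = 2_000_000             # total observation window T = nbins * dt

# deterministic response: each bin either contains a spike or not,
# so r(t) is either 0 or 1/dt
spikes = rng.random(nbins) < 0.02
r = spikes / dt
rbar = r.mean()

# eq. 3.2: direct time average (bins with r = 0 contribute nothing)
ratio = r / rbar
term = np.zeros(nbins)
occ = ratio > 0
term[occ] = ratio[occ] * np.log2(ratio[occ])
I_direct = term.mean()

# eq. 3.3: closed form for a deterministic (0 or 1/dt) spike rate
I_closed = -np.log2(rbar * dt)
print(I_direct, I_closed)
```

The two numbers agree exactly, since in every occupied bin r/r̄ takes the same value 1/(r̄Δt).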

On the other hand, if the probability of spiking really depends on only the stimulus dimensions s₁, s₂, . . . , s_K, we can substitute

r(t)/r̄ → P(s⃗ | spike at t)/P(s⃗).   (3.4)

Replacing the time averages in equation 3.2 with ensemble averages, we find

I[s⃗; t₀] ≡ I^s⃗_one spike = ∫ d^K s P(s⃗ | spike at t) log₂[P(s⃗ | spike at t)/P(s⃗)]   (3.5)

(for details of these arguments, see Brenner, Strong, et al., 2000). This allows us to compare the information captured by the K-dimensional reduced model with the true information carried by single spikes in the spike train.
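In practice, equation 3.5 is estimated by histogramming the projections s⃗ for the spike-triggered and prior ensembles. Below is a one-dimensional sketch with a hard-threshold decision rule (our toy choice, not the HH model), for which the integral has the known value −log₂ p(spike):

```python
import numpy as np

rng = np.random.default_rng(2)
s = rng.standard_normal(1_000_000)   # stimulus projected onto one relevant dimension
spike = s > 2.0                      # toy decision rule: spike iff s crosses threshold
p = spike.mean()

# histogram estimates of the prior P(s) and the conditional P(s | spike)
bins = np.linspace(-5, 5, 201)
P_prior, _ = np.histogram(s, bins=bins, density=True)
P_spike, _ = np.histogram(s[spike], bins=bins, density=True)

# eq. 3.5 evaluated on the histograms
w = np.diff(bins)
m = P_spike > 0
I_est = np.sum(w[m] * P_spike[m] * np.log2(P_spike[m] / P_prior[m]))

# for a hard threshold, the integral reduces to -log2 p(spike): a sanity check
print(I_est, -np.log2(p))
```

With real data, bin occupancy and sampling bias need more care than this sketch takes; here every spike-conditional bin also holds prior samples, so the ratio is well defined.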

For reasons that we will discuss in the following section, and as was pointed out in Agüera y Arcas et al. (2001) and Agüera y Arcas and Fairhall (2003), we will be considering isolated spikes, those separated from previous spikes by a period of silence. This has important consequences for our analysis. Most significantly, as we will be considering spikes that occur


on a background of silence, the relevant stimulus ensemble, conditioned on the silence, is no longer gaussian. Further, we will need to refine our information estimate.

The derivation of equation 3.2 makes clear that a similar formula must determine the information carried by the occurrence time of any event, not just single spikes: we can define an event rate in place of the spike rate and then calculate the information carried by these events (Brenner, Strong, et al., 2000). In the case here, we wish to compute the information obtained by observing an isolated spike, or, equivalently, by the event silence+spike. This is straightforward: we replace the spike rate by the rate of isolated spikes, and equation 3.2 will give us the information carried by the arrival time of a single isolated spike. The problem is that this information includes both the information carried by the occurrence of the spike and the information conveyed in the condition that there were no spikes in the preceding t_silence msec (for an early discussion of the information carried by silence, see de Ruyter van Steveninck & Bialek, 1988). We would like to separate these contributions, since our idea of dimensionality reduction applies only to the triggering of a spike, not to the temporally extended condition of nonspiking.

To separate the information carried by the isolated spike itself, we have to ask how much information we gain by seeing an isolated spike, given that the condition for isolation has already been met. As discussed by Brenner, Strong, et al. (2000), we can compute this information by thinking about the distribution of times at which the isolated spike can occur. Given that we know the input stimulus, the distribution of times at which a single isolated spike will be observed is proportional to r_iso(t), the time-dependent rate, or peristimulus time histogram, for isolated spikes. With proper normalization, we have

P_iso(t | inputs) = (1/T) · (r_iso(t)/r̄_iso),   (3.6)

where T is the duration of the (long) window in which we can look for the spike, and r̄_iso is the average rate of isolated spikes. This distribution has an entropy

S_iso(t | inputs) = −∫₀^T dt P_iso(t | inputs) log₂ P_iso(t | inputs)   (3.7)

= −(1/T) ∫₀^T dt (r_iso(t)/r̄_iso) log₂[(1/T) · (r_iso(t)/r̄_iso)]   (3.8)

= log₂(T r̄_iso Δt) bits,   (3.9)

where again we use the fact that for a deterministic system, the time-dependent rate must be either zero or the maximum allowed by our time


resolution Δt. To compute the information carried by a single spike, we need to compare this entropy with the total entropy possible when we do not know the inputs.

It is tempting to think that without knowledge of the inputs, an isolated spike is equally likely to occur anywhere in the window of size T, which leads us back to equation 3.3 with r̄ replaced by r̄_iso. In this case, however, we are assuming that the condition for isolation has already been met. Thus, even without observing the inputs, we know that isolated spikes can occur only in windows of time whose total length is T_silence = T · P(silence), where P(silence) is the probability that any moment in time is at least t_silence after the most recent spike. Thus, the total entropy of isolated spike arrival times (given that the condition for silence has been met) is reduced from log₂ T to

S_iso(t | silence) = log₂(T · P(silence)),   (3.10)

and the information that the spike carries beyond what we know from thesilence itself is

ΔI_iso spike = S_iso(t | silence) − S_iso(t | inputs)   (3.11)

= (1/T) ∫₀^T dt (r_iso(t)/r̄_iso) log₂[(r_iso(t)/r̄_iso) · P(silence)]   (3.12)

= −log₂(r̄_iso Δt) + log₂ P(silence) bits.   (3.13)

This information, which is defined independent of any model for the feature selectivity of the neuron, provides the benchmark against which our reduction of dimensionality will be measured. To make the comparison, however, we need the analog of equation 3.5.
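Equation 3.13 can be evaluated directly from a spike train once t_silence is fixed. The sketch below uses a surrogate Poisson train for illustration only (an HH spike train is not Poisson at short intervals); all parameter values are ours:

```python
import numpy as np

rng = np.random.default_rng(3)
dt = 0.1             # time resolution (ms)
T = 200_000.0        # total duration (ms)
t_silence = 50.0     # silence required before an "isolated" spike (ms)

# surrogate spike train: homogeneous Poisson at 20 Hz (illustration only)
n = rng.poisson(0.02 * T)
spikes = np.sort(rng.uniform(0.0, T, n))

# isolated spikes and their mean rate
iso = spikes[1:][np.diff(spikes) >= t_silence]
rbar_iso = len(iso) / T

# P(silence): fraction of moments lying at least t_silence after the last spike
edges = np.arange(0.0, T, dt)
last = np.searchsorted(spikes, edges, side="right") - 1
since = np.where(last >= 0, edges - spikes[np.maximum(last, 0)], np.inf)
P_sil = np.mean(since >= t_silence)

dI_iso = -np.log2(rbar_iso * dt) + np.log2(P_sil)   # eq. 3.13
print(dI_iso)
```

For this Poisson surrogate, P(silence) should come out near exp(−r·t_silence) ≈ 0.37, and the silence term reduces the naive single-spike information by about 1.4 bits.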

Equation 3.12 provides us with an expression for the information conveyed by isolated spikes in terms of the probability that these spikes occur at particular times; this is analogous to equation 3.2 for single (nonisolated) spikes. If we follow a path analogous to that which leads from equation 3.2 to equation 3.5, we find an expression for the information that an isolated spike provides about the K stimulus dimensions s⃗:

ΔI^s⃗_iso spike = ∫ ds⃗ P(s⃗ | iso spike at t) log₂[P(s⃗ | iso spike at t)/P(s⃗ | silence)] + ⟨log₂ P(silence | s⃗)⟩,   (3.14)

where the prior is now also conditioned on silence: P(s⃗ | silence) is the distribution of s⃗ given that s⃗ is preceded by a silence of at least t_silence. Notice that this silence-conditioned distribution is not knowable a priori, and in particular, it is not gaussian; P(s⃗ | silence) must be sampled from data.

The last term in equation 3.14 is the entropy of a binary variable that indicates whether particular moments in time are silent, given knowledge


of the stimulus. Again, since the HH model is deterministic, this conditional entropy should be zero if we keep a complete description of the stimulus. In fact, we are not interested in describing those features of the stimulus that lead to silence, and it is not fair (as we will see) to judge the success of dimensionality reduction by looking at the prediction of silence, which necessarily involves multiple dimensions. To make a meaningful comparison, then, we will assume that there is a perfect description of the stimulus conditions leading to silence and focus on the stimulus features that trigger the isolated spike. When we approximate these features by the K-dimensional space s⃗, we capture an amount of information

ΔI^s⃗_iso spike = ∫ ds⃗ P(s⃗ | iso spike at t) log₂[P(s⃗ | iso spike at t)/P(s⃗ | silence)].   (3.15)

This is the information that we can compare with ΔI_iso spike in equation 3.13 to determine the efficiency of our dimensionality reduction.

4 Characterizing the Hodgkin-Huxley Neuron

For completeness, we begin with a brief review of the dynamics of the space-clamped HH neuron (Hodgkin & Huxley, 1952). Hodgkin and Huxley modeled the dynamics of the current through a patch of membrane with ion-specific conductances:

C dV/dt = I(t) − ḡ_K n⁴ (V − V_K) − ḡ_Na m³h (V − V_Na) − ḡ_l (V − V_l),   (4.1)

where I(t) is injected current, K and Na subscripts denote potassium- and sodium-related variables, respectively, and l (for "leakage") terms include all other ion conductances with slower dynamics. C is the membrane capacitance, V_K and V_Na are ion-specific reversal potentials, and V_l is defined such that the total voltage V is exactly zero when the membrane is at rest. ḡ_K, ḡ_Na, and ḡ_l are empirically determined maximal conductances for the different ion species, and the gating variables n, m, and h (on the interval [0, 1]) have their own voltage-dependent dynamics:

dn/dt = 0.01(10 − V)(1 − n)/[exp((10 − V)/10) − 1] − 0.125 exp(−V/80) n,

dm/dt = 0.1(25 − V)(1 − m)/[exp((25 − V)/10) − 1] − 4 exp(−V/18) m,   (4.2)

dh/dt = 0.07 exp(−V/20)(1 − h) − h/[exp((30 − V)/10) + 1].

We have used the original values for these parameters, except for changing the signs of the voltages to correspond to the modern sign convention: C = 1 µF/cm², ḡ_K = 36 mS/cm², ḡ_Na = 120 mS/cm², ḡ_l = 0.3 mS/cm², V_K = −12 mV, V_Na = +115 mV, V_l = +10.613 mV. We have taken our system to be


a π × 30² µm² patch of membrane. We solve these equations numerically using fourth-order Runge-Kutta integration.

The system is driven with a gaussian random noise current I(t), generated by smoothing a gaussian random number stream with an exponential filter to give a correlation time τ. It is convenient to choose τ longer than the time steps of numerical integration, since this guarantees that all functions are smooth on the scale of single time steps. Here we will always use τ = 0.2 msec, a value that is both less than the timescale over which we discretize the stimulus for analysis and far less than the neuron's capacitative smoothing timescale RC ≈ 3 msec. The current has a standard deviation σ, but since the correlation time is short, the relevant parameter usually is the spectral density S = σ²τ; we also add a DC offset I₀. In the following, we will consider two parameter regimes: I₀ = 0, and I₀ at a finite value that leads to more periodic firing.
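One simple way to generate such a stimulus (a sketch under the stated parameters; the function name and default values are ours, not the paper's) is a discrete-time AR(1) process, whose autocorrelation is exactly exponential with time constant τ and whose effective spectral density is S = σ²τ:

```python
import numpy as np

def make_stimulus(n_steps, dt=0.05, tau=0.2, sigma=0.1, I0=0.0, seed=0):
    """Exponentially filtered gaussian noise current with correlation time
    tau (ms) and standard deviation sigma (nA), plus a DC offset I0 (nA)."""
    rng = np.random.default_rng(seed)
    a = np.exp(-dt / tau)          # AR(1) coefficient of the exponential filter
    x = np.empty(n_steps)
    x[0] = rng.standard_normal()
    for i in range(1, n_steps):
        # stationary unit-variance AR(1) update
        x[i] = a * x[i - 1] + np.sqrt(1.0 - a * a) * rng.standard_normal()
    return I0 + sigma * x

I = make_stimulus(20_000)
# autocorrelation at lag 0.2 ms should be close to exp(-1)
print(I.std(), np.corrcoef(I[:-4], I[4:])[0, 1])
```

The update coefficient exp(−dt/τ) makes the lag-k correlation exp(−k·dt/τ), matching the exponential filter described in the text.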

The integration step size is fixed at 0.05 msec. The key numerical experiments were repeated at a step size of 0.01 msec, with identical results. The time of a spike is defined as the moment of maximum voltage, for voltages exceeding a threshold (see Figure 1), estimated to subsample precision by quadratic interpolation. As spikes are both very stereotyped and very large compared to subspiking fluctuations, the precise value of this threshold is unimportant; we have used +20 mV.
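A minimal sketch of this simulation, reconstructed from the description above rather than taken from the authors' code: the HH equations 4.1 and 4.2 with the quoted parameter values (current here in µA/cm² rather than nA into the patch), fourth-order Runge-Kutta integration at 0.05 msec steps, and spike times refined to subsample precision by a parabolic fit around each voltage maximum above +20 mV.

```python
import numpy as np

# parameters quoted in the text (modern sign convention, rest at V = 0)
C, gK, gNa, gl = 1.0, 36.0, 120.0, 0.3     # uF/cm^2 and mS/cm^2
VK, VNa, Vl = -12.0, 115.0, 10.613         # mV

def rates(V):
    """Standard HH gating rates in this sign convention (1/msec)."""
    an = 0.01 * (10.0 - V) / np.expm1((10.0 - V) / 10.0)
    bn = 0.125 * np.exp(-V / 80.0)
    am = 0.1 * (25.0 - V) / np.expm1((25.0 - V) / 10.0)
    bm = 4.0 * np.exp(-V / 18.0)
    ah = 0.07 * np.exp(-V / 20.0)
    bh = 1.0 / (np.exp((30.0 - V) / 10.0) + 1.0)
    return an, bn, am, bm, ah, bh

def deriv(y, I):
    V, n, m, h = y
    an, bn, am, bm, ah, bh = rates(V)
    dV = (I - gK * n**4 * (V - VK) - gNa * m**3 * h * (V - VNa) - gl * (V - Vl)) / C
    return np.array([dV, an * (1 - n) - bn * n,
                     am * (1 - m) - bm * m, ah * (1 - h) - bh * h])

def run(I, dt=0.05):
    """Fourth-order Runge-Kutta integration; I in uA/cm^2, dt in msec."""
    y = np.array([0.0, 0.3177, 0.0529, 0.5961])   # resting state
    V = np.empty(len(I))
    for i, Ii in enumerate(I):
        k1 = deriv(y, Ii)
        k2 = deriv(y + 0.5 * dt * k1, Ii)
        k3 = deriv(y + 0.5 * dt * k2, Ii)
        k4 = deriv(y + dt * k3, Ii)
        y = y + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        V[i] = y[0]
    return V

def spike_times(V, dt=0.05, thresh=20.0):
    """Voltage maxima above threshold, refined to subsample precision
    by a parabola through the three samples around each peak."""
    i = np.nonzero((V[1:-1] > thresh) & (V[1:-1] >= V[:-2]) & (V[1:-1] > V[2:]))[0] + 1
    a, b, c = V[i - 1], V[i], V[i + 1]
    return (i + 0.5 * (a - c) / (a - 2 * b + c)) * dt

# demo: a suprathreshold current step elicits repetitive firing
I = np.where(np.arange(4000) * 0.05 > 5.0, 10.0, 0.0)
V = run(I)
print(len(spike_times(V)), round(V.max(), 1))
```

The parabolic refinement is one natural reading of "quadratic interpolation" here; the original implementation details are not given in the text.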

4.1 Qualitative Description of Spiking. The first step in our analysis is to use reverse correlation, equation 2.4, to determine the average stimulus feature preceding a spike, the STA. In Figure 1 (top), we display the STA in a regime where the spectral density of the input current is 6.5 × 10⁻⁴ nA² sec. The spike-triggered averages of the gating terms n⁴ (proportion of open potassium channels) and m³h (proportion of open sodium channels) and the membrane voltage V are plotted in Figure 1 (middle and bottom). The error bars mark the standard deviation of the trajectories of these variables.

As expected, the voltage and gating variables follow highly stereotyped trajectories during the ~5 msec surrounding a spike. First, the rapid opening of the sodium channels causes a sharp membrane depolarization (or rise in V); the slower potassium channels then open and repolarize the membrane, leaving it at a slightly lower potential than rest. The potassium channels close gradually, but meanwhile the membrane remains hyperpolarized and, due to its increased permeability to potassium ions, at lower resistance. These effects make it difficult to induce a second spike during this ~15 msec "refractory period." Away from spikes, the resting levels and fluctuations of the voltage and gating variables are quite small. The larger values evident in Figure 1 (middle and bottom) by ±15 msec are due to the summed contributions of nearby spikes.

The spike-triggered average current has a largely transient form so thatspikes are on average preceded by an upward swing in current On the


Figure 1: Spike-triggered averages, with standard deviations, for (top) the input current I, (middle) the fraction of open K⁺ and Na⁺ channels, and (bottom) the membrane voltage V, for the parameter regime I₀ = 0 and S = 6.5 × 10⁻⁴ nA² sec.

other hand, there is no obvious bottleneck in the current trajectories, so that the current variance is almost constant throughout the spike. This is qualitatively consistent with the idea of dimensionality reduction: if the neuron ignores most of the dimensions along which the current can vary, then the variance, which is shared almost equally among all dimensions for this near-white noise, can change by only a small amount.
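The STA itself (equation 2.4) is just the average of the stimulus histories preceding spikes. In the sketch below, a toy "neuron" fires a fixed delay after a sharp upward swing in a white noise current, so the recovered STA has a known transient shape; the helper name and event rule are ours:

```python
import numpy as np

def spike_triggered_average(I, spike_idx, D):
    """Average (and standard deviation) of the D input samples up to
    and including each spike bin."""
    spike_idx = spike_idx[(spike_idx >= D - 1) & (spike_idx < len(I))]
    ste = np.stack([I[t - D + 1 : t + 1] for t in spike_idx])
    return ste.mean(axis=0), ste.std(axis=0)

# toy check: plant "spikes" two bins after an upward swing in the current
rng = np.random.default_rng(4)
I = rng.standard_normal(100_000)
swing = np.nonzero((I[:-1] < -1.0) & (I[1:] > 1.0))[0]   # down-then-up transition
sta, sd = spike_triggered_average(I, swing + 3, D=10)
print(np.round(sta, 2))
```

The recovered STA is near zero everywhere except a negative value followed by a positive one just before the spike, the planted "edge."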


4.2 Interspike Interaction. Although the STA has the form of a differentiating kernel, suggesting that the neuron detects edge-like events in the current versus time, there must be a DC component to the cell's response. We recall that for constant inputs, the HH model undergoes a bifurcation to constant-frequency spiking, where the frequency is a function of the value of the input above onset. Correspondingly, the STA does not sum precisely to zero; one might think of it as having a small integrating component that allows the system to spike under DC stimulation, albeit only above a threshold.

The system's tendency to periodic spiking under DC current input also is felt under dynamic stimulus conditions and can be thought of as a strong interaction between successive spikes. We illustrate this by considering a different parameter regime, with a small DC current and some added noise (I₀ = 0.11 nA and S = 0.8 × 10⁻⁴ nA² sec). Note that the DC component puts the neuron in the metastable region of its f–I curve (see Figure 2). In this regime, the neuron tends to fire quasi-regular trains of spikes intermittently, as shown in Figure 3. We will refer to these quasi-regular spike sequences as "bursts" (note that this term is often used to refer to compound spikes in neurons with additional channels; such events do not occur in the HH model).

Spikes can be classified into three types: those initiating a spike burst, those within a burst, and those ending a burst. The minimum length of

Figure 2: Firing rate of the HH neuron as a function of injected DC current. The empty circles at moderate currents denote the metastable region, where the neuron may be either spiking or silent.


Figure 3: Segment of a typical spike train in a "bursting" regime.

Figure 4: Spike-triggered averages derived from spikes leading ("on"), inside ("burst"), and ending ("off") a burst. The parameters of this bursting regime are I₀ = 0.11 nA and S = 0.8 × 10⁻⁴ nA² sec. Note that the burst-ending spike average is, by construction, identical to that of any other within-burst spike for t < 0.

the silence between bursts is taken in this case to be 70 msec. Taking these three categories of spike as different "symbols" (de Ruyter van Steveninck & Bialek, 1988), we can determine the average stimulus for each. These are shown in Figure 4, with the spike at t = 0.

In this regime, the initial spike of a burst is preceded by a rapid oscillation in the current. Spikes within a burst are affected much less by the current: the feature immediately preceding such spikes is similar in shape to a single "wavelength" of the leading spike feature but is of much smaller amplitude and is temporally compressed into the interspike interval. Hence, although it is clear that the timing of a spike within a burst is determined largely by the timing of the previous spike, the current plays some role in affecting the precise placement. This also demonstrates that the shape of the STA is not the same for all spikes; it depends strongly and nontrivially on the time to the previous spike, and this is related to the observation that subtly different patterns of two or three spikes correspond to very different average stimuli (de Ruyter van Steveninck & Bialek, 1988). For a reader of the spike code, a spike within a burst conveys a different message about the input than the spike at the onset of the burst. Finally, the feature ending a burst has a very similar form to the onset feature, but reversed in time. Thus, to a good approximation, the absence of a spike at the end of a burst can be read as the opposite of the onset of the burst.
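Sorting spikes into the three "symbols" requires only the interspike interval on each side of every spike. A sketch (the 70 msec silence criterion follows the text; the labels and function name are ours; an isolated single spike is labeled as burst-onset here):

```python
import numpy as np

def classify_spikes(spike_times, gap=70.0):
    """Label each spike as burst 'on' (long silence before), 'off'
    (long silence after), or 'in' (inside a burst); gap in msec.
    A spike with silence on both sides gets 'on' by precedence."""
    t = np.asarray(spike_times, dtype=float)
    before = np.diff(t, prepend=-np.inf)   # interval since the previous spike
    after = np.diff(t, append=np.inf)      # interval until the next spike
    return np.where(before >= gap, "on", np.where(after >= gap, "off", "in"))

# toy train: two bursts of regularly spaced spikes separated by a long silence
train = np.concatenate([np.arange(0, 50, 10.0), np.arange(200, 250, 10.0)])
print(classify_spikes(train))
```

On the toy train, the first spike of each burst is labeled "on," the last "off," and the rest "in."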

In summary, this regime of the HH neuron is similar to a "flip-flop," or 1-bit memory. Like its electronic analog, the neuron's memory is preserved


Figure 5: Overall spike-triggered average in the bursty regime, showing the ringing due to the tendency to periodic firing. Plotted in gray is the spike autocorrelation, showing the same oscillations.

by a feedback loop, here implemented by the interspike interaction. Large fluctuations in the input current at a certain frequency "flip" or "flop" the neuron between its silent and spiking states. However, while the neuron is spiking, further details of the input signal are transmitted by precise spike timing within a burst. If we calculate the spike-triggered average of all spikes for this regime, without regard to their position within a burst, then, as shown in Figure 5, the relatively well-localized leading spike oscillation of Figure 4 is replaced by a long-lived oscillating function resulting from the spike periodicity. This is shown explicitly by comparing the overall STA with the spike autocorrelation, also shown in Figure 5. This same effect is seen in the STA of the burst spikes, which in fact dominates the overall average. Prediction of spike timing using such an STA would be computationally difficult, due to its extension in time, but, more seriously, unsuccessful, as most of the function is an artifact of the spike history rather than the effect of the stimulus.

While the effects of spike interaction are interesting and should be included in a complete model for spike generation, we wish here to consider only the current's role in initiating spikes. Therefore, as we have argued elsewhere, we limit ourselves initially to the cases in which interspike interaction plays no role (Agüera y Arcas et al., 2001; Agüera y Arcas & Fairhall, 2003). These "isolated" spikes can be defined as spikes preceded by a silent period t_silence long enough to ensure decoupling from the timing of the previous spike. A reasonable choice for t_silence can be inferred directly from the interspike interval distribution P(Δt), illustrated in Figure 6. For the HH model, as in simpler models and many real neurons (Brenner, Agam, Bialek, & de Ruyter van Steveninck, 1998), the form of P(Δt) has three noteworthy features: a refractory "hole" during which another spike is unlikely to occur, a strong mode at the preferred firing frequency, and an exponentially decaying, or Poisson, tail. The details of all three of these features are functions of the parameters of the stimulus (Tiesinga, Jose, & Sejnowski, 2000), and certain regimes may be dominated by only one or two features. The emergence of Poisson statistics in the tail of the distribution implies that these events are independent, so we can infer that the system has lost memory of the previous spike. We will therefore take isolated spikes to be those preceded by a silent interval Δt ≥ t_silence, where t_silence is well into the Poisson regime. The burst-onset spikes of Figure 4 are isolated spikes by this definition.

Figure 6: Hodgkin-Huxley interspike interval histogram for the parameters I_0 = 0 and S = 6.5 × 10^-4 nA² sec, showing a peak at a preferred firing frequency and the long Poisson tail. The total number of spikes is N = 5.18 × 10^6. The plot to the right is a closeup in linear scale.
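Once t_silence has been read off the Poisson tail of P(Δt), the selection of isolated spikes is mechanical. As a minimal numpy sketch (the function and variable names are ours, not from the text):

```python
import numpy as np

def isolated_spikes(spike_times, t_silence=60.0):
    """Select spikes preceded by at least t_silence msec of silence.
    The first spike has no measured predecessor and is excluded."""
    spike_times = np.asarray(spike_times, dtype=float)
    isi = np.diff(spike_times)          # interspike intervals, msec
    return spike_times[1:][isi >= t_silence]

# A burst at 100, 105, and 108 msec followed by a lone spike at 300 msec:
# only the last spike satisfies the isolation criterion.
print(isolated_spikes([100.0, 105.0, 108.0, 300.0]))
```

The default cutoff of 60 msec matches the value used for this stimulus regime in section 5.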

Note that the bursty behavior evident in Figure 3 is characteristic of "type II" neurons, which begin firing at a well-defined frequency under DC stimulus; "type I" neurons, by contrast, can fire at arbitrarily low frequency under constant input. Nonetheless, the analysis that follows is equally applicable to either type of neuron: spikes still interact at close range and become independent at sufficient separation. In what follows, we will be probing the HH neuron with current noise that remains below the DC threshold for periodic firing.

5 Isolated Spike Analysis

Focusing now on isolated spikes, we proceed to a second-order analysis of the current fluctuations around the isolated-spike-triggered average (see Figure 7). We consider the response of the HH neuron to currents I(t) with mean I_0 = 0 and spectral density S = 6.5 × 10^-4 nA² sec. Isolated spikes in this regime are defined by t_silence = 60 msec.

Figure 7: Spike-triggered average stimulus for isolated spikes.
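For readers who wish to reproduce this kind of "experiment," the setup can be sketched in a few dozen lines. The block below integrates the standard HH equations (Hodgkin & Huxley, 1952) by the Euler method and drives them with zero-mean gaussian noise current; the integration step, spike-detection threshold, and noise amplitude are our illustrative choices, not the paper's calibrated values.

```python
import numpy as np

def simulate_hh(I_ext, dt=0.01):
    """Euler integration of the Hodgkin-Huxley equations for an injected
    current trace I_ext (one value per time step, in uA/cm^2), dt in msec.
    Returns spike times, detected as upward crossings of 0 mV.
    Standard squid-axon parameters."""
    C, gNa, gK, gL = 1.0, 120.0, 36.0, 0.3     # uF/cm^2 and mS/cm^2
    ENa, EK, EL = 50.0, -77.0, -54.4           # reversal potentials, mV
    V, m, h, n = -65.0, 0.053, 0.596, 0.318    # resting state
    spikes = []
    for step, i_in in enumerate(I_ext):
        # voltage-dependent rate functions (1/msec)
        am = 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))
        bm = 4.0 * np.exp(-(V + 65.0) / 18.0)
        ah = 0.07 * np.exp(-(V + 65.0) / 20.0)
        bh = 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))
        an = 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))
        bn = 0.125 * np.exp(-(V + 65.0) / 80.0)
        # total ionic current at the present voltage
        I_ion = (gNa * m ** 3 * h * (V - ENa)
                 + gK * n ** 4 * (V - EK) + gL * (V - EL))
        # Euler updates for the gating variables and the voltage
        m += dt * (am * (1.0 - m) - bm * m)
        h += dt * (ah * (1.0 - h) - bh * h)
        n += dt * (an * (1.0 - n) - bn * n)
        V_new = V + dt * (i_in - I_ion) / C
        if V < 0.0 <= V_new:                   # upward zero crossing: a spike
            spikes.append(step * dt)
        V = V_new
    return np.array(spikes)

# Probe with zero-mean gaussian noise current (200 msec at dt = 0.01 msec).
rng = np.random.default_rng(0)
I_noise = rng.normal(0.0, 3.0, size=20000)
spike_times = simulate_hh(I_noise)
```

Long runs of this loop supply the large samples of isolated spikes used in the covariance analysis below.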

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, corresponding to a sampling window sufficiently long to capture the relevant features. Thus, we begin in a 200-dimensional space. We emphasize that the theorem that connects eigenvalues of the matrix ΔC to the number of relevant dimensions is valid only for truly gaussian distributions of inputs, and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble, namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide to our search for lower-dimensional descriptions, but we should proceed with caution.
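The accumulation of the spike-triggered and prior covariance matrices can be sketched as follows; this is a schematic of the standard construction, with our own function and variable names (the paper's D is 200):

```python
import numpy as np

def covariance_difference(stimulus, spike_idx, D=200):
    """Accumulate D-sample stimulus histories ending at each spike and
    return (delta_C, C_prior, sta), where delta_C = C_spike - C_prior
    (equation 2.7) and C_spike is the covariance of spike-conditional
    histories about the spike-triggered average."""
    windows = np.array([stimulus[i - D:i] for i in spike_idx if i >= D])
    sta = windows.mean(axis=0)                  # spike-triggered average
    C_spike = np.cov(windows, rowvar=False)     # fluctuations about the STA
    # prior: covariance over all D-sample histories of the stimulus
    prior = np.array([stimulus[i - D:i] for i in range(D, len(stimulus))])
    C_prior = np.cov(prior, rowvar=False)
    return C_spike - C_prior, C_prior, sta
```

The eigenvalue spectrum scaled in units of the input standard deviation, as used below, would then come from `np.linalg.eig(np.linalg.solve(C_prior, delta_C))`.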

The "raw" isolated spike-triggered covariance C_iso spike and the corresponding covariance difference ΔC, equation 2.7, are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to ±15 msec. In Figure 9, we show the spectrum of eigenvalues of ΔC, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply ΔC by C_prior^-1. This has the effect of giving us eigenvalues scaled in units of the input standard deviation along each dimension. Because the correlation time is short, C_prior is nearly diagonal.

Figure 8: The isolated spike-triggered covariance C_iso (left) and covariance difference ΔC (right) for times −30 < t < 5 msec. The plots are in units of nA².

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance after accumulating 80,000 spikes.

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero up to the noise floor determined by sampling decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is 1/√(n_spikes). Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Agüera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended and silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Agüera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be −60 ≤ t ≤ −40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/n_spikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out the test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion of other weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is close to time-translationally invariant.

Figure 11: For the leading 64 modes, fraction of the mode energy over the interval −40 < t < −30 msec as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.
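The energy diagnostic itself is a one-line projection. Below is a sketch with synthetic "modes" standing in for real covariance eigenvectors (the window boundaries and all names are ours):

```python
import numpy as np

def silence_energy_fraction(mode, t, t_lo=-60.0, t_hi=-40.0):
    """Fraction of a covariance mode's energy lying in the silent window
    [t_lo, t_hi] msec. For spike-associated modes this fraction is pure
    noise and falls like 1/n_spikes; for silence modes it stays O(1)."""
    mode = np.asarray(mode, dtype=float)
    mode = mode / np.linalg.norm(mode)          # unit normalization
    mask = (t >= t_lo) & (t <= t_hi)
    return float(np.sum(mode[mask] ** 2))

# A mode localized near the spike vs. an extended, Fourier-like mode.
t = np.linspace(-60.0, 5.0, 200)
local = np.exp(-0.5 * (t / 5.0) ** 2)           # concentrated near t = 0
extended = np.cos(2 * np.pi * t / 20.0)         # spread over the window
```

Applied to these toy modes, the fraction is essentially zero for the localized mode and of order the window's share of the total duration for the extended one.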

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s_1 and s_2 (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crossed threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions s_1 and s_2. By construction, the distribution of these signals over the whole experiment, P(s_1, s_2), is gaussian. The appropriate prior for the isolation condition, P(s_1, s_2 | silence), differs only subtly from the gaussian prior. On the other hand, for each spike, we obtain a sample from the distribution P(s_1, s_2 | iso spike at t_0), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC which we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold-crossing interpretation.

Figure 14: 10^4 spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation on the prior gaussian distribution. The circles, from the inside out, enclose all but 10^-1, 10^-2, ..., 10^-8 of the prior.


Combining equations 2.2 and 2.3, for isolated spikes we have

g(s_1, s_2) = P(s_1, s_2 | iso spike at t_0) / P(s_1, s_2 | silence),   (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation, according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I_{s_1,s_2}^{iso spike} = I^{iso spike}; more generally, one finds I_{s_1,s_2}^{iso spike} ≤ I^{iso spike}, and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec, reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).
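The information captured by a set of projections can be estimated as the Kullback-Leibler divergence between the binned spike-conditional and prior distributions. A plug-in sketch, checked against a case with a known analytic answer (the estimator, bin counts, and toy data are ours):

```python
import numpy as np

def projection_information(s_spike, s_prior, bins=25, lim=4.0):
    """Mutual information (bits per spike) captured by a set of stimulus
    projections: the KL divergence between binned spike-conditional and
    prior distributions (cf. section 3). Bins empty in either histogram
    are dropped from the sum."""
    dims = s_spike.shape[1]
    edges = [np.linspace(-lim, lim, bins + 1)] * dims
    H_spk, _ = np.histogramdd(s_spike, bins=edges)
    H_pri, _ = np.histogramdd(s_prior, bins=edges)
    p_spk = H_spk / H_spk.sum()
    p_pri = H_pri / H_pri.sum()
    mask = (p_spk > 0) & (p_pri > 0)
    return float(np.sum(p_spk[mask] * np.log2(p_spk[mask] / p_pri[mask])))

# Sanity check: for a spike-conditional gaussian shifted by mu relative to
# a unit gaussian prior, the true value is mu^2 / (2 ln 2) bits (~0.72 for mu = 1).
rng = np.random.default_rng(1)
prior = rng.normal(0.0, 1.0, size=(200000, 2))
spike = rng.normal([1.0, 0.0], 1.0, size=(50000, 2))
I2 = projection_information(spike, prior)
```

In the paper's setting, `s_spike` would hold the projections of spike-triggered histories onto the chosen modes and `s_prior` the projections of silence-conditioned histories, and the result is compared against the direct estimate of I^{iso spike}.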

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at ≈ 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation, and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ≈ 3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing?

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that along the direction of the STA, we allow the second orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
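The tiling algorithm described above requires only sorting, binning, averaging, and one orthogonalization. A minimal sketch with synthetic "twisting" data (all names and the toy data are ours):

```python
import numpy as np

def twist_model(windows, sta, n_bins=8):
    """Locally linear tiling: sort spike-triggered stimulus histories by
    their projection onto the STA, bin them into n_bins tiles with equal
    occupancy, and return for each tile the normalized component of the
    conditional average orthogonal to the STA (the locally varying
    second basis vector)."""
    sta = sta / np.linalg.norm(sta)
    proj = windows @ sta                           # projection onto the STA
    order = np.argsort(proj)                       # sort histories
    second_dirs = []
    for tile in np.array_split(order, n_bins):     # equal spikes per bin
        mean = windows[tile].mean(axis=0)          # conditional average
        ortho = mean - (mean @ sta) * sta          # strip the STA component
        second_dirs.append(ortho / np.linalg.norm(ortho))
    return np.array(second_dirs)

# Synthetic demonstration: histories whose second feature varies with the
# STA projection (a "twist"), plus a little noise.
rng = np.random.default_rng(2)
D = 20
sta = np.zeros(D); sta[0] = 1.0
e1 = np.zeros(D); e1[1] = 1.0
a = rng.normal(size=1000)
windows = (np.outer(a, sta) + np.outer(np.sin(a), e1)
           + 0.01 * rng.normal(size=(1000, D)))
dirs = twist_model(windows, sta, n_bins=8)
```

The embedding dimension estimate mentioned above would then come from a singular value decomposition of `dirs`.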

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ≈ 90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximate the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshikawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system by well-known methods in dynamical systems theory; one could consider periodic input currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work by Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics which is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ≈ 200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description, but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations, such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace using data sets of fewer than 10^4 spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again, the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but is sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus, or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance; for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore in general involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Agüera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Agüera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Agüera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow H B Hill R M amp Levick W R (1964)Retinal ganglion cells respond-ing selectively to direction and speed of image motion in the rabbit J Physiol173 377ndash407

Berry M J II amp Meister M (1999) The neural code of the retina Neuron 22435ndash450

Berry M J II Warland D amp Meister M (1997) The structure and precision ofretinal spike trains Proc Natl Acad Sci USA 94 5411ndash5416

Bialek W amp de Ruyter van Steveninck R R (2003) Features and dimensionsMotion estimation in y vision Unpublished manuscript

Boser B E Guyon I M amp Vapnik V N (1992)A training algorithm for optimalmargin classiers In D Haussler (Ed) 5th Annual ACM Workshop on COLT(pp 144ndash152) Pittsburgh PA ACM Press

Bray D (1995) Protein molecules as computational elements in living cellsNature 376 307ndash312

Brenner N Agam O Bialek W amp de Ruyter van Steveninck R R (1998)Universal statistical behavior of neural spike trains Phys Rev Lett 814000ndash4003

Brenner N Bialek W amp de Ruyter van Steveninck R R (2000) Adaptiverescaling maximizes information transmission Neuron 26 695ndash702

Brenner N Strong S Koberle R Bialek W amp de Ruyter van SteveninckR R (2000) Synergy in a neural code Neural Comp 12 1531ndash1552 Availableon-line http==xxxlanlgov=abs=physics=9902067

Cottrell G W Munro P amp Zipser D (1988) Image compression by back prop-agation A demonstration of extensional programming In N Sharkey (Ed)Models of cognition A reviewof cognitive science (Vol 2 pp 208ndash240)NorwoodNJ Ablex

Cover T M amp Thomas J A (1991) Elements of information theory New YorkWiley

de Boer E amp Kuyper P (1968) Triggered correlation IEEE Trans Biomed Eng15 169ndash179

de Ruyter van Steveninck R R amp Bialek W (1988) Real-time performance ofa movement sensitive in the blowy visual system Information transfer inshort spike sequences Proc Roy Soc Lond B 234 379ndash414

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.

1748 B Aguera y Arcas A Fairhall and W Bialek

Hartline, H. K. (1940). The receptive fields of optic nerve fibres. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshikawa, Z. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.


Several points are worth noting. First, except in special cases, the eigenvectors of ΔC and the filters {f_μ(τ)} are not the principal components of the RCE, and hence this analysis of ΔC is not a principal component analysis. Second, the nonzero eigenvalues of ΔC can be either positive or negative, depending on whether the variance of inputs along that particular direction is larger or smaller in the neighborhood of a spike. Third, although the eigenvectors span the relevant subspace, these eigenvectors do not form a preferred coordinate system within this subspace. Finally, we emphasize that dimensionality reduction, the identification of the relevant subspace, is only the first step in our analysis of the computation done by a neuron.

3 Measuring the Success of Dimensionality Reduction

The claim that certain stimulus features are most relevant is in effect a model for the neuron, so the next question is how to measure the effectiveness or accuracy of this model. Several different ideas have been suggested in the literature as ways of testing models based on linear receptive fields in the visual system (Stanley, Lei, & Dan, 1999; Keat, Reinagel, Reid, & Meister, 2001) or linear spectrotemporal receptive fields in the auditory system (Theunissen, Sen, & Doupe, 2000). These methods have in common that they introduce a metric to measure performance, for example, mean square error in predicting the firing rate as averaged over some window of time. Ideally, we would like to have a performance measure that avoids any arbitrariness in the choice of metric, and such metric-free measures are provided uniquely by information theory (Shannon, 1948; Cover & Thomas, 1991).

Observing the arrival time t₀ of a single spike provides a certain amount of information about the input signals. Since information is mutual, we can also say that knowing the input signal trajectory I(t < t₀) provides information about the arrival time of the spike. If “details are irrelevant,” then we should be able to discard these details from our description of the stimulus and yet preserve the mutual information between the stimulus and spike arrival times (for an abstract discussion of such selective compression, see Tishby et al., 1999). In constructing our low-dimensional model, we represent the complete (D-dimensional) stimulus I(t < t₀) by a smaller number (K < D) of dimensions s⃗ = (s₁, s₂, …, s_K).

The mutual information I[I(t < t₀); t₀] is a property of the neuron itself, while the mutual information I[s⃗; t₀] characterizes how much our reduced description of the stimulus can tell us about when spikes will occur. Necessarily, our reduction of dimensionality causes a loss of information, so that

$$ I[\vec{s}\,; t_0] \le I[I(t < t_0); t_0], \qquad (3.1) $$

but if our reduced description really captures the computation done by the neuron, then the two information measures will be very close. In particular,


if the neuron were described exactly by a lower-dimensional model, as for a linear perceptron or for an integrate-and-fire neuron (Agüera y Arcas & Fairhall, 2003), then the two information measures would be equal. More generally, the ratio I[s⃗; t₀]/I[I(t < t₀); t₀] quantifies the efficiency of the low-dimensional model, measuring the fraction of information about spike arrival times that our K dimensions capture from the full signal I(t < t₀).

As shown by Brenner, Strong, Koberle, Bialek, and de Ruyter van Steveninck (2000), the arrival time of a single spike provides an information

$$ I[I(t < t_0); t_0] \equiv I_{\rm one\ spike} = \frac{1}{T} \int_0^T dt\, \frac{r(t)}{\bar{r}} \log_2\!\left[\frac{r(t)}{\bar{r}}\right], \qquad (3.2) $$

where r(t) is the time-dependent spike rate, r̄ is the average spike rate, and ⟨···⟩ denotes an average over time. In principle, information should be calculated as an average over the distribution of stimuli, but the ergodicity of the stimulus justifies replacing this ensemble average with a time average. For a deterministic system like the HH equations, the spike rate is a singular function of time: given the inputs I(t), spikes occur at definite times with no randomness or irreproducibility. If we observe these responses with a time resolution Δt, then for Δt sufficiently small, the rate r(t) at any time t either is zero or corresponds to a single spike occurring in one bin of size Δt, that is, r = 1/Δt. Thus, the information carried by a single spike is

$$ I_{\rm one\ spike} = -\log_2(\bar{r}\, \Delta t). \qquad (3.3) $$

On the other hand, if the probability of spiking really depends on only the stimulus dimensions s₁, s₂, …, s_K, we can substitute

$$ \frac{r(t)}{\bar{r}} \rightarrow \frac{P(\vec{s} \mid {\rm spike\ at\ } t)}{P(\vec{s}\,)}. \qquad (3.4) $$

Replacing the time averages in equation 3.2 with ensemble averages, we find

$$ I[\vec{s}\,; t_0] \equiv I^{\vec{s}}_{\rm one\ spike} = \int d^K s\, P(\vec{s} \mid {\rm spike\ at\ } t) \log_2\!\left[\frac{P(\vec{s} \mid {\rm spike\ at\ } t)}{P(\vec{s}\,)}\right] \qquad (3.5) $$

(for details of these arguments, see Brenner, Strong, et al., 2000). This allows us to compare the information captured by the K-dimensional reduced model with the true information carried by single spikes in the spike train.
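The discretization argument above is easy to check numerically. The sketch below (a hypothetical binned spike train, not data from the paper) evaluates equation 3.2 directly and confirms that, for a deterministic rate taking only the values 0 and 1/Δt, it reduces to equation 3.3:

```python
import numpy as np

def single_spike_info(spike_counts, dt):
    """Evaluate equation 3.2 from a binned spike train of a deterministic
    neuron: r(t) is either 0 or 1/dt in bins of width dt."""
    T = len(spike_counts) * dt
    r = np.asarray(spike_counts) / dt   # time-dependent rate r(t)
    rbar = r.mean()                     # average rate
    ratio = r / rbar
    nz = ratio > 0                      # only spiking bins contribute
    return (dt / T) * np.sum(ratio[nz] * np.log2(ratio[nz]))

# hypothetical train: 50 spikes scattered over 10,000 bins of width 0.1 msec
rng = np.random.default_rng(0)
counts = np.zeros(10_000)
counts[rng.choice(10_000, size=50, replace=False)] = 1.0
dt = 0.1                                # msec
info = single_spike_info(counts, dt)
rbar = counts.mean() / dt
```

For this train, the result coincides with equation 3.3, I_one spike = −log₂(r̄Δt), as it must for any rate that takes only the values 0 and 1/Δt.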

For reasons that we will discuss in the following section, and as was pointed out in Agüera y Arcas et al. (2001) and Agüera y Arcas and Fairhall (2003), we will be considering isolated spikes, those separated from previous spikes by a period of silence. This has important consequences for our analysis. Most significantly, as we will be considering spikes that occur


on a background of silence, the relevant stimulus ensemble, conditioned on the silence, is no longer gaussian. Further, we will need to refine our information estimate.

The derivation of equation 3.2 makes clear that a similar formula must determine the information carried by the occurrence time of any event, not just single spikes: we can define an event rate in place of the spike rate and then calculate the information carried by these events (Brenner, Strong, et al., 2000). In the case here, we wish to compute the information obtained by observing an isolated spike, or equivalently by the event silence+spike. This is straightforward: we replace the spike rate by the rate of isolated spikes, and equation 3.2 will give us the information carried by the arrival time of a single isolated spike. The problem is that this information includes both the information carried by the occurrence of the spike and the information conveyed in the condition that there were no spikes in the preceding t_silence msec (for an early discussion of the information carried by silence, see de Ruyter van Steveninck & Bialek, 1988). We would like to separate these contributions, since our idea of dimensionality reduction applies only to the triggering of a spike, not to the temporally extended condition of nonspiking.

To separate the information carried by the isolated spike itself, we have to ask how much information we gain by seeing an isolated spike, given that the condition for isolation has already been met. As discussed by Brenner, Strong, et al. (2000), we can compute this information by thinking about the distribution of times at which the isolated spike can occur. Given that we know the input stimulus, the distribution of times at which a single isolated spike will be observed is proportional to r_iso(t), the time-dependent rate or peristimulus time histogram for isolated spikes. With proper normalization, we have

$$ P_{\rm iso}(t \mid {\rm inputs}) = \frac{1}{T} \cdot \frac{1}{\bar{r}_{\rm iso}}\, r_{\rm iso}(t), \qquad (3.6) $$

where T is the duration of the (long) window in which we can look for the spike and r̄_iso is the average rate of isolated spikes. This distribution has an entropy

$$ S_{\rm iso}(t \mid {\rm inputs}) = -\int_0^T dt\, P_{\rm iso}(t \mid {\rm inputs}) \log_2 P_{\rm iso}(t \mid {\rm inputs}) \qquad (3.7) $$

$$ = -\frac{1}{T} \int_0^T dt\, \frac{r_{\rm iso}(t)}{\bar{r}_{\rm iso}} \log_2\!\left[\frac{1}{T} \cdot \frac{r_{\rm iso}(t)}{\bar{r}_{\rm iso}}\right] \qquad (3.8) $$

$$ = \log_2(T \bar{r}_{\rm iso}\, \Delta t)\ {\rm bits}, \qquad (3.9) $$

where again we use the fact that for a deterministic system, the time-dependent rate must be either zero or the maximum allowed by our time


resolution, Δt. To compute the information carried by a single spike, we need to compare this entropy with the total entropy possible when we do not know the inputs.

It is tempting to think that without knowledge of the inputs, an isolated spike is equally likely to occur anywhere in the window of size T, which leads us back to equation 3.3 with r̄ replaced by r̄_iso. In this case, however, we are assuming that the condition for isolation has already been met. Thus, even without observing the inputs, we know that isolated spikes can occur only in windows of time whose total length is T_silence = T · P(silence), where P(silence) is the probability that any moment in time is at least t_silence after the most recent spike. Thus, the total entropy of isolated spike arrival times (given that the condition for silence has been met) is reduced from log₂ T to

$$ S_{\rm iso}(t \mid {\rm silence}) = \log_2[T \cdot P({\rm silence})], \qquad (3.10) $$

and the information that the spike carries beyond what we know from the silence itself is

$$ \Delta I_{\rm iso\ spike} = S_{\rm iso}(t \mid {\rm silence}) - S_{\rm iso}(t \mid {\rm inputs}) \qquad (3.11) $$

$$ = \frac{1}{T} \int_0^T dt\, \frac{r_{\rm iso}(t)}{\bar{r}_{\rm iso}} \log_2\!\left[\frac{r_{\rm iso}(t)}{\bar{r}_{\rm iso}} \cdot P({\rm silence})\right] \qquad (3.12) $$

$$ = -\log_2(\bar{r}_{\rm iso}\, \Delta t) + \log_2 P({\rm silence})\ {\rm bits}. \qquad (3.13) $$

This information, which is defined independent of any model for the feature selectivity of the neuron, provides the benchmark against which our reduction of dimensionality will be measured. To make the comparison, however, we need the analog of equation 3.5.
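The same numerical check extends to isolated spikes. A sketch with a hypothetical isolated-spike train and an assumed value of P(silence) evaluates equation 3.12 and confirms that it reduces to equation 3.13:

```python
import numpy as np

def iso_spike_info(iso_counts, dt, p_silence):
    """Evaluate equation 3.12: information carried by an isolated spike
    beyond what the preceding silence already conveys."""
    T = len(iso_counts) * dt
    r = np.asarray(iso_counts) / dt
    rbar = r.mean()
    ratio = r / rbar
    nz = ratio > 0
    return (dt / T) * np.sum(ratio[nz] * np.log2(ratio[nz] * p_silence))

# hypothetical isolated-spike train and an assumed P(silence) = 0.3
rng = np.random.default_rng(1)
counts = np.zeros(20_000)
counts[rng.choice(20_000, size=40, replace=False)] = 1.0
dt, p_silence = 0.1, 0.3
delta_I = iso_spike_info(counts, dt, p_silence)
rbar_iso = counts.mean() / dt
```

The silence term only shifts the result by log₂ P(silence); the spike placement enters through r̄_iso, exactly as in equation 3.13.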

Equation 3.12 provides us with an expression for the information conveyed by isolated spikes in terms of the probability that these spikes occur at particular times; this is analogous to equation 3.2 for single (nonisolated) spikes. If we follow a path analogous to that which leads from equation 3.2 to equation 3.5, we find an expression for the information that an isolated spike provides about the K stimulus dimensions s⃗:

$$ \Delta I^{\vec{s}}_{\rm iso\ spike} = \int d\vec{s}\, P(\vec{s} \mid {\rm iso\ spike\ at\ } t) \log_2\!\left[\frac{P(\vec{s} \mid {\rm iso\ spike\ at\ } t)}{P(\vec{s} \mid {\rm silence})}\right] + \langle \log_2 P({\rm silence} \mid \vec{s}\,) \rangle, \qquad (3.14) $$

where the prior is now also conditioned on silence: P(s⃗ | silence) is the distribution of s⃗ given that s⃗ is preceded by a silence of at least t_silence. Notice that this silence-conditioned distribution is not knowable a priori, and in particular it is not gaussian; P(s⃗ | silence) must be sampled from data.

The last term in equation 3.14 is the entropy of a binary variable that indicates whether particular moments in time are silent, given knowledge


of the stimulus. Again, since the HH model is deterministic, this conditional entropy should be zero if we keep a complete description of the stimulus. In fact, we are not interested in describing those features of the stimulus that lead to silence, and it is not fair (as we will see) to judge the success of dimensionality reduction by looking at the prediction of silence, which necessarily involves multiple dimensions. To make a meaningful comparison, then, we will assume that there is a perfect description of the stimulus conditions leading to silence and focus on the stimulus features that trigger the isolated spike. When we approximate these features by the K-dimensional space s⃗, we capture an amount of information

$$ \Delta I^{\vec{s}}_{\rm iso\ spike} = \int d\vec{s}\, P(\vec{s} \mid {\rm iso\ spike\ at\ } t) \log_2\!\left[\frac{P(\vec{s} \mid {\rm iso\ spike\ at\ } t)}{P(\vec{s} \mid {\rm silence})}\right]. \qquad (3.15) $$

This is the information that we can compare with ΔI_iso spike in equation 3.13 to determine the efficiency of our dimensionality reduction.
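In practice, equation 3.15 can be estimated by histogramming the projections s⃗ from the spike-conditioned and silence-conditioned stimulus ensembles. A minimal one-dimensional sketch, with synthetic gaussian ensembles standing in for data (the means and widths below are purely illustrative):

```python
import numpy as np

def projected_info(s_spike, s_prior, bins=30):
    """Histogram estimate of equation 3.15 for a single projection s:
    the divergence between P(s | iso spike) and the silence-conditioned
    prior P(s | silence), in bits."""
    edges = np.linspace(min(s_spike.min(), s_prior.min()),
                        max(s_spike.max(), s_prior.max()), bins + 1)
    p_spk = np.histogram(s_spike, edges)[0].astype(float)
    p_pri = np.histogram(s_prior, edges)[0].astype(float)
    p_spk /= p_spk.sum()
    p_pri /= p_pri.sum()
    nz = (p_spk > 0) & (p_pri > 0)   # drop empty bins in the sketch
    return float(np.sum(p_spk[nz] * np.log2(p_spk[nz] / p_pri[nz])))

# synthetic stand-ins: silence-conditioned prior ~ N(0, 1),
# spike-conditioned projections ~ N(2, 0.5)
rng = np.random.default_rng(2)
info = projected_info(rng.normal(2.0, 0.5, 50_000),
                      rng.normal(0.0, 1.0, 200_000))
```

With real data, `s_spike` and `s_prior` would be projections of spike-conditioned and silence-conditioned current segments onto the chosen filters; for these synthetic ensembles, the estimate lands near the analytic gaussian value of about 3.3 bits.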

4 Characterizing the Hodgkin-Huxley Neuron

For completeness, we begin with a brief review of the dynamics of the space-clamped HH neuron (Hodgkin & Huxley, 1952). Hodgkin and Huxley modeled the dynamics of the current through a patch of membrane with ion-specific conductances:

$$ C \frac{dV}{dt} = I(t) - \bar{g}_{\rm K}\, n^4 (V - V_{\rm K}) - \bar{g}_{\rm Na}\, m^3 h\, (V - V_{\rm Na}) - \bar{g}_l\, (V - V_l), \qquad (4.1) $$

where I(t) is the injected current, K and Na subscripts denote potassium- and sodium-related variables, respectively, and the l (for “leakage”) terms include all other ion conductances with slower dynamics. C is the membrane capacitance; V_K and V_Na are ion-specific reversal potentials, and V_l is defined such that the total voltage V is exactly zero when the membrane is at rest. ḡ_K, ḡ_Na, and ḡ_l are empirically determined maximal conductances for the different ion species, and the gating variables n, m, and h (on the interval [0, 1]) have their own voltage-dependent dynamics:

$$ \frac{dn}{dt} = \alpha_n(V)(1 - n) - \beta_n(V)\, n, \quad \frac{dm}{dt} = \alpha_m(V)(1 - m) - \beta_m(V)\, m, \quad \frac{dh}{dt} = \alpha_h(V)(1 - h) - \beta_h(V)\, h, $$

with the voltage-dependent rates (in msec⁻¹, V in mV)

$$ \alpha_n = \frac{0.01\,(10 - V)}{e^{(10 - V)/10} - 1}, \qquad \beta_n = 0.125\, e^{-V/80}, $$

$$ \alpha_m = \frac{0.1\,(25 - V)}{e^{(25 - V)/10} - 1}, \qquad \beta_m = 4\, e^{-V/18}, $$

$$ \alpha_h = 0.07\, e^{-V/20}, \qquad \beta_h = \frac{1}{e^{(30 - V)/10} + 1}. \qquad (4.2) $$

We have used the original values for these parameters, except for changing the signs of the voltages to correspond to the modern sign convention: C = 1 μF/cm², ḡ_K = 36 mS/cm², ḡ_Na = 120 mS/cm², ḡ_l = 0.3 mS/cm², V_K = −12 mV, V_Na = +115 mV, V_l = +10.613 mV. We have taken our system to be


a π × 30² μm² patch of membrane. We solve these equations numerically using fourth-order Runge–Kutta integration.

The system is driven with a gaussian random noise current I(t), generated by smoothing a gaussian random number stream with an exponential filter to generate a correlation time τ. It is convenient to choose τ to be longer than the time steps of numerical integration, since this guarantees that all functions are smooth on the scale of single time steps. Here we will always use τ = 0.2 msec, a value that is both less than the timescale over which we discretize the stimulus for analysis and far less than the neuron's capacitative smoothing timescale RC ≈ 3 msec. I(t) has a standard deviation σ, but since the correlation time is short, the relevant parameter usually is the spectral density S = σ²τ; we also add a DC offset I₀. In the following, we will consider two parameter regimes: I₀ = 0, and I₀ a finite value, which leads to more periodic firing.

The integration step size is fixed at 0.05 msec. The key numerical experiments were repeated at a step size of 0.01 msec, with identical results. The time of a spike is defined as the moment of maximum voltage, for voltages exceeding a threshold (see Figure 1), estimated to subsample precision by quadratic interpolation. As spikes are both very stereotyped and very large compared to subspiking fluctuations, the precise value of this threshold is unimportant; we have used +20 mV.

4.1 Qualitative Description of Spiking. The first step in our analysis is to use reverse correlation, equation 2.4, to determine the average stimulus feature preceding a spike, the STA. In Figure 1 (top), we display the STA in a regime where the spectral density of the input current is 6.5 × 10⁻⁴ nA²·sec. The spike-triggered averages of the gating terms n⁴ (proportion of open potassium channels) and m³h (proportion of open sodium channels) and the membrane voltage V are plotted in Figure 1 (middle and bottom). The error bars mark the standard deviation of the trajectories of these variables.

As expected, the voltage and gating variables follow highly stereotyped trajectories during the ≈5 msec surrounding a spike. First, the rapid opening of the sodium channels causes a sharp membrane depolarization (or rise in V); the slower potassium channels then open and repolarize the membrane, leaving it at a slightly lower potential than rest. The potassium channels close gradually, but meanwhile the membrane remains hyperpolarized and, due to its increased permeability to potassium ions, at lower resistance. These effects make it difficult to induce a second spike during this ≈15 msec “refractory period.” Away from spikes, the resting levels and fluctuations of the voltage and gating variables are quite small. The larger values evident in Figure 1 (middle and bottom) by ±15 msec are due to the summed contributions of nearby spikes.

The spike-triggered average current has a largely transient form, so that spikes are on average preceded by an upward swing in current. On the


Figure 1: Spike-triggered averages, with standard deviations, for (top) the input current I, (middle) the fraction of open K⁺ and Na⁺ channels, and (bottom) the membrane voltage V, for the parameter regime I₀ = 0 and S = 6.5 × 10⁻⁴ nA²·sec.

other hand, there is no obvious bottleneck in the current trajectories, so that the current variance is almost constant throughout the spike. This is qualitatively consistent with the idea of dimensionality reduction: if the neuron ignores most of the dimensions along which the current can vary, then the variance, which is shared almost equally among all dimensions for this near white noise, can change by only a small amount.
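The reverse correlation behind the STA is just an average over pre-spike stimulus windows. The sketch below shows the mechanics with a toy “spike” rule (upward zero crossings of white noise, purely illustrative and not the HH model), which produces the expected transient shape:

```python
import numpy as np

def spike_triggered_average(I, spike_idx, n):
    """Reverse correlation: average the n stimulus samples preceding each spike."""
    segs = [I[i - n:i] for i in spike_idx if i >= n]
    return np.mean(segs, axis=0)

# toy "spikes" at upward zero crossings of white noise (illustrative only)
rng = np.random.default_rng(4)
I = rng.normal(size=100_000)
spikes = np.where((I[:-1] < 0) & (I[1:] > 0))[0] + 1
sta = spike_triggered_average(I, spikes, n=20)
```

Applied to the HH simulation, `I` would be the injected current and `spike_idx` the spike times in samples; the STA then shows the pre-spike upswing described above, while samples far before the spike average to zero.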


4.2 Interspike Interaction. Although the STA has the form of a differentiating kernel, suggesting that the neuron detects edge-like events in the current versus time, there must be a DC component to the cell's response. We recall that for constant inputs, the HH model undergoes a bifurcation to constant-frequency spiking, where the frequency is a function of the value of the input above onset. Correspondingly, the STA does not sum precisely to zero; one might think of it as having a small integrating component that allows the system to spike under DC stimulation, albeit only above a threshold.

The system's tendency to periodic spiking under DC current input also is felt under dynamic stimulus conditions and can be thought of as a strong interaction between successive spikes. We illustrate this by considering a different parameter regime, with a small DC current and some added noise (I₀ = 0.11 nA and S = 0.8 × 10⁻⁴ nA²·sec). Note that the DC component puts the neuron in the metastable region of its f–I curve (see Figure 2). In this regime, the neuron tends to fire quasi-regular trains of spikes intermittently, as shown in Figure 3. We will refer to these quasi-regular spike sequences as “bursts” (note that this term is often used to refer to compound spikes in neurons with additional channels; such events do not occur in the HH model).

Spikes can be classified into three types: those initiating a spike burst, those within a burst, and those ending a burst. The minimum length of

Figure 2: Firing rate of the HH neuron as a function of injected DC current. The empty circles at moderate currents denote the metastable region, where the neuron may be either spiking or silent.


Figure 3: Segment of a typical spike train in a “bursting” regime.

Figure 4: Spike-triggered averages derived from spikes leading (“on”), inside (“burst”), and ending (“off”) a burst. The parameters of this bursting regime are I₀ = 0.11 nA and S = 0.8 × 10⁻⁴ nA²·sec. Note that the burst-ending spike average is, by construction, identical to that of any other within-burst spike for t < 0.

the silence between bursts is taken in this case to be 70 msec. Taking these three categories of spike as different “symbols” (de Ruyter van Steveninck & Bialek, 1988), we can determine the average stimulus for each. These are shown in Figure 4, with the spike at t = 0.

In this regime, the initial spike of a burst is preceded by a rapid oscillation in the current. Spikes within a burst are affected much less by the current; the feature immediately preceding such spikes is similar in shape to a single “wavelength” of the leading spike feature but is of much smaller amplitude and is temporally compressed into the interspike interval. Hence, although it is clear that the timing of a spike within a burst is determined largely by the timing of the previous spike, the current plays some role in affecting the precise placement. This also demonstrates that the shape of the STA is not the same for all spikes: it depends strongly and nontrivially on the time to the previous spike, and this is related to the observation that subtly different patterns of two or three spikes correspond to very different average stimuli (de Ruyter van Steveninck & Bialek, 1988). For a reader of the spike code, a spike within a burst conveys a different message about the input than the spike at the onset of the burst. Finally, the feature ending a burst has a very similar form to the onset feature, but reversed in time. Thus, to a good approximation, the absence of a spike at the end of a burst can be read as the opposite of the onset of the burst.

In summary, this regime of the HH neuron is similar to a “flip-flop,” or 1-bit memory. Like its electronic analog, the neuron's memory is preserved


Figure 5: Overall spike-triggered average in the bursty regime, showing the ringing due to the tendency to periodic firing. Plotted in gray is the spike autocorrelation, showing the same oscillations.

by a feedback loop, here implemented by the interspike interaction. Large fluctuations in the input current at a certain frequency “flip” or “flop” the neuron between its silent and spiking states. However, while the neuron is spiking, further details of the input signal are transmitted by precise spike timing within a burst. If we calculate the spike-triggered average of all spikes for this regime, without regard to their position within a burst, then, as shown in Figure 5, the relatively well-localized leading spike oscillation of Figure 4 is replaced by a long-lived oscillating function resulting from the spike periodicity. This is shown explicitly by comparing the overall STA with the spike autocorrelation, also shown in Figure 5. This same effect is seen in the STA of the burst spikes, which in fact dominates the overall average. Prediction of spike timing using such an STA would be computationally difficult, due to its extension in time, but, more seriously, unsuccessful, as most of the function is an artifact of the spike history rather than the effect of the stimulus.

While the effects of spike interaction are interesting and should be included in a complete model for spike generation, we wish here to consider only the current's role in initiating spikes. Therefore, as we have argued elsewhere, we limit ourselves initially to the cases in which interspike interaction plays no role (Agüera y Arcas et al., 2001; Agüera y Arcas & Fairhall, 2003). These “isolated” spikes can be defined as spikes preceded by a silent period t_silence long enough to ensure decoupling from the timing of the previous spike. A reasonable choice for t_silence can be inferred directly from the interspike interval distribution P(Δt), illustrated in Figure 6. For the HH model, as in simpler models and many real neurons (Brenner, Agam, Bialek, & de Ruyter van Steveninck, 1998), the form of P(Δt) has three noteworthy features: a refractory “hole,” during which another spike is unlikely to occur; a strong mode at the preferred firing frequency; and an exponentially decaying, or Poisson, tail. The details of all three of these features are functions of the parameters of the stimulus (Tiesinga, Jose, & Sejnowski, 2000), and certain regimes may be dominated by only one or two features. The emergence of Poisson statistics in the tail of the distribution implies that these events are independent, so we can infer that the system has lost memory of the previous spike. We will therefore take isolated spikes to be those preceded by a silent interval Δt ≥ t_silence, where t_silence is well into the Poisson regime. The burst-onset spikes of Figure 4 are isolated spikes by this definition.

Figure 6: Hodgkin-Huxley interspike interval histogram for the parameters I₀ = 0 and S = 6.5 × 10⁻⁴ nA²·sec, showing a peak at a preferred firing frequency and the long Poisson tail. The total number of spikes is N = 5.18 × 10⁶. The plot to the right is a close-up in linear scale.

Note that the bursty behavior evident in Figure 3 is characteristic of “type II” neurons, which begin firing at a well-defined frequency under DC stimulus; “type I” neurons, by contrast, can fire at arbitrarily low frequency under constant input. Nonetheless, the analysis that follows is equally applicable to either type of neuron: spikes still interact at close range and become independent at sufficient separation. In what follows, we will be probing the HH neuron with current noise that remains below the DC threshold for periodic firing.
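Selecting isolated spikes by the criterion above (a preceding silent interval of at least t_silence) is a one-line filter on the interspike intervals; a sketch with hypothetical spike times:

```python
import numpy as np

def isolated_spikes(spike_times, t_silence):
    """Keep spikes preceded by at least t_silence of silence. The first spike
    is kept here for simplicity; with real data one would require an observed
    silent stretch before it."""
    spike_times = np.asarray(spike_times, dtype=float)
    keep = np.concatenate(([True], np.diff(spike_times) >= t_silence))
    return spike_times[keep]

# hypothetical spike times (msec), with t_silence = 60 msec as in section 5
iso = isolated_spikes([5.0, 20.0, 31.0, 120.0, 130.0, 260.0], 60.0)
# iso is now [5., 120., 260.]
```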

5 Isolated Spike Analysis

Focusing now on isolated spikes, we proceed to a second-order analysis of the current fluctuations around the isolated-spike-triggered average (see


Figure 7: Spike-triggered average stimulus for isolated spikes.

Figure 7). We consider the response of the HH neuron to currents I(t) with mean I₀ = 0 and spectral density S = 6.5 × 10⁻⁴ nA²·sec. Isolated spikes in this regime are defined by t_silence = 60 msec.

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, a window sufficiently long to capture the relevant features; thus, we begin in a 200-dimensional space. We emphasize that the theorem that connects eigenvalues of the matrix ΔC to the number of relevant dimensions is valid only for truly gaussian distributions of inputs, and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble, namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide to our search for lower-dimensional descriptions, but we should proceed with caution.

The “raw” isolated-spike-triggered covariance C_iso spike and the corresponding covariance difference ΔC, equation 2.7, are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to ±15 msec. In Figure 9, we show the spectrum of eigenvalues of ΔC, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply ΔC by C⁻¹_prior. This has the effect of giving us eigenvalues scaled in units of the input standard deviation


Figure 8: The isolated-spike-triggered covariance C_iso (left) and covariance difference ΔC (right), for times −30 < t < 5 msec. The plots are in units of nA².

Figure 9: The leading 64 eigenvalues of the isolated-spike-triggered covariance after accumulating 80,000 spikes.

along each dimension. Because the correlation time is short, C_prior is nearly diagonal.
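The whitened covariance analysis just described (ΔC = C_spike − C_prior, multiplied by C⁻¹_prior) can be sketched on synthetic data. The toy “neuron” below is feature selective along a single random direction, which shows up as one large negative whitened eigenvalue whose eigenvector recovers the feature; the dimensions and selection rule are illustrative:

```python
import numpy as np

def covariance_modes(spike_segs, prior_segs):
    """DeltaC = C_spike - C_prior, whitened by C_prior^{-1} so that
    eigenvalues are in units of the input variance along each direction."""
    C_spike = np.cov(spike_segs.T)   # covariance about the spike-triggered average
    C_prior = np.cov(prior_segs.T)
    dC = C_spike - C_prior
    w, v = np.linalg.eig(np.linalg.solve(C_prior, dC))
    order = np.argsort(-np.abs(w.real))
    return w.real[order], v.real[:, order]

# toy example: a "neuron" selective for one random direction f
rng = np.random.default_rng(5)
D = 20
f = rng.normal(size=D)
f /= np.linalg.norm(f)
prior = rng.normal(size=(50_000, D))            # white gaussian segments
spike = prior[np.abs(prior @ f - 1.0) < 0.3]    # segments that "trigger a spike"
w, v = covariance_modes(spike, prior)
```

Because the spike-conditioned ensemble has reduced variance along f, the leading eigenvalue is negative and close to −1, while the remaining eigenvalues fluctuate near zero at the finite-sampling noise floor, mirroring the behavior described in the text.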

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero, up to the noise floor determined by sampling, decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

Figure 10: Convergence of the leading 64 eigenvalues of the isolated-spike-triggered covariance with increasing sample size. The log slope of the diagonal is 1/√n_spikes. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Agüera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended, silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Agüera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be −60 ≤ t ≤ −40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/n_spikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out the test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion of other weaker modes yet to emerge.

Figure 11: For the leading 64 modes, the fraction of the mode energy over the interval −40 < t < −30 msec, as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.

The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is nearly time-translation invariant.
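The energy-fraction diagnostic used to separate spike-associated from silence-associated modes can be sketched in a few lines (an illustrative reconstruction; the time axis and window are arguments rather than the paper's fixed choices):

```python
import numpy as np

def silence_energy_fraction(mode, t_axis, window=(-60.0, -40.0)):
    """Fraction of a covariance mode's energy lying in the pre-spike silent
    window. Spike-associated modes have only noise energy there (falling
    like 1/n_spikes with sample size); silence modes keep order-unity energy."""
    mode = mode / np.linalg.norm(mode)                 # unit total energy
    mask = (t_axis >= window[0]) & (t_axis <= window[1])
    return float(np.sum(mode[mask] ** 2))
```

A mode localized near the spike time gives a fraction near zero; an extended, Fourier-like silence mode gives a fraction roughly equal to the window's share of the total duration.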

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s1 and s2 (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crosses threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.
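To make this picture concrete, here is a toy simulation of a filtered threshold-crossing model (our own construction, with an arbitrary exponential filter and threshold, not the HH dynamics), showing the two dimensions such a model exposes: the projection onto the filter is pinned just above threshold, and the projection onto the filter's derivative is almost always positive:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
w = 50
stim = rng.normal(size=200_000)                 # white noise input current
f = np.exp(-np.arange(w)[::-1] / 10.0)          # hypothetical causal filter
f /= np.linalg.norm(f)                          # unit norm: prior projection has std 1
segs = sliding_window_view(stim, w)             # segs[t] = stim[t : t + w]
y = segs @ f                                    # filter output
theta = 2.0                                     # arbitrary threshold
# "spikes" at upward threshold crossings
cross = np.where((y[1:] >= theta) & (y[:-1] < theta))[0] + 1
S = segs[cross]                                 # spike-triggered histories
proj_f = S @ f                                  # projection onto the filter
d = f - np.append(f[1:], 0.0)                   # discrete-derivative direction
proj_d = S @ (d / np.linalg.norm(d))            # projection onto the derivative
# Variance along f collapses (y is pinned just above theta), and the
# derivative projection is almost always positive (crossing from below).
print(proj_f.std(), (proj_d > 0).mean())
```

In the HH model, by contrast, the variance along the second mode is increased rather than decreased, which is what rules out this interpretation below.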


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions, s1 and s2. By construction, the distribution of these signals over the whole experiment, P(s1, s2), is Gaussian. The appropriate prior for the isolation condition, P(s1, s2 | silence), differs only subtly from the Gaussian prior. On the other hand, for each spike we obtain a sample from the distribution P(s1, s2 | iso spike at t0), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC which we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold crossing interpretation.

Figure 14: 10^4 spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation of the prior Gaussian distribution. The circles, from the inside out, enclose all but 10^{-1}, 10^{-2}, ..., 10^{-8} of the prior.


Combining equations 2.2 and 2.3 for isolated spikes, we have

g(s1, s2) = P(s1, s2 | iso spike at t0) / P(s1, s2 | silence),   (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
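In practice, g can be estimated as a ratio of normalized histograms of the two projections; an illustrative sketch (our own naming, ignoring the subtle difference between the Gaussian prior and the silence-conditioned prior):

```python
import numpy as np

def input_output_ratio(proj_spike, proj_prior, bins=25, extent=((-4, 4), (-4, 4))):
    """Estimate g(s1, s2) from samples of the two projections, as the ratio
    of the spike-conditional to the prior 2D histogram (equation 5.1).

    proj_spike, proj_prior: arrays of shape (N, 2) holding (s1, s2) samples.
    """
    h_spike, xe, ye = np.histogram2d(proj_spike[:, 0], proj_spike[:, 1],
                                     bins=bins, range=extent, density=True)
    h_prior, _, _ = np.histogram2d(proj_prior[:, 0], proj_prior[:, 1],
                                   bins=bins, range=extent, density=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        g = np.where(h_prior > 0, h_spike / h_prior, np.nan)
    return g, xe, ye
```

Bins never visited by the prior are returned as NaN rather than a spurious ratio.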

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation, according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I_{iso spike}^{s1,s2} = I_{iso spike}; more generally, one finds I_{iso spike}^{s1,s2} ≤ I_{iso spike}, and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.
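The information captured by a low-dimensional projection can be estimated from the same histograms, as a discrete version of the ensemble average in equation 3.5; a sketch under our own naming:

```python
import numpy as np

def projected_information(h_spike, h_prior):
    """Information (bits per spike) captured by a projection, from
    histograms of the spike-conditional and prior distributions:
    I = sum_s P(s | spike) log2 [P(s | spike) / P(s)]."""
    p = np.asarray(h_spike, dtype=float)
    q = np.asarray(h_prior, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    nz = (p > 0) & (q > 0)          # bins where both distributions have support
    return float(np.sum(p[nz] * np.log2(p[nz] / q[nz])))
```

Dividing this by the full single-spike information (equation 3.3 at the chosen Δt) gives the fraction plotted in Figure 15.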

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec, reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates, for both the 1D and 2D cases, at approximately 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation, and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ~3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that, along the direction of the STA, we allow the second orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
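The tiling procedure just described (sort histories by STA projection, bin with equal occupancy, average within each bin, and orthogonalize against the STA) might be sketched as follows; the names are ours:

```python
import numpy as np

def local_second_directions(spike_segs, sta, n_bins=8):
    """Locally linear 'twist' model: bin spike-triggered histories by their
    projection onto the STA, then find in each bin the normalized component
    of the conditional average orthogonal to the STA (as in Figure 17).

    spike_segs: array of shape (n_spikes, D) of stimulus histories.
    sta: overall spike-triggered average, shape (D,).
    """
    sta = sta / np.linalg.norm(sta)
    proj = spike_segs @ sta
    order = np.argsort(proj)
    tiles = np.array_split(order, n_bins)          # equal spikes per bin
    directions = []
    for idx in tiles:
        cond_avg = spike_segs[idx].mean(axis=0)    # conditional average
        ortho = cond_avg - (cond_avg @ sta) * sta  # remove STA component
        directions.append(ortho / np.linalg.norm(ortho))
    return np.array(directions)
```

Each row is a locally meaningful second basis vector; stacking them and applying an SVD gives the estimate of the embedding dimension described above.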

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ~90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshikawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive, graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work of Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ~200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations, such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace using data sets of fewer than 10^4 spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance; for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore, in general, involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges, and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Agüera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Agüera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Agüera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibers. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshikawa, Z. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems, 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., José, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



if the neuron were described exactly by a lower-dimensional model, as for a linear perceptron or for an integrate-and-fire neuron (Agüera y Arcas & Fairhall, 2003), then the two information measures would be equal. More generally, the ratio I[\vec{s}; t_0] / I[I(t < t_0); t_0] quantifies the efficiency of the low-dimensional model, measuring the fraction of information about spike arrival times that our K dimensions capture from the full signal I(t < t_0).

As shown by Brenner, Strong, Koberle, Bialek, and de Ruyter van Steveninck (2000), the arrival time of a single spike provides an information

    I[I(t < t_0); t_0] \equiv I_{\rm one\ spike}
      = \frac{1}{T} \int_0^T dt \, \frac{r(t)}{\bar r} \log_2 \left[ \frac{r(t)}{\bar r} \right],    (3.2)

where r(t) is the time-dependent spike rate, \bar{r} is the average spike rate, and ⟨· · ·⟩ denotes an average over time. In principle, information should be calculated as an average over the distribution of stimuli, but the ergodicity of the stimulus justifies replacing this ensemble average with a time average. For a deterministic system like the HH equations, the spike rate is a singular function of time given the inputs I(t): spikes occur at definite times, with no randomness or irreproducibility. If we observe these responses with a time resolution Δt, then for Δt sufficiently small, the rate r(t) at any time t either is zero or corresponds to a single spike occurring in one bin of size Δt, that is, r = 1/Δt. Thus, the information carried by a single spike is

    I_{\rm one\ spike} = -\log_2(\bar r \, \Delta t).    (3.3)

On the other hand, if the probability of spiking really depends on only the stimulus dimensions s_1, s_2, ..., s_K, we can substitute

    \frac{r(t)}{\bar r} \rightarrow \frac{P(\vec s \mid {\rm spike\ at\ } t)}{P(\vec s)}.    (3.4)

Replacing the time averages in equation 3.2 with ensemble averages, we find

    I[\vec s; t_0] \equiv I^{\vec s}_{\rm one\ spike}
      = \int d^K s \, P(\vec s \mid {\rm spike\ at\ } t) \log_2 \left[ \frac{P(\vec s \mid {\rm spike\ at\ } t)}{P(\vec s)} \right]    (3.5)

(for details of these arguments, see Brenner, Strong, et al., 2000). This allows us to compare the information captured by the K-dimensional reduced model with the true information carried by single spikes in the spike train.

For reasons that we will discuss in the following section, and as was pointed out in Agüera y Arcas et al. (2001) and Agüera y Arcas and Fairhall (2003), we will be considering isolated spikes: those separated from previous spikes by a period of silence. This has important consequences for our analysis. Most significantly, as we will be considering spikes that occur on a background of silence, the relevant stimulus ensemble, conditioned on the silence, is no longer gaussian. Further, we will need to refine our information estimate.

The derivation of equation 3.2 makes clear that a similar formula must determine the information carried by the occurrence time of any event, not just single spikes: we can define an event rate in place of the spike rate and then calculate the information carried by these events (Brenner, Strong, et al., 2000). In the case here, we wish to compute the information obtained by observing an isolated spike, or equivalently, by the event silence+spike. This is straightforward: we replace the spike rate by the rate of isolated spikes, and equation 3.2 will give us the information carried by the arrival time of a single isolated spike. The problem is that this information includes both the information carried by the occurrence of the spike and the information conveyed in the condition that there were no spikes in the preceding t_silence msec (for an early discussion of the information carried by silence, see de Ruyter van Steveninck & Bialek, 1988). We would like to separate these contributions, since our idea of dimensionality reduction applies only to the triggering of a spike, not to the temporally extended condition of nonspiking.

To separate the information carried by the isolated spike itself, we have to ask how much information we gain by seeing an isolated spike, given that the condition for isolation has already been met. As discussed by Brenner, Strong, et al. (2000), we can compute this information by thinking about the distribution of times at which the isolated spike can occur. Given that we know the input stimulus, the distribution of times at which a single isolated spike will be observed is proportional to r_iso(t), the time-dependent rate, or peristimulus time histogram, for isolated spikes. With proper normalization, we have

    P_{\rm iso}(t \mid {\rm inputs}) = \frac{1}{T} \cdot \frac{1}{\bar r_{\rm iso}} \, r_{\rm iso}(t),    (3.6)

where T is the duration of the (long) window in which we can look for the spike, and \bar{r}_iso is the average rate of isolated spikes. This distribution has an entropy

    S_{\rm iso}(t \mid {\rm inputs})
      = -\int_0^T dt \, P_{\rm iso}(t \mid {\rm inputs}) \log_2 P_{\rm iso}(t \mid {\rm inputs})    (3.7)
      = -\frac{1}{T} \int_0^T dt \, \frac{r_{\rm iso}(t)}{\bar r_{\rm iso}} \log_2 \left[ \frac{1}{T} \cdot \frac{r_{\rm iso}(t)}{\bar r_{\rm iso}} \right]    (3.8)
      = \log_2(T \bar r_{\rm iso} \Delta t) \ {\rm bits},    (3.9)

where again we use the fact that for a deterministic system, the time-dependent rate must be either zero or the maximum allowed by our time resolution Δt. To compute the information carried by a single spike, we need to compare this entropy with the total entropy possible when we do not know the inputs.

It is tempting to think that without knowledge of the inputs, an isolated spike is equally likely to occur anywhere in the window of size T, which leads us back to equation 3.3, with \bar{r} replaced by \bar{r}_iso. In this case, however, we are assuming that the condition for isolation has already been met. Thus, even without observing the inputs, we know that isolated spikes can occur only in windows of time whose total length is T_silence = T · P(silence), where P(silence) is the probability that any moment in time is at least t_silence after the most recent spike. Thus, the total entropy of isolated spike arrival times (given that the condition for silence has been met) is reduced from log_2 T to

    S_{\rm iso}(t \mid {\rm silence}) = \log_2 [T \cdot P({\rm silence})],    (3.10)

and the information that the spike carries beyond what we know from the silence itself is

    \Delta I_{\rm iso\ spike}
      = S_{\rm iso}(t \mid {\rm silence}) - S_{\rm iso}(t \mid {\rm inputs})    (3.11)
      = \frac{1}{T} \int_0^T dt \, \frac{r_{\rm iso}(t)}{\bar r_{\rm iso}} \log_2 \left[ \frac{r_{\rm iso}(t)}{\bar r_{\rm iso}} \cdot P({\rm silence}) \right]    (3.12)
      = -\log_2(\bar r_{\rm iso} \Delta t) + \log_2 P({\rm silence}) \ {\rm bits}.    (3.13)

This information, which is defined independent of any model for the feature selectivity of the neuron, provides the benchmark against which our reduction of dimensionality will be measured. To make the comparison, however, we need the analog of equation 3.5.
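As a quick numerical illustration of equation 3.13 (our sketch, not from the original paper; the rate, bin width, and silence probability below are hypothetical), the isolated-spike information can be evaluated directly:

```python
import numpy as np

def delta_info_iso_spike(rate_iso, dt, p_silence):
    """Information (bits) carried by an isolated spike's arrival time,
    beyond what the silence condition already tells us (equation 3.13):
    Delta I = -log2(rate_iso * dt) + log2(p_silence)."""
    return -np.log2(rate_iso * dt) + np.log2(p_silence)

# Hypothetical values: 0.001 isolated spikes/msec, 1 msec bins,
# and half of all time qualifying as "silent".
print(delta_info_iso_spike(rate_iso=0.001, dt=1.0, p_silence=0.5))
```

The first term grows as the time resolution Δt improves, while the second term is the (negative) correction for already knowing that the silence condition holds.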

Equation 3.12 provides us with an expression for the information conveyed by isolated spikes in terms of the probability that these spikes occur at particular times; this is analogous to equation 3.2 for single (nonisolated) spikes. If we follow a path analogous to that which leads from equation 3.2 to equation 3.5, we find an expression for the information that an isolated spike provides about the K stimulus dimensions \vec{s}:

    \Delta I^{\vec s}_{\rm iso\ spike}
      = \int d\vec s \, P(\vec s \mid {\rm iso\ spike\ at\ } t) \log_2 \left[ \frac{P(\vec s \mid {\rm iso\ spike\ at\ } t)}{P(\vec s \mid {\rm silence})} \right]
        + \left\langle \log_2 P({\rm silence} \mid \vec s) \right\rangle,    (3.14)

where the prior is now also conditioned on silence: P(\vec{s} | silence) is the distribution of \vec{s} given that \vec{s} is preceded by a silence of at least t_silence. Notice that this silence-conditioned distribution is not knowable a priori, and in particular it is not gaussian; P(\vec{s} | silence) must be sampled from data.

The last term in equation 3.14 is the entropy of a binary variable that indicates whether particular moments in time are silent, given knowledge of the stimulus. Again, since the HH model is deterministic, this conditional entropy should be zero if we keep a complete description of the stimulus. In fact, we are not interested in describing those features of the stimulus that lead to silence, and it is not fair (as we will see) to judge the success of dimensionality reduction by looking at the prediction of silence, which necessarily involves multiple dimensions. To make a meaningful comparison, then, we will assume that there is a perfect description of the stimulus conditions leading to silence and focus on the stimulus features that trigger the isolated spike. When we approximate these features by the K-dimensional space \vec{s}, we capture an amount of information

    \Delta I^{\vec s}_{\rm iso\ spike}
      = \int d\vec s \, P(\vec s \mid {\rm iso\ spike\ at\ } t) \log_2 \left[ \frac{P(\vec s \mid {\rm iso\ spike\ at\ } t)}{P(\vec s \mid {\rm silence})} \right].    (3.15)

This is the information that we can compare with ΔI_iso spike in equation 3.13 to determine the efficiency of our dimensionality reduction.

4 Characterizing the Hodgkin-Huxley Neuron

For completeness, we begin with a brief review of the dynamics of the space-clamped HH neuron (Hodgkin & Huxley, 1952). Hodgkin and Huxley modeled the dynamics of the current through a patch of membrane with ion-specific conductances:

    C \frac{dV}{dt} = I(t) - \bar g_K n^4 (V - V_K) - \bar g_{Na} m^3 h (V - V_{Na}) - \bar g_l (V - V_l),    (4.1)

where I(t) is injected current, K and Na subscripts denote potassium- and sodium-related variables, respectively, and l (for "leakage") terms include all other ion conductances with slower dynamics. C is the membrane capacitance, V_K and V_Na are ion-specific reversal potentials, and V_l is defined such that the total voltage V is exactly zero when the membrane is at rest. \bar{g}_K, \bar{g}_Na, and \bar{g}_l are empirically determined maximal conductances for the different ion species, and the gating variables n, m, and h (on the interval [0, 1]) have their own voltage-dependent dynamics:

    dn/dt = \frac{(0.1 - 0.01V)(1 - n)}{\exp(1 - 0.1V) - 1} - 0.125 \, n \exp(-V/80)
    dm/dt = \frac{(2.5 - 0.1V)(1 - m)}{\exp(2.5 - 0.1V) - 1} - 4 \, m \exp(-V/18)
    dh/dt = 0.07 \, (1 - h) \exp(-V/20) - \frac{h}{\exp(3 - 0.1V) + 1}.    (4.2)

We have used the original values for these parameters, except for changing the signs of the voltages to correspond to the modern sign convention: C = 1 μF/cm², \bar{g}_K = 36 mS/cm², \bar{g}_Na = 120 mS/cm², \bar{g}_l = 0.3 mS/cm², V_K = −12 mV, V_Na = +115 mV, V_l = +10.613 mV. We have taken our system to be


a π × 30² μm² patch of membrane. We solve these equations numerically, using fourth-order Runge–Kutta integration.
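As a concrete sketch of the model just described (our illustration, not the authors' code), the following Python/NumPy fragment implements equations 4.1 and 4.2 with the stated parameters and a fourth-order Runge–Kutta step; the conversion of injected current in nA to a current density in μA/cm² via the stated patch area is our assumption:

```python
import numpy as np

# Parameters from the text (modern sign convention; V in mV, t in msec).
C_M = 1.0                                  # membrane capacitance, uF/cm^2
G_K, G_NA, G_L = 36.0, 120.0, 0.3          # maximal conductances, mS/cm^2
V_K, V_NA, V_L = -12.0, 115.0, 10.613      # reversal potentials, mV
AREA_CM2 = np.pi * 30.0**2 * 1e-8          # pi x 30^2 um^2 patch, in cm^2

def na_to_density(i_na):
    """Convert injected current in nA to uA/cm^2 (our assumed mapping
    of the absolute current onto the membrane patch)."""
    return i_na * 1e-3 / AREA_CM2

def hh_deriv(state, i_density):
    """Right-hand side of equations 4.1 and 4.2 for state = (V, n, m, h)."""
    V, n, m, h = state
    dV = (i_density
          - G_K * n**4 * (V - V_K)
          - G_NA * m**3 * h * (V - V_NA)
          - G_L * (V - V_L)) / C_M
    dn = (0.1 - 0.01 * V) * (1 - n) / (np.exp(1.0 - 0.1 * V) - 1.0) \
         - 0.125 * n * np.exp(-V / 80.0)
    dm = (2.5 - 0.1 * V) * (1 - m) / (np.exp(2.5 - 0.1 * V) - 1.0) \
         - 4.0 * m * np.exp(-V / 18.0)
    dh = 0.07 * (1 - h) * np.exp(-V / 20.0) \
         - h / (np.exp(3.0 - 0.1 * V) + 1.0)
    return np.array([dV, dn, dm, dh])

def rk4_step(state, i_density, dt=0.05):
    """One fourth-order Runge-Kutta step, with dt = 0.05 msec as in the text."""
    k1 = hh_deriv(state, i_density)
    k2 = hh_deriv(state + 0.5 * dt * k1, i_density)
    k3 = hh_deriv(state + 0.5 * dt * k2, i_density)
    k4 = hh_deriv(state + dt * k3, i_density)
    return state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

REST = np.array([0.0, 0.3177, 0.0529, 0.5961])  # approximate resting state
```

A useful consistency check is that at V = 0 the gating steady states and the leak term balance, so the resting state drifts negligibly in the absence of input, while a sustained suprathreshold current density elicits a full action potential.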

The system is driven with a gaussian random noise current I(t), generated by smoothing a gaussian random number stream with an exponential filter to generate a correlation time τ. It is convenient to choose τ to be longer than the time steps of numerical integration, since this guarantees that all functions are smooth on the scale of single time steps. Here we will always use τ = 0.2 msec, a value that is both less than the timescale over which we discretize the stimulus for analysis and far less than the neuron's capacitative smoothing timescale RC ~ 3 msec. The current has a standard deviation σ, but since the correlation time is short, the relevant parameter usually is the spectral density S = σ²τ; we also add a DC offset I₀. In the following, we will consider two parameter regimes: I₀ = 0, and I₀ a finite value, which leads to more periodic firing.
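The filtered-noise stimulus can be generated as follows (a minimal sketch under our interpretation of the exponential filter as a discrete Ornstein-Uhlenbeck update; the function name and seeding are ours):

```python
import numpy as np

def noise_current(n_steps, sigma, tau=0.2, dt=0.05, i0=0.0, seed=0):
    """Gaussian noise current with correlation time tau (msec), standard
    deviation sigma, and DC offset i0, made by exponentially filtering a
    white gaussian stream; the low-frequency spectral density is
    approximately S = sigma^2 * tau."""
    rng = np.random.default_rng(seed)
    rho = np.exp(-dt / tau)                   # one-step autocorrelation
    drive = sigma * np.sqrt(1.0 - rho**2)     # keeps stationary std = sigma
    I = np.empty(n_steps)
    x = 0.0
    for k in range(n_steps):
        x = rho * x + drive * rng.standard_normal()
        I[k] = i0 + x
    return I
```

With this scaling the trace has standard deviation σ regardless of dt, and its autocorrelation decays by a factor of e over one correlation time τ.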

The integration step size is fixed at 0.05 msec. The key numerical experiments were repeated at a step size of 0.01 msec, with identical results. The time of a spike is defined as the moment of maximum voltage, for voltages exceeding a threshold (see Figure 1), estimated to subsample precision by quadratic interpolation. As spikes are both very stereotyped and very large compared to subspiking fluctuations, the precise value of this threshold is unimportant; we have used +20 mV.
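The spike-timing procedure just described, thresholded peaks refined by fitting a parabola through the peak sample and its two neighbors, can be sketched as follows (our illustration):

```python
import numpy as np

def spike_times(V, dt=0.05, threshold=20.0):
    """Spike times as local maxima of the voltage trace above threshold,
    refined to subsample precision by quadratic (parabolic) interpolation
    through the peak sample and its two neighbors."""
    times = []
    for k in range(1, len(V) - 1):
        if V[k] > threshold and V[k] >= V[k - 1] and V[k] > V[k + 1]:
            # Vertex of the parabola through samples k-1, k, k+1.
            denom = V[k - 1] - 2.0 * V[k] + V[k + 1]
            shift = 0.5 * (V[k - 1] - V[k + 1]) / denom if denom != 0 else 0.0
            times.append((k + shift) * dt)
    return np.array(times)
```

For a locally quadratic peak, this recovers the true maximum exactly, so the timing error is set by the deviation of the waveform from a parabola near its peak rather than by the sampling grid.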

4.1 Qualitative Description of Spiking. The first step in our analysis is to use reverse correlation, equation 2.4, to determine the average stimulus feature preceding a spike, the STA. In Figure 1 (top), we display the STA in a regime where the spectral density of the input current is 6.5 × 10⁻⁴ nA² s. The spike-triggered averages of the gating terms n⁴ (proportion of open potassium channels) and m³h (proportion of open sodium channels) and the membrane voltage V are plotted in Figure 1 (middle and bottom). The error bars mark the standard deviation of the trajectories of these variables.

As expected, the voltage and gating variables follow highly stereotyped trajectories during the ~5 msec surrounding a spike. First, the rapid opening of the sodium channels causes a sharp membrane depolarization (or rise in V); the slower potassium channels then open and repolarize the membrane, leaving it at a slightly lower potential than rest. The potassium channels close gradually, but meanwhile, the membrane remains hyperpolarized and, due to its increased permeability to potassium ions, at lower resistance. These effects make it difficult to induce a second spike during this ~15 msec "refractory period." Away from spikes, the resting levels and fluctuations of the voltage and gating variables are quite small. The larger values evident in Figure 1 (middle and bottom) by ±15 msec are due to the summed contributions of nearby spikes.

The spike-triggered average current has a largely transient form, so that spikes are, on average, preceded by an upward swing in current. On the other hand, there is no obvious bottleneck in the current trajectories, so that the current variance is almost constant throughout the spike. This is qualitatively consistent with the idea of dimensionality reduction: if the neuron ignores most of the dimensions along which the current can vary, then the variance, which is shared almost equally among all dimensions for this near white noise, can change by only a small amount.

Figure 1: Spike-triggered averages, with standard deviations, for (top) the input current I, (middle) the fraction of open K⁺ and Na⁺ channels, and (bottom) the membrane voltage V, for the parameter regime I₀ = 0 and S = 6.5 × 10⁻⁴ nA² s.


4.2 Interspike Interaction. Although the STA has the form of a differentiating kernel, suggesting that the neuron detects edge-like events in the current versus time, there must be a DC component to the cell's response. We recall that for constant inputs, the HH model undergoes a bifurcation to constant frequency spiking, where the frequency is a function of the value of the input above onset. Correspondingly, the STA does not sum precisely to zero; one might think of it as having a small integrating component that allows the system to spike under DC stimulation, albeit only above a threshold.

The system's tendency to periodic spiking under DC current input also is felt under dynamic stimulus conditions and can be thought of as a strong interaction between successive spikes. We illustrate this by considering a different parameter regime, with a small DC current and some added noise (I₀ = 0.11 nA and S = 0.8 × 10⁻⁴ nA² s). Note that the DC component puts the neuron in the metastable region of its f−I curve (see Figure 2). In this regime, the neuron tends to fire quasi-regular trains of spikes intermittently, as shown in Figure 3. We will refer to these quasi-regular spike sequences as "bursts" (note that this term is often used to refer to compound spikes in neurons with additional channels; such events do not occur in the HH model).

Spikes can be classified into three types: those initiating a spike burst, those within a burst, and those ending a burst. The minimum length of the silence between bursts is taken in this case to be 70 msec. Taking these three categories of spike as different "symbols" (de Ruyter van Steveninck & Bialek, 1988), we can determine the average stimulus for each. These are shown in Figure 4, with the spike at t = 0.

Figure 2: Firing rate of the HH neuron as a function of injected DC current. The empty circles at moderate currents denote the metastable region, where the neuron may be either spiking or silent.

Figure 3: Segment of a typical spike train in a "bursting" regime.

Figure 4: Spike-triggered averages derived from spikes leading ("on"), inside ("burst"), and ending ("off") a burst. The parameters of this bursting regime are I₀ = 0.11 nA and S = 0.8 × 10⁻⁴ nA² s. Note that the burst-ending spike average is, by construction, identical to that of any other within-burst spike for t < 0.

In this regime, the initial spike of a burst is preceded by a rapid oscillation in the current. Spikes within a burst are affected much less by the current; the feature immediately preceding such spikes is similar in shape to a single "wavelength" of the leading spike feature but is of much smaller amplitude and is temporally compressed into the interspike interval. Hence, although it is clear that the timing of a spike within a burst is determined largely by the timing of the previous spike, the current plays some role in affecting the precise placement. This also demonstrates that the shape of the STA is not the same for all spikes: it depends strongly and nontrivially on the time to the previous spike, and this is related to the observation that subtly different patterns of two or three spikes correspond to very different average stimuli (de Ruyter van Steveninck & Bialek, 1988). For a reader of the spike code, a spike within a burst conveys a different message about the input than the spike at the onset of the burst. Finally, the feature ending a burst has a very similar form to the onset feature, but reversed in time. Thus, to a good approximation, the absence of a spike at the end of a burst can be read as the opposite of the onset of the burst.

In summary, this regime of the HH neuron is similar to a "flip-flop," or 1-bit memory. Like its electronic analog, the neuron's memory is preserved by a feedback loop, here implemented by the interspike interaction. Large fluctuations in the input current at a certain frequency "flip" or "flop" the neuron between its silent and spiking states. However, while the neuron is spiking, further details of the input signal are transmitted by precise spike timing within a burst. If we calculate the spike-triggered average of all spikes for this regime, without regard to their position within a burst, then as shown in Figure 5, the relatively well-localized leading spike oscillation of Figure 4 is replaced by a long-lived oscillating function resulting from the spike periodicity. This is shown explicitly by comparing the overall STA with the spike autocorrelation, also shown in Figure 5. This same effect is seen in the STA of the burst spikes, which in fact dominates the overall average. Prediction of spike timing using such an STA would be computationally difficult due to its extension in time, but more seriously, unsuccessful, as most of the function is an artifact of the spike history rather than the effect of the stimulus.

Figure 5: Overall spike-triggered average in the bursty regime, showing the ringing due to the tendency to periodic firing. Plotted in gray is the spike autocorrelation, showing the same oscillations.

While the effects of spike interaction are interesting and should be included in a complete model for spike generation, we wish here to consider only the current's role in initiating spikes. Therefore, as we have argued elsewhere, we limit ourselves initially to the cases in which interspike interaction plays no role (Agüera y Arcas et al., 2001; Agüera y Arcas & Fairhall, 2003). These "isolated" spikes can be defined as spikes preceded by a silent period t_silence long enough to ensure decoupling from the timing of the previous spike. A reasonable choice for t_silence can be inferred directly from the interspike interval distribution P(Δt), illustrated in Figure 6. For the HH model, as in simpler models and many real neurons (Brenner, Agam, Bialek, & de Ruyter van Steveninck, 1998), the form of P(Δt) has three noteworthy features: a refractory "hole," during which another spike is unlikely to occur; a strong mode at the preferred firing frequency; and an exponentially decaying, or Poisson, tail. The details of all three of these features are functions of the parameters of the stimulus (Tiesinga, Jose, & Sejnowski, 2000), and certain regimes may be dominated by only one or two features. The emergence of Poisson statistics in the tail of the distribution implies that these events are independent, so we can infer that the system has lost memory of the previous spike. We will therefore take isolated spikes to be those preceded by a silent interval Δt ≥ t_silence, where t_silence is well into the Poisson regime. The burst-onset spikes of Figure 4 are isolated spikes by this definition.

Figure 6: Hodgkin-Huxley interspike interval histogram for the parameters I₀ = 0 and S = 6.5 × 10⁻⁴ nA² s, showing a peak at a preferred firing frequency and the long Poisson tail. The total number of spikes is N = 5.18 × 10⁶. The plot to the right is a closeup, in linear scale.
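Given a list of spike times, the isolated-spike selection just defined reduces to a simple filter on interspike intervals (our sketch):

```python
import numpy as np

def isolated_spikes(spike_times, t_silence=60.0):
    """Select spikes preceded by at least t_silence (msec) of silence,
    following the definition in the text; the first spike is excluded
    since its preceding interval is unknown."""
    spike_times = np.asarray(spike_times)
    isi = np.diff(spike_times)            # preceding interval of each later spike
    return spike_times[1:][isi >= t_silence]
```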

Note that the bursty behavior evident in Figure 3 is characteristic of "type II" neurons, which begin firing at a well-defined frequency under DC stimulus; "type I" neurons, by contrast, can fire at arbitrarily low frequency under constant input. Nonetheless, the analysis that follows is equally applicable to either type of neuron: spikes still interact at close range and become independent at sufficient separation. In what follows, we will be probing the HH neuron with current noise that remains below the DC threshold for periodic firing.

5 Isolated Spike Analysis

Focusing now on isolated spikes, we proceed to a second-order analysis of the current fluctuations around the isolated spike-triggered average (see Figure 7). We consider the response of the HH neuron to currents I(t) with mean I₀ = 0 and spectral density S = 6.5 × 10⁻⁴ nA² s. Isolated spikes in this regime are defined by t_silence = 60 msec.

Figure 7: Spike-triggered average stimulus for isolated spikes.

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, roughly corresponding to sampling at a timescale sufficiently long to capture the relevant features. Thus, we begin in a 200-dimensional space. We emphasize that the theorem that connects eigenvalues of the matrix ΔC to the number of relevant dimensions is valid only for truly gaussian distributions of inputs and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble, namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide to our search for lower-dimensional descriptions, but we should proceed with caution.

The "raw" isolated spike-triggered covariance C_iso spike and the corresponding covariance difference ΔC, equation 2.7, are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to ±15 msec. In Figure 9, we show the spectrum of eigenvalues of ΔC, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply ΔC by C_prior⁻¹. This has the effect of giving us eigenvalues scaled in units of the input standard deviation along each dimension. Because the correlation time is short, C_prior is nearly diagonal.

Figure 8: The isolated spike-triggered covariance C_iso (left) and covariance difference ΔC (right), for times −30 < t < 5 msec. The plots are in units of nA².

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance after accumulating 80,000 spikes.
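The covariance analysis described here can be sketched as follows (our illustration of a generic spike-triggered covariance computation, not the authors' code; for simplicity, the prior is estimated from all stimulus segments rather than from the silence-conditioned ensemble):

```python
import numpy as np

def covariance_difference(stimulus, spike_bins, window=200):
    """Accumulate the covariance of stimulus segments preceding each
    spike, subtract the prior covariance, and diagonalize
    Cprior^-1 @ dC so that eigenvalues are in units of the input
    variance along each dimension (largest-magnitude first)."""
    segs = np.array([stimulus[b - window:b] for b in spike_bins if b >= window])
    sta = segs.mean(axis=0)                 # spike-triggered average
    d = segs - sta
    c_spike = d.T @ d / len(segs)           # spike-conditional covariance

    # Prior covariance from segments ending at every time.
    n = len(stimulus) - window
    all_segs = np.lib.stride_tricks.sliding_window_view(stimulus, window)[:n]
    dp = all_segs - all_segs.mean(axis=0)
    c_prior = dp.T @ dp / n

    dC = c_spike - c_prior
    evals, evecs = np.linalg.eig(np.linalg.inv(c_prior) @ dC)
    order = np.argsort(-np.abs(evals))
    return evals[order].real, evecs[:, order].real
```

As a sanity check, if "spikes" are generated whenever the last stimulus sample exceeds a threshold, the leading mode should be localized on that sample, with a negative eigenvalue reflecting the reduced spike-conditional variance there.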

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero, up to the noise floor determined by sampling, decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is 1/√n_spikes. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Agüera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended and silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Agüera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be −60 ≤ t ≤ −40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/n_spikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out the test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion of other, weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical: most modes resemble Fourier modes, as the silence condition is close to time translationally invariant.

Figure 11: For the leading 64 modes, fraction of the mode energy over the interval −40 < t < −30 msec, as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.
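The mode-classification diagnostic can be written compactly (our sketch; the window boundaries are arguments, so either of the conventions quoted above can be used):

```python
import numpy as np

def silence_energy_fraction(mode, t_axis, t_lo=-60.0, t_hi=-40.0):
    """Fraction of a mode's energy inside the silent window, used to
    separate spike-associated (small fraction, shrinking with sample
    size) from silence-associated (order-unity fraction) modes."""
    mode = np.asarray(mode, dtype=float)
    mask = (t_axis >= t_lo) & (t_axis <= t_hi)
    return float(np.sum(mode[mask] ** 2) / np.sum(mode ** 2))
```

A mode localized near the spike scores near zero, while an extended Fourier-like mode scores roughly the ratio of the window length to the full segment length.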

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s₁ and s₂ (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators; there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crossed threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions s₁ and s₂. By construction, the distribution of these signals over the whole experiment, P(s₁, s₂), is gaussian. The appropriate prior for the isolation condition, P(s₁, s₂ | silence), differs only subtly from the gaussian prior. On the other hand, for each spike, we obtain a sample from the distribution P(s₁, s₂ | iso spike at t₀), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC which we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold crossing interpretation.

Figure 14: 10⁴ spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation on the prior gaussian distribution. The circles, from the inside out, enclose all but 10⁻¹, 10⁻², ..., 10⁻⁸ of the prior.


Combining equations 2.2 and 2.3 for isolated spikes, we have

    g(s_1, s_2) = \frac{P(s_1, s_2 \mid {\rm iso\ spike\ at\ } t_0)}{P(s_1, s_2 \mid {\rm silence})},    (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure, but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
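Given samples of the projections, the input-output relation of equation 5.1 can be estimated as a ratio of histograms (our sketch; in the actual analysis, the prior must be the silence-conditioned ensemble, sampled from data):

```python
import numpy as np

def nonlinearity_2d(s_spike, s_prior, bins=25, lim=4.0):
    """Estimate g(s1, s2) of equation 5.1 as the ratio of normalized 2D
    histograms of spike-conditional and prior stimulus projections
    (each an (N, 2) array). Bins with no prior samples yield NaN."""
    edges = np.linspace(-lim, lim, bins + 1)
    h_spk, _, _ = np.histogram2d(s_spike[:, 0], s_spike[:, 1],
                                 bins=[edges, edges], density=True)
    h_pri, _, _ = np.histogram2d(s_prior[:, 0], s_prior[:, 1],
                                 bins=[edges, edges], density=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        return h_spk / h_pri
```

For a deterministic model, g estimated this way sharpens toward a singular surface as the bins shrink and the time resolution improves; the degree to which it remains "fuzzy" at fixed resolution measures what the 2D description misses.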

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation, according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I^{s₁,s₂}_iso spike = I_iso spike; more generally, one finds I^{s₁,s₂}_iso spike ≤ I_iso spike, and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec, reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at ≈ 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ~3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing?

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.
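The covariance-mode analysis referred to here can be sketched in a few lines. This is our illustration, not the authors' code: function and argument names are ours, and the prior covariance is estimated from non-overlapping stimulus segments.

```python
import numpy as np

def covariance_modes(stim, spike_idx, window):
    """Spike-triggered covariance analysis (a sketch).

    stim      : 1D array, discretized input current
    spike_idx : sample indices of (isolated) spikes in `stim`
    window    : number of samples of stimulus history per spike
    Returns eigenvalues/eigenvectors of dC = C_spike - C_prior, plus the STA.
    """
    # collect stimulus histories preceding each spike
    hist = np.array([stim[i - window:i] for i in spike_idx if i >= window])
    sta = hist.mean(axis=0)
    # covariance of spike-triggered histories minus the prior covariance
    c_spike = np.cov(hist, rowvar=False)
    prior = np.array([stim[i:i + window]
                      for i in range(0, len(stim) - window, window)])
    c_prior = np.cov(prior, rowvar=False)
    dc = c_spike - c_prior
    w, v = np.linalg.eigh(dc)
    order = np.argsort(-np.abs(w))   # modes with largest |eigenvalue| first
    return w[order], v[:, order], sta
```

The significant modes are those whose eigenvalues stand out from the bulk of the spectrum; with finite data, only a few can be resolved.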


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that, along the direction of the STA, we allow the second, orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).
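The tiling algorithm just described can be sketched compactly; this is our illustration rather than the authors' code, with eight equal-occupancy bins as in the text.

```python
import numpy as np

def twist_model(histories, n_bins=8):
    """Locally linear ("twist") tiling of the feature subspace (a sketch).

    histories : (n_spikes, window) array of stimulus histories preceding
                isolated spikes; n_bins = 8 follows the paper's choice.
    Returns the global STA and, for each tile, the second (local) basis
    vector: the STA-orthogonal part of the bin's conditional average.
    """
    sta = histories.mean(axis=0)
    sta_unit = sta / np.linalg.norm(sta)
    # first direction is global: projection onto the overall STA
    proj = histories @ sta_unit
    # sort histories by STA projection; equal numbers of spikes per bin
    order = np.argsort(proj)
    tiles = []
    for chunk in np.array_split(order, n_bins):
        cond_avg = histories[chunk].mean(axis=0)
        # second, local direction: component orthogonal to the STA
        ortho = cond_avg - (cond_avg @ sta_unit) * sta_unit
        tiles.append(ortho / np.linalg.norm(ortho))
    return sta, np.array(tiles)
```

Applying `np.linalg.svd` to the returned array of tile vectors gives the estimate of the embedding dimension of the feature subspace discussed above.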

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data set: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ~90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshizawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input currents by adding an extra dimension. The question asked by FitzHugh and by Nagumo et al. was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work of Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors (a system described by a potentially large number of variables), there might be a simple rule, perhaps even a linear filter, that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ~200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations, such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation is, we believe, novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace, using data sets of fewer than 10^4 spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore, in general, involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Agüera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Agüera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Agüera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.

Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibres. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., José, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



on a background of silence, the relevant stimulus ensemble conditioned on the silence is no longer gaussian. Further, we will need to refine our information estimate.

The derivation of equation 3.2 makes clear that a similar formula must determine the information carried by the occurrence time of any event, not just single spikes: we can define an event rate in place of the spike rate and then calculate the information carried by these events (Brenner, Strong, et al., 2000). In the case here, we wish to compute the information obtained by observing an isolated spike, or equivalently by the event silence + spike. This is straightforward: we replace the spike rate by the rate of isolated spikes, and equation 3.2 will give us the information carried by the arrival time of a single isolated spike. The problem is that this information includes both the information carried by the occurrence of the spike and the information conveyed in the condition that there were no spikes in the preceding t_silence msec (for an early discussion of the information carried by silence, see de Ruyter van Steveninck & Bialek, 1988). We would like to separate these contributions, since our idea of dimensionality reduction applies only to the triggering of a spike, not to the temporally extended condition of nonspiking.

To separate the information carried by the isolated spike itself, we have to ask how much information we gain by seeing an isolated spike, given that the condition for isolation has already been met. As discussed by Brenner, Strong, et al. (2000), we can compute this information by thinking about the distribution of times at which the isolated spike can occur. Given that we know the input stimulus, the distribution of times at which a single isolated spike will be observed is proportional to r_iso(t), the time-dependent rate, or peristimulus time histogram, for isolated spikes. With proper normalization, we have

$$P(t_{\rm iso} \mid {\rm inputs}) = \frac{1}{T} \cdot \frac{1}{\bar{r}_{\rm iso}}\, r_{\rm iso}(t), \tag{3.6}$$

where T is the duration of the (long) window in which we can look for the spike and $\bar{r}_{\rm iso}$ is the average rate of isolated spikes. This distribution has an entropy

$$S(t_{\rm iso} \mid {\rm inputs}) = -\int_0^T dt\, P(t_{\rm iso} \mid {\rm inputs}) \log_2 P(t_{\rm iso} \mid {\rm inputs}) \tag{3.7}$$

$$= -\frac{1}{T}\int_0^T dt\, \frac{r_{\rm iso}(t)}{\bar{r}_{\rm iso}} \log_2\left[\frac{1}{T}\cdot\frac{r_{\rm iso}(t)}{\bar{r}_{\rm iso}}\right] \tag{3.8}$$

$$= \log_2(T\,\bar{r}_{\rm iso}\,\Delta t)\ {\rm bits}, \tag{3.9}$$

where again we use the fact that, for a deterministic system, the time-dependent rate must be either zero or the maximum allowed by our time resolution, 1/Δt. To compute the information carried by a single spike, we need to compare this entropy with the total entropy possible when we do not know the inputs.

It is tempting to think that, without knowledge of the inputs, an isolated spike is equally likely to occur anywhere in the window of size T, which leads us back to equation 3.3 with $\bar{r}$ replaced by $\bar{r}_{\rm iso}$. In this case, however, we are assuming that the condition for isolation has already been met. Thus, even without observing the inputs, we know that isolated spikes can occur only in windows of time whose total length is T_silence = T · P(silence), where P(silence) is the probability that any moment in time is at least t_silence after the most recent spike. Thus, the total entropy of isolated spike arrival times (given that the condition for silence has been met) is reduced from log2 T to

$$S(t_{\rm iso} \mid {\rm silence}) = \log_2\!\left(T \cdot P({\rm silence})\right), \tag{3.10}$$

and the information that the spike carries beyond what we know from the silence itself is

$$\Delta I_{\rm iso\ spike} = S(t_{\rm iso} \mid {\rm silence}) - S(t_{\rm iso} \mid {\rm inputs}) \tag{3.11}$$

$$= \frac{1}{T}\int_0^T dt\, \frac{r_{\rm iso}(t)}{\bar{r}_{\rm iso}} \log_2\left[\frac{r_{\rm iso}(t)}{\bar{r}_{\rm iso}} \cdot P({\rm silence})\right] \tag{3.12}$$

$$= -\log_2(\bar{r}_{\rm iso}\,\Delta t) + \log_2 P({\rm silence})\ {\rm bits}. \tag{3.13}$$
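Both terms in equation 3.13 are directly measurable from a long spike train. A minimal sketch (function and argument names are ours; edge effects at the ends of the recording are ignored):

```python
import numpy as np

def isolated_spike_info(spike_times, t_total, dt, t_silence):
    """Information per isolated spike, equation 3.13 (a sketch).

    spike_times : spike times (msec) from a long simulation
    t_total     : duration of the simulation (msec)
    dt          : time resolution Delta-t (msec)
    t_silence   : minimum preceding silence for a spike to count as isolated
    """
    spike_times = np.sort(np.asarray(spike_times))
    isi = np.diff(spike_times)
    # isolated spikes: preceded by at least t_silence of silence
    iso = spike_times[1:][isi >= t_silence]
    r_iso = len(iso) / t_total            # mean isolated-spike rate
    # P(silence): fraction of time at least t_silence after the last spike
    quiet = np.clip(isi - t_silence, 0.0, None).sum()
    p_silence = quiet / t_total
    return -np.log2(r_iso * dt) + np.log2(p_silence)
```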

This information, which is defined independent of any model for the feature selectivity of the neuron, provides the benchmark against which our reduction of dimensionality will be measured. To make the comparison, however, we need the analog of equation 3.5.

Equation 3.12 provides us with an expression for the information conveyed by isolated spikes in terms of the probability that these spikes occur at particular times; this is analogous to equation 3.2 for single (nonisolated) spikes. If we follow a path analogous to that which leads from equation 3.2 to equation 3.5, we find an expression for the information that an isolated spike provides about the K stimulus dimensions $\vec{s}$:

$$\Delta I^{(\vec{s})}_{\rm iso\ spike} = \int d\vec{s}\; P(\vec{s} \mid {\rm iso\ spike\ at\ } t)\, \log_2\!\left[\frac{P(\vec{s} \mid {\rm iso\ spike\ at\ } t)}{P(\vec{s} \mid {\rm silence})}\right] + \left\langle \log_2 P({\rm silence} \mid \vec{s}) \right\rangle, \tag{3.14}$$

where the prior is now also conditioned on silence: $P(\vec{s} \mid {\rm silence})$ is the distribution of $\vec{s}$ given that $\vec{s}$ is preceded by a silence of at least t_silence. Notice that this silence-conditioned distribution is not knowable a priori, and in particular it is not gaussian; $P(\vec{s} \mid {\rm silence})$ must be sampled from data.

The last term in equation 3.14 is the entropy of a binary variable that indicates whether particular moments in time are silent, given knowledge of the stimulus. Again, since the HH model is deterministic, this conditional entropy should be zero if we keep a complete description of the stimulus. In fact, we are not interested in describing those features of the stimulus that lead to silence, and it is not fair (as we will see) to judge the success of dimensionality reduction by looking at the prediction of silence, which necessarily involves multiple dimensions. To make a meaningful comparison, then, we will assume that there is a perfect description of the stimulus conditions leading to silence and focus on the stimulus features that trigger the isolated spike. When we approximate these features by the K-dimensional space $\vec{s}$, we capture an amount of information

$$\Delta I^{(\vec{s})}_{\rm iso\ spike} = \int d\vec{s}\; P(\vec{s} \mid {\rm iso\ spike\ at\ } t)\, \log_2\!\left[\frac{P(\vec{s} \mid {\rm iso\ spike\ at\ } t)}{P(\vec{s} \mid {\rm silence})}\right]. \tag{3.15}$$

This is the information that we can compare with $\Delta I_{\rm iso\ spike}$ in equation 3.13 to determine the efficiency of our dimensionality reduction.
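For a one-dimensional projection, equation 3.15 can be estimated by histogramming stimulus projections conditioned on spiking against the silence-conditioned prior. A sketch, assuming both sets of projections have already been computed; the bin count and the crude treatment of empty bins are our choices:

```python
import numpy as np

def projected_info(proj_spike, proj_silence, bins=50):
    """Histogram estimate of equation 3.15 for a 1D feature (a sketch).

    proj_spike   : projections of stimulus histories preceding spikes
    proj_silence : projections sampled under the silence condition (prior)
    """
    lo = min(proj_spike.min(), proj_silence.min())
    hi = max(proj_spike.max(), proj_silence.max())
    p, edges = np.histogram(proj_spike, bins=bins, range=(lo, hi))
    q, _ = np.histogram(proj_silence, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    # drop empty bins to avoid log(0); sparse bins need real regularization
    mask = (p > 0) & (q > 0)
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))
```

With limited data, this estimator is biased upward for fine binning and downward where the prior's tails are undersampled; the paper's information values are corrected for finite sampling.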

4 Characterizing the Hodgkin-Huxley Neuron

For completeness, we begin with a brief review of the dynamics of the space-clamped HH neuron (Hodgkin & Huxley, 1952). Hodgkin and Huxley modeled the dynamics of the current through a patch of membrane with ion-specific conductances:

$$C \frac{dV}{dt} = I(t) - \bar{g}_{\rm K}\, n^4 (V - V_{\rm K}) - \bar{g}_{\rm Na}\, m^3 h\, (V - V_{\rm Na}) - \bar{g}_l (V - V_l), \tag{4.1}$$

where I(t) is the injected current; K and Na subscripts denote potassium- and sodium-related variables, respectively; and the l (for "leakage") terms include all other ion conductances with slower dynamics. C is the membrane capacitance; V_K and V_Na are ion-specific reversal potentials; and V_l is defined such that the total voltage V is exactly zero when the membrane is at rest. $\bar{g}_{\rm K}$, $\bar{g}_{\rm Na}$, and $\bar{g}_l$ are empirically determined maximal conductances for the different ion species, and the gating variables n, m, and h (on the interval [0, 1]) have their own voltage-dependent dynamics:

$$\begin{aligned}
dn/dt &= 0.01\,(10 - V)(1 - n)\big/\!\left(e^{(10-V)/10} - 1\right) - 0.125\, n\, e^{-V/80},\\
dm/dt &= 0.1\,(25 - V)(1 - m)\big/\!\left(e^{(25-V)/10} - 1\right) - 4\, m\, e^{-V/18},\\
dh/dt &= 0.07\,(1 - h)\, e^{-V/20} - h\big/\!\left(e^{(30-V)/10} + 1\right).
\end{aligned} \tag{4.2}$$

We have used the original values for these parameters, except for changing the signs of the voltages to correspond to the modern sign convention: C = 1 μF/cm², $\bar{g}_{\rm K}$ = 36 mS/cm², $\bar{g}_{\rm Na}$ = 120 mS/cm², $\bar{g}_l$ = 0.3 mS/cm², V_K = −12 mV, V_Na = +115 mV, V_l = +10.613 mV. We have taken our system to be a π × 30² μm² patch of membrane. We solve these equations numerically, using fourth-order Runge-Kutta integration.
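A minimal sketch of this simulation, using the standard HH rate functions in the modern sign convention; this is our illustration, not the authors' code, and the unit conversion from nA of injected current to μA/cm² over the π × 30² μm² patch is ours:

```python
import numpy as np

# HH rate functions, modern sign convention (rest at V = 0, depolarization positive)
def alpha_n(V): return 0.01 * (10 - V) / (np.exp((10 - V) / 10) - 1)
def beta_n(V):  return 0.125 * np.exp(-V / 80)
def alpha_m(V): return 0.1 * (25 - V) / (np.exp((25 - V) / 10) - 1)
def beta_m(V):  return 4 * np.exp(-V / 18)
def alpha_h(V): return 0.07 * np.exp(-V / 20)
def beta_h(V):  return 1 / (np.exp((30 - V) / 10) + 1)

# parameters from the text (mV, msec, uF/cm^2, mS/cm^2)
C, gK, gNa, gl = 1.0, 36.0, 120.0, 0.3
VK, VNa, Vl = -12.0, 115.0, 10.613
area = np.pi * 30.0**2 * 1e-8            # patch area, um^2 -> cm^2

def derivs(state, I_nA):
    V, n, m, h = state
    i_dens = I_nA * 1e-3 / area          # injected current, nA -> uA/cm^2
    dV = (i_dens - gK * n**4 * (V - VK)
          - gNa * m**3 * h * (V - VNa) - gl * (V - Vl)) / C
    dn = alpha_n(V) * (1 - n) - beta_n(V) * n
    dm = alpha_m(V) * (1 - m) - beta_m(V) * m
    dh = alpha_h(V) * (1 - h) - beta_h(V) * h
    return np.array([dV, dn, dm, dh])

def run_hh(I_of_t, dt=0.05):
    """Integrate the HH equations with fourth-order Runge-Kutta; returns V(t)."""
    V = 0.0                               # start at rest, gates at steady state
    state = np.array([V,
                      alpha_n(V) / (alpha_n(V) + beta_n(V)),
                      alpha_m(V) / (alpha_m(V) + beta_m(V)),
                      alpha_h(V) / (alpha_h(V) + beta_h(V))])
    trace = np.empty(len(I_of_t))
    for i, I in enumerate(I_of_t):
        k1 = derivs(state, I)
        k2 = derivs(state + 0.5 * dt * k1, I)
        k3 = derivs(state + 0.5 * dt * k2, I)
        k4 = derivs(state + dt * k3, I)
        state = state + (dt / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
        trace[i] = state[0]
    return trace
```

A constant input of a few tenths of a nA over this patch area drives repetitive firing, with spike peaks roughly 100 mV above rest.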

The system is driven with a gaussian random noise current I(t), generated by smoothing a gaussian random number stream with an exponential filter to generate a correlation time τ. It is convenient to choose τ to be longer than the time steps of numerical integration, since this guarantees that all functions are smooth on the scale of single time steps. Here we will always use τ = 0.2 msec, a value that is both less than the timescale over which we discretize the stimulus for analysis and far less than the neuron's capacitative smoothing timescale, RC ~ 3 msec. The current has a standard deviation σ, but since the correlation time is short, the relevant parameter usually is the spectral density S = σ²τ; we also add a DC offset I₀. In the following, we will consider two parameter regimes: I₀ = 0, and I₀ a finite value, which leads to more periodic firing.
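Such an exponentially filtered gaussian stream can be realized as a discretized Ornstein-Uhlenbeck process; this is one implementation choice of ours, with argument names and seeding ours as well:

```python
import numpy as np

def noise_current(n_steps, dt=0.05, tau=0.2, sigma=1.0, i0=0.0, seed=0):
    """Gaussian noise current with exponential correlation time tau (msec),
    stationary standard deviation sigma, and DC offset i0 (a sketch).
    """
    rng = np.random.default_rng(seed)
    a = np.exp(-dt / tau)                 # one-step correlation factor
    # innovation variance chosen so the stationary std is exactly sigma
    noise = rng.normal(0.0, sigma * np.sqrt(1 - a**2), size=n_steps)
    I = np.empty(n_steps)
    x = rng.normal(0.0, sigma)            # start in the stationary state
    for i in range(n_steps):
        x = a * x + noise[i]
        I[i] = i0 + x
    return I
```

The spectral density is then S = σ²τ, as in the text, and the output of this generator can be fed directly to the integration routine.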

The integration step size is fixed at 0.05 msec. The key numerical experiments were repeated at a step size of 0.01 msec, with identical results. The time of a spike is defined as the moment of maximum voltage, for voltages exceeding a threshold (see Figure 1), estimated to subsample precision by quadratic interpolation. As spikes are both very stereotyped and very large compared to subspiking fluctuations, the precise value of this threshold is unimportant; we have used +20 mV.
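The peak detection with quadratic (parabolic) interpolation can be sketched as follows; this is our illustration of the stated procedure, not the authors' code:

```python
import numpy as np

def spike_peak_times(v, dt=0.05, threshold=20.0):
    """Spike times as local voltage maxima above threshold, refined to
    subsample precision by fitting a parabola through the peak sample
    and its two neighbors (a sketch).
    """
    times = []
    for i in range(1, len(v) - 1):
        if v[i] > threshold and v[i] >= v[i - 1] and v[i] > v[i + 1]:
            # vertex of the parabola through the three samples around the peak
            denom = v[i - 1] - 2 * v[i] + v[i + 1]
            shift = 0.5 * (v[i - 1] - v[i + 1]) / denom if denom != 0 else 0.0
            times.append((i + shift) * dt)
    return np.array(times)
```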

4.1 Qualitative Description of Spiking. The first step in our analysis is to use reverse correlation, equation 2.4, to determine the average stimulus feature preceding a spike, the STA. In Figure 1 (top), we display the STA in a regime where the spectral density of the input current is 6.5 × 10⁻⁴ nA² sec. The spike-triggered averages of the gating terms n⁴ (proportion of open potassium channels) and m³h (proportion of open sodium channels) and the membrane voltage V are plotted in Figure 1 (middle and bottom). The error bars mark the standard deviation of the trajectories of these variables.
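Reverse correlation itself reduces to a few lines: the STA is the mean of the stimulus segments preceding each spike. A minimal sketch (our own naming), operating on a sampled stimulus and spike sample indices:

```python
def spike_triggered_average(stim, spike_indices, window):
    """Average the `window` samples of `stim` preceding each spike index."""
    segments = [stim[i - window:i] for i in spike_indices if i >= window]
    n = len(segments)
    return [sum(col) / n for col in zip(*segments)]
```

If the same stimulus feature is planted before every spike, the STA returns exactly that feature; with real noise stimuli it returns the feature plus sampling noise that shrinks with the number of spikes.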

As expected, the voltage and gating variables follow highly stereotyped trajectories during the ~5 msec surrounding a spike. First, the rapid opening of the sodium channels causes a sharp membrane depolarization (or rise in V); the slower potassium channels then open and repolarize the membrane, leaving it at a slightly lower potential than rest. The potassium channels close gradually, but meanwhile the membrane remains hyperpolarized and, due to its increased permeability to potassium ions, at lower resistance. These effects make it difficult to induce a second spike during this ~15 msec "refractory period." Away from spikes, the resting levels and fluctuations of the voltage and gating variables are quite small. The larger values evident in Figure 1 (middle and bottom) by ±15 msec are due to the summed contributions of nearby spikes.

The spike-triggered average current has a largely transient form, so that spikes are on average preceded by an upward swing in current. On the

1728 B Aguera y Arcas A Fairhall and W Bialek

Figure 1: Spike-triggered averages with standard deviations for (top) the input current I, (middle) the fraction of open K⁺ and Na⁺ channels, and (bottom) the membrane voltage V, for the parameter regime I0 = 0 and S = 6.5 × 10⁻⁴ nA² sec.

other hand, there is no obvious bottleneck in the current trajectories, so that the current variance is almost constant throughout the spike. This is qualitatively consistent with the idea of dimensionality reduction: if the neuron ignores most of the dimensions along which the current can vary, then the variance, which is shared almost equally among all dimensions for this near-white noise, can change by only a small amount.


4.2 Interspike Interaction. Although the STA has the form of a differentiating kernel, suggesting that the neuron detects edge-like events in the current versus time, there must be a DC component to the cell's response. We recall that for constant inputs, the HH model undergoes a bifurcation to constant-frequency spiking, where the frequency is a function of the value of the input above onset. Correspondingly, the STA does not sum precisely to zero; one might think of it as having a small integrating component that allows the system to spike under DC stimulation, albeit only above a threshold.

The system's tendency to periodic spiking under DC current input also is felt under dynamic stimulus conditions and can be thought of as a strong interaction between successive spikes. We illustrate this by considering a different parameter regime, with a small DC current and some added noise (I0 = 0.11 nA and S = 0.8 × 10⁻⁴ nA² sec). Note that the DC component puts the neuron in the metastable region of its f-I curve (see Figure 2). In this regime, the neuron tends to fire quasi-regular trains of spikes intermittently, as shown in Figure 3. We will refer to these quasi-regular spike sequences as "bursts" (note that this term is often used to refer to compound spikes in neurons with additional channels; such events do not occur in the HH model).

Spikes can be classified into three types: those initiating a spike burst, those within a burst, and those ending a burst. The minimum length of

Figure 2: Firing rate of the HH neuron as a function of injected DC current. The empty circles at moderate currents denote the metastable region, where the neuron may be either spiking or silent.


Figure 3: Segment of a typical spike train in a "bursting" regime.

Figure 4: Spike-triggered averages derived from spikes leading ("on"), inside ("burst"), and ending ("off") a burst. The parameters of this bursting regime are I0 = 0.11 nA and S = 0.8 × 10⁻⁴ nA² sec. Note that the burst-ending spike average is, by construction, identical to that of any other within-burst spike for t < 0.

the silence between bursts is taken in this case to be 70 msec. Taking these three categories of spike as different "symbols" (de Ruyter van Steveninck & Bialek, 1988), we can determine the average stimulus for each. These are shown in Figure 4, with the spike at t = 0.

In this regime, the initial spike of a burst is preceded by a rapid oscillation in the current. Spikes within a burst are affected much less by the current: the feature immediately preceding such spikes is similar in shape to a single "wavelength" of the leading spike feature but is of much smaller amplitude and is temporally compressed into the interspike interval. Hence, although it is clear that the timing of a spike within a burst is determined largely by the timing of the previous spike, the current plays some role in affecting the precise placement. This also demonstrates that the shape of the STA is not the same for all spikes; it depends strongly and nontrivially on the time to the previous spike, and this is related to the observation that subtly different patterns of two or three spikes correspond to very different average stimuli (de Ruyter van Steveninck & Bialek, 1988). For a reader of the spike code, a spike within a burst conveys a different message about the input than the spike at the onset of the burst. Finally, the feature ending a burst has a very similar form to the onset feature, but reversed in time. Thus, to a good approximation, the absence of a spike at the end of a burst can be read as the opposite of the onset of the burst.

In summary, this regime of the HH neuron is similar to a "flip-flop," or 1-bit memory. Like its electronic analog, the neuron's memory is preserved


Figure 5: Overall spike-triggered average in the bursty regime, showing the ringing due to the tendency to periodic firing. Plotted in gray is the spike autocorrelation, showing the same oscillations.

by a feedback loop, here implemented by the interspike interaction. Large fluctuations in the input current at a certain frequency "flip" or "flop" the neuron between its silent and spiking states. However, while the neuron is spiking, further details of the input signal are transmitted by precise spike timing within a burst. If we calculate the spike-triggered average of all spikes for this regime, without regard to their position within a burst, then as shown in Figure 5, the relatively well-localized leading spike oscillation of Figure 4 is replaced by a long-lived oscillating function resulting from the spike periodicity. This is shown explicitly by comparing the overall STA with the spike autocorrelation, also shown in Figure 5. This same effect is seen in the STA of the burst spikes, which in fact dominates the overall average. Prediction of spike timing using such an STA would be computationally difficult due to its extension in time but, more seriously, unsuccessful, as most of the function is an artifact of the spike history rather than the effect of the stimulus.

While the effects of spike interaction are interesting and should be included in a complete model for spike generation, we wish here to consider only the current's role in initiating spikes. Therefore, as we have argued elsewhere, we limit ourselves initially to the cases in which interspike interaction plays no role (Aguera y Arcas et al., 2001; Aguera y Arcas & Fairhall, 2003). These "isolated" spikes can be defined as spikes preceded by a silent period tsilence long enough to ensure decoupling from the timing of the previous spike. A reasonable choice for tsilence can be inferred directly from the interspike interval distribution P(Δt), illustrated in Figure 6.

Figure 6: Hodgkin-Huxley interspike interval histogram for the parameters I0 = 0 and S = 6.5 × 10⁻⁴ nA² sec, showing a peak at a preferred firing frequency and the long Poisson tail. The total number of spikes is N = 5.18 × 10⁶. The plot to the right is a closeup in linear scale.

For the HH model, as in simpler models and many real neurons (Brenner, Agam, Bialek, & de Ruyter van Steveninck, 1998), the form of P(Δt) has three noteworthy features: a refractory "hole" during which another spike is unlikely to occur, a strong mode at the preferred firing frequency, and an exponentially decaying, or Poisson, tail. The details of all three of these features are functions of the parameters of the stimulus (Tiesinga, Jose, & Sejnowski, 2000), and certain regimes may be dominated by only one or two features. The emergence of Poisson statistics in the tail of the distribution implies that these events are independent, so we can infer that the system has lost memory of the previous spike. We will therefore take isolated spikes to be those preceded by a silent interval Δt ≥ tsilence, where tsilence is well into the Poisson regime. The burst-onset spikes of Figure 4 are isolated spikes by this definition.
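The isolation criterion is straightforward to implement once tsilence has been chosen from the tail of P(Δt). A sketch with our own naming follows; the treatment of the first spike, which we accept only if the recording has been silent for tsilence from its start, is our assumption.

```python
def isolated_spikes(times, t_silence):
    """Return spike times preceded by a silent interval >= t_silence.
    `times` is an increasing sequence of spike times (msec)."""
    kept = []
    prev = 0.0           # assume the recording starts at time 0, silent
    for t in times:
        if t - prev >= t_silence:
            kept.append(t)
        prev = t
    return kept
```

For the spike train of Figure 3, this keeps exactly the burst-onset spikes and discards spikes inside and at the ends of bursts.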

Note that the bursty behavior evident in Figure 3 is characteristic of "type II" neurons, which begin firing at a well-defined frequency under DC stimulus; "type I" neurons, by contrast, can fire at arbitrarily low frequency under constant input. Nonetheless, the analysis that follows is equally applicable to either type of neuron: spikes still interact at close range and become independent at sufficient separation. In what follows, we will be probing the HH neuron with current noise that remains below the DC threshold for periodic firing.

5 Isolated Spike Analysis

Focusing now on isolated spikes, we proceed to a second-order analysis of the current fluctuations around the isolated spike-triggered average (see


Figure 7: Spike-triggered average stimulus for isolated spikes.

Figure 7). We consider the response of the HH neuron to currents I(t) with mean I0 = 0 and spectral density S = 6.5 × 10⁻⁴ nA² sec. Isolated spikes in this regime are defined by tsilence = 60 msec.

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, roughly corresponding to sampling at the stimulus timescale over a window sufficiently long to capture the relevant features. Thus, we begin in a 200-dimensional space. We emphasize that the theorem that connects eigenvalues of the matrix ΔC to the number of relevant dimensions is valid only for truly gaussian distributions of inputs, and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble: namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide to our search for lower-dimensional descriptions, but we should proceed with caution.
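The covariance computation referred to here is simple to state in code. Below is a pure-Python sketch with our own naming and a 1/N normalization convention: accumulate the covariance of the spike-conditional stimulus segments and subtract the prior covariance to form ΔC.

```python
def covariance(segments):
    """Sample covariance matrix (1/N normalization) of equal-length segments."""
    n = len(segments)
    d = len(segments[0])
    mean = [sum(seg[j] for seg in segments) / n for j in range(d)]
    return [[sum((seg[j] - mean[j]) * (seg[k] - mean[k]) for seg in segments) / n
             for k in range(d)] for j in range(d)]

def delta_C(spike_segments, prior_segments):
    """Spike-triggered covariance minus prior covariance (Delta C)."""
    cs = covariance(spike_segments)
    cp = covariance(prior_segments)
    d = len(cs)
    return [[cs[j][k] - cp[j][k] for k in range(d)] for j in range(d)]
```

A dimension along which spiking constrains the stimulus shows up as a nonzero entry of ΔC; dimensions the neuron ignores contribute entries that vanish as the sample grows.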

The "raw" isolated spike-triggered covariance C_iso spike and the corresponding covariance difference ΔC, equation 2.7, are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to ±15 msec. In Figure 9, we show the spectrum of eigenvalues of ΔC, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply ΔC by C_prior⁻¹. This has the effect of giving us eigenvalues scaled in units of the input standard deviation


Figure 8: The isolated spike-triggered covariance C_iso (left) and covariance difference ΔC (right), for times −30 < t < 5 msec. The plots are in units of nA².

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance after accumulating 80,000 spikes.

along each dimension. Because the correlation time is short, C_prior is nearly diagonal.

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero up to the noise floor determined by


Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is 1/√nspikes. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

sampling decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Aguera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended, silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Aguera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be −60 ≤ t ≤ −40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/nspikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out the test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion
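The energy diagnostic amounts to one line per mode: the fraction of squared amplitude falling in a window that is silent by construction. A sketch with our own naming and index conventions:

```python
import math

def silent_energy_fraction(mode, silent_slice):
    """Fraction of the mode's squared norm lying inside silent_slice.
    Near zero for spike-associated modes; order unity for silence modes."""
    total = sum(v * v for v in mode)
    silent = sum(v * v for v in mode[silent_slice])
    return silent / total
```

A mode localized near the spike scores near zero, while an extended, Fourier-like silence mode keeps a fraction close to the length of the silent window divided by the full window.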


Figure 11: For the leading 64 modes, the fraction of the mode energy over the interval −40 < t < −30 msec, as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.

of other, weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is close to time translationally invariant.

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s1 and s2 (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crosses threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions s1 and s2. By construction, the distribution of these signals over the whole experiment, P(s1, s2), is gaussian. The appropriate prior for the isolation condition, P(s1, s2 | silence), differs only subtly from the gaussian prior. On the other hand, for each spike we obtain a sample from the distribution P(s1, s2 | iso spike at t0), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC which we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold-crossing interpretation.

Figure 14: 10⁴ spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation of the prior gaussian distribution. The circles, from the inside out, enclose all but 10⁻¹, 10⁻², ..., 10⁻⁸ of the prior.


Combining equations 2.2 and 2.3, for isolated spikes we have

g(s1, s2) = P(s1, s2 | iso spike at t0) / P(s1, s2 | silence),   (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
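Equation 5.1 translates directly into a histogram-ratio estimator. The sketch below uses our own naming and binning, and is one-dimensional for brevity, while the text works in the (s1, s2) plane; with real data both distributions would be sampled from the experiment.

```python
def histogram(values, edges):
    """Normalized histogram of values over the bins defined by edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for b in range(len(edges) - 1):
            if edges[b] <= v < edges[b + 1]:
                counts[b] += 1
                break
    total = len(values)
    return [c / total for c in counts]

def nonlinearity(spike_vals, prior_vals, edges):
    """g(s) = P(s | spike) / P(s), bin by bin (None where the prior is empty)."""
    p_spike = histogram(spike_vals, edges)
    p_prior = histogram(prior_vals, edges)
    return [ps / pp if pp > 0 else None for ps, pp in zip(p_spike, p_prior)]
```

Bins where spikes are overrepresented relative to the prior give g > 1; bins the spikes avoid give g < 1.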

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation, according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I(s1, s2; iso spike) = I(iso spike); more generally, one finds I(s1, s2; iso spike) ≤ I(iso spike), and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.
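The information captured by a projection can be estimated from the same kind of histograms: it is the Kullback-Leibler divergence between the spike-conditional and prior distributions of the projection, measured in bits. The estimator below is our sketch (bins assumed matched between the two histograms).

```python
import math

def info_per_spike(p_spike, p_prior):
    """D_KL(P(s|spike) || P(s)) in bits, over matched histogram bins."""
    bits = 0.0
    for ps, pp in zip(p_spike, p_prior):
        if ps > 0.0:                      # 0 * log 0 contributes nothing
            bits += ps * math.log2(ps / pp)
    return bits
```

If the spike-conditional distribution equals the prior, the projection carries no information; concentrating the spikes in one of two equally likely bins yields exactly one bit.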

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec,

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).


reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates, for both the 1D and 2D cases, at ≈ 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ~3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing?

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that, along the direction of the STA, we allow the second, orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
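The tiling algorithm described in the last two paragraphs can be sketched end to end: sort the spike-triggered segments by their projection onto the unit STA, split them into equal-count bins, and in each bin take the normalized component of the conditional average orthogonal to the STA as the second, local basis vector. The function names and the synthetic curved test data are ours.

```python
def unit(v):
    """Normalize a vector to unit length."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def local_tiles(segments, n_bins):
    """Return (unit STA, per-bin unit vectors orthogonal to the STA)."""
    n, d = len(segments), len(segments[0])
    sta = unit([sum(seg[j] for seg in segments) / n for j in range(d)])
    order = sorted(range(n), key=lambda i: dot(segments[i], sta))
    tiles = []
    for b in range(n_bins):
        chunk = [segments[i]
                 for i in order[b * n // n_bins:(b + 1) * n // n_bins]]
        mean = [sum(seg[j] for seg in chunk) / len(chunk) for j in range(d)]
        coef = dot(mean, sta)
        # Component of the conditional average orthogonal to the STA.
        tiles.append(unit([mean[j] - coef * sta[j] for j in range(d)]))
    return sta, tiles
```

Applied to segments lying on a curved one-parameter family, the second direction varies from tile to tile, which is exactly the signature of curvature the text describes; for data lying in a flat plane, all tiles would return the same direction.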

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ~90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshizawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input


currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work by Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ~200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations, such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace, using data sets of fewer than 10⁴ spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but is sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.

1746 B Aguera y Arcas A Fairhall and W Bialek

However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore in general involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges, and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Agüera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Agüera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Agüera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Fitzhugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibres. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2070.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



resolution Δt. To compute the information carried by a single spike, we need to compare this entropy with the total entropy possible when we do not know the inputs.

It is tempting to think that, without knowledge of the inputs, an isolated spike is equally likely to occur anywhere in the window of size T, which leads us back to equation 3.3 with r̄ replaced by r̄_iso. In this case, however, we are assuming that the condition for isolation has already been met. Thus, even without observing the inputs, we know that isolated spikes can occur only in windows of time whose total length is T_silence = T · P_silence, where P_silence is the probability that any moment in time is at least t_silence after the most recent spike. Thus, the total entropy of isolated spike arrival times (given that the condition for silence has been met) is reduced from log₂ T to

\[
S_{\mathrm{iso}}(t \mid \mathrm{silence}) = \log_2 (T \cdot P_{\mathrm{silence}}),
\qquad (3.10)
\]

and the information that the spike carries beyond what we know from the silence itself is

\[
\Delta I_{\mathrm{iso\ spike}} = S_{\mathrm{iso}}(t \mid \mathrm{silence}) - S_{\mathrm{iso}}(t \mid \mathrm{inputs})
\qquad (3.11)
\]
\[
= \frac{1}{T} \int_0^T dt\, \frac{r_{\mathrm{iso}}(t)}{\bar{r}_{\mathrm{iso}}} \log_2 \left[ \frac{r_{\mathrm{iso}}(t)}{\bar{r}_{\mathrm{iso}}} \cdot P_{\mathrm{silence}} \right]
\qquad (3.12)
\]
\[
= -\log_2 (\bar{r}_{\mathrm{iso}}\, \Delta t) + \log_2 P_{\mathrm{silence}}\ \mathrm{bits}.
\qquad (3.13)
\]

This information, which is defined independent of any model for the feature selectivity of the neuron, provides the benchmark against which our reduction of dimensionality will be measured. To make the comparison, however, we need the analog of equation 3.5.
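As a quick numerical illustration, the benchmark of equation 3.13 can be evaluated directly from a spike list. The sketch below is our own (the function name and interface are hypothetical, not from the paper) and assumes one already has the isolated spike count, the record duration, the time resolution Δt, and an estimate of P_silence:

```python
import math

def isolated_spike_info(n_iso, duration, dt, p_silence):
    """Equation 3.13: Delta I_iso = -log2(rbar_iso * dt) + log2(P_silence),
    with rbar_iso the mean rate of isolated spikes over the record."""
    rbar_iso = n_iso / duration  # mean isolated-spike rate
    return -math.log2(rbar_iso * dt) + math.log2(p_silence)

# e.g., 100 isolated spikes in 10 s at dt = 1 ms, with P_silence = 0.5
info_bits = isolated_spike_info(100, 10.0, 0.001, 0.5)
```

Note that the first term grows as Δt shrinks, reflecting the increasing precision of spike timing, while the silence term is a fixed (negative) correction.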

Equation 3.12 provides us with an expression for the information conveyed by isolated spikes in terms of the probability that these spikes occur at particular times; this is analogous to equation 3.2 for single (nonisolated) spikes. If we follow a path analogous to that which leads from equation 3.2 to equation 3.5, we find an expression for the information that an isolated spike provides about the K stimulus dimensions s⃗:

\[
\Delta I^{\vec{s}}_{\mathrm{iso\ spike}} = \int d\vec{s}\; P(\vec{s} \mid \mathrm{iso\ spike\ at\ } t)\, \log_2 \left[ \frac{P(\vec{s} \mid \mathrm{iso\ spike\ at\ } t)}{P(\vec{s} \mid \mathrm{silence})} \right] + \left\langle \log_2 P(\mathrm{silence} \mid \vec{s}) \right\rangle,
\qquad (3.14)
\]

where the prior is now also conditioned on silence: P(s⃗ | silence) is the distribution of s⃗ given that s⃗ is preceded by a silence of at least t_silence. Notice that this silence-conditioned distribution is not knowable a priori, and in particular it is not gaussian; P(s⃗ | silence) must be sampled from data.

The last term in equation 3.14 is the entropy of a binary variable that indicates whether particular moments in time are silent, given knowledge of the stimulus. Again, since the HH model is deterministic, this conditional entropy should be zero if we keep a complete description of the stimulus. In fact, we are not interested in describing those features of the stimulus that lead to silence, and it is not fair (as we will see) to judge the success of dimensionality reduction by looking at the prediction of silence, which necessarily involves multiple dimensions. To make a meaningful comparison, then, we will assume that there is a perfect description of the stimulus conditions leading to silence, and focus on the stimulus features that trigger the isolated spike. When we approximate these features by the K-dimensional space s⃗, we capture an amount of information

\[
\Delta I^{\vec{s}}_{\mathrm{iso\ spike}} = \int d\vec{s}\; P(\vec{s} \mid \mathrm{iso\ spike\ at\ } t)\, \log_2 \left[ \frac{P(\vec{s} \mid \mathrm{iso\ spike\ at\ } t)}{P(\vec{s} \mid \mathrm{silence})} \right].
\qquad (3.15)
\]

This is the information that we can compare with ΔI_iso spike in equation 3.13 to determine the efficiency of our dimensionality reduction.

4 Characterizing the Hodgkin-Huxley Neuron

For completeness, we begin with a brief review of the dynamics of the space-clamped HH neuron (Hodgkin & Huxley, 1952). Hodgkin and Huxley modeled the dynamics of the current through a patch of membrane with ion-specific conductances:

\[
C \frac{dV}{dt} = I(t) - \bar{g}_K n^4 (V - V_K) - \bar{g}_{Na} m^3 h (V - V_{Na}) - \bar{g}_l (V - V_l),
\qquad (4.1)
\]

where I(t) is the injected current, K and Na subscripts denote potassium- and sodium-related variables, respectively, and the l (for "leakage") terms include all other ion conductances with slower dynamics. C is the membrane capacitance, V_K and V_Na are ion-specific reversal potentials, and V_l is defined such that the total voltage V is exactly zero when the membrane is at rest. ḡ_K, ḡ_Na, and ḡ_l are empirically determined maximal conductances for the different ion species, and the gating variables n, m, and h (on the interval [0, 1]) have their own voltage-dependent dynamics:

\[
\frac{dx}{dt} = \alpha_x(V)\,(1 - x) - \beta_x(V)\, x, \qquad x \in \{n, m, h\},
\]
where, with V in mV measured from rest,
\[
\alpha_n = \frac{0.01\,(V - 10)}{1 - e^{-(V-10)/10}}, \qquad \beta_n = 0.125\, e^{-V/80},
\]
\[
\alpha_m = \frac{0.1\,(V - 25)}{1 - e^{-(V-25)/10}}, \qquad \beta_m = 4\, e^{-V/18},
\]
\[
\alpha_h = 0.07\, e^{-V/20}, \qquad \beta_h = \frac{1}{1 + e^{-(V-30)/10}}.
\qquad (4.2)
\]

We have used the original values for these parameters, except for changing the signs of the voltages to correspond to the modern sign convention: C = 1 µF/cm², ḡ_K = 36 mS/cm², ḡ_Na = 120 mS/cm², ḡ_l = 0.3 mS/cm², V_K = −12 mV, V_Na = +115 mV, and V_l = +10.613 mV. We have taken our system to be a π × 30² µm² patch of membrane. We solve these equations numerically, using fourth-order Runge-Kutta integration.
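A minimal integrator for equations 4.1 and 4.2 can be sketched as follows (Python). The rate functions are the standard modern-sign HH forms assumed in our reconstruction of equation 4.2, the helper names are our own, and the injected current I is taken in µA/cm² (a current in nA is divided by the patch area, about 2.83 × 10⁻⁵ cm², to convert):

```python
import numpy as np

# Gating rate functions (V in mV relative to rest); standard modern-sign HH forms.
def alpha_n(V): return 0.01 * (V - 10.0) / (1.0 - np.exp(-(V - 10.0) / 10.0))
def beta_n(V):  return 0.125 * np.exp(-V / 80.0)
def alpha_m(V): return 0.1 * (V - 25.0) / (1.0 - np.exp(-(V - 25.0) / 10.0))
def beta_m(V):  return 4.0 * np.exp(-V / 18.0)
def alpha_h(V): return 0.07 * np.exp(-V / 20.0)
def beta_h(V):  return 1.0 / (1.0 + np.exp(-(V - 30.0) / 10.0))

C = 1.0                              # membrane capacitance, uF/cm^2
gK, gNa, gl = 36.0, 120.0, 0.3       # maximal conductances, mS/cm^2
VK, VNa, Vl = -12.0, 115.0, 10.613   # reversal potentials, mV

def derivs(state, I):
    """Right-hand side of equations 4.1 and 4.2; I in uA/cm^2."""
    V, n, m, h = state
    dV = (I - gK * n**4 * (V - VK) - gNa * m**3 * h * (V - VNa)
          - gl * (V - Vl)) / C
    dn = alpha_n(V) * (1.0 - n) - beta_n(V) * n
    dm = alpha_m(V) * (1.0 - m) - beta_m(V) * m
    dh = alpha_h(V) * (1.0 - h) - beta_h(V) * h
    return np.array([dV, dn, dm, dh])

def rk4_step(state, I, dt):
    """One fourth-order Runge-Kutta step of length dt (msec)."""
    k1 = derivs(state, I)
    k2 = derivs(state + 0.5 * dt * k1, I)
    k3 = derivs(state + 0.5 * dt * k2, I)
    k4 = derivs(state + dt * k3, I)
    return state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

REST = np.array([0.0, 0.3177, 0.0529, 0.5961])  # (V, n, m, h) at rest
```

With these definitions, the resting state is a fixed point at I = 0, while a sustained suprathreshold current drives the familiar ~100 mV action potentials.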

The system is driven with a gaussian random noise current I(t), generated by smoothing a gaussian random number stream with an exponential filter to generate a correlation time τ. It is convenient to choose τ to be longer than the time steps of numerical integration, since this guarantees that all functions are smooth on the scale of single time steps. Here we will always use τ = 0.2 msec, a value that is both less than the timescale over which we discretize the stimulus for analysis and far less than the neuron's capacitative smoothing timescale, RC ≈ 3 msec. I(t) has a standard deviation σ, but since the correlation time is short, the relevant parameter usually is the spectral density S = σ²τ; we also add a DC offset I₀. In the following, we will consider two parameter regimes: I₀ = 0, and I₀ a finite value, which leads to more periodic firing.
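A stimulus of this kind can be generated with a one-pole exponential filter; this sketch (our own naming, not from the paper) scales the white noise drive so that the output has stationary standard deviation σ, and hence spectral density S = σ²τ:

```python
import numpy as np

def exp_filtered_noise(n_steps, dt, tau, sigma, rng):
    """Gaussian noise current with correlation time tau and standard
    deviation sigma (spectral density S = sigma**2 * tau), generated by
    passing white noise through a one-pole exponential filter."""
    a = np.exp(-dt / tau)
    # scale the drive so the stationary standard deviation is sigma
    drive = sigma * np.sqrt(1.0 - a * a) * rng.standard_normal(n_steps)
    x = np.zeros(n_steps)
    for i in range(1, n_steps):
        x[i] = a * x[i - 1] + drive[i]
    return x
```

The lag-one autocorrelation of the output is exp(−dt/τ), so samples separated by a few τ are effectively independent.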

The integration step size is fixed at 0.05 msec. The key numerical experiments were repeated at a step size of 0.01 msec, with identical results. The time of a spike is defined as the moment of maximum voltage, for voltages exceeding a threshold (see Figure 1), estimated to subsample precision by quadratic interpolation. As spikes are both very stereotyped and very large compared to subspiking fluctuations, the precise value of this threshold is unimportant; we have used +20 mV.
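The spike-timing procedure just described amounts to locating voltage maxima above threshold and refining each with a three-point parabolic fit; a minimal sketch (the helper name is hypothetical):

```python
import numpy as np

def spike_times(V, dt, threshold=20.0):
    """Spike times as local maxima of V above threshold, refined to
    subsample precision by fitting a parabola through the peak sample
    and its two neighbors (quadratic interpolation)."""
    times = []
    for i in range(1, len(V) - 1):
        if V[i] >= threshold and V[i] >= V[i - 1] and V[i] > V[i + 1]:
            denom = V[i - 1] - 2.0 * V[i] + V[i + 1]
            shift = 0.5 * (V[i - 1] - V[i + 1]) / denom if denom != 0.0 else 0.0
            times.append((i + shift) * dt)
    return np.array(times)
```

The parabolic vertex formula shifts the peak by at most half a sample, which is ample precision given how sharply stereotyped the spike waveform is.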

4.1 Qualitative Description of Spiking. The first step in our analysis is to use reverse correlation, equation 2.4, to determine the average stimulus feature preceding a spike, the STA. In Figure 1 (top), we display the STA in a regime where the spectral density of the input current is 6.5 × 10⁻⁴ nA² sec. The spike-triggered averages of the gating terms n⁴ (proportion of open potassium channels) and m³h (proportion of open sodium channels) and the membrane voltage V are plotted in Figure 1 (middle and bottom). The error bars mark the standard deviation of the trajectories of these variables.
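The STA itself is the simplest reverse-correlation statistic: the mean stimulus segment preceding each spike. A minimal sketch (our own naming), assuming spike times have already been converted to stimulus sample indices:

```python
import numpy as np

def spike_triggered_average(stim, spike_idx, window):
    """Mean stimulus over the `window` samples preceding each spike
    (reverse correlation with a white-ish noise input)."""
    segs = [stim[i - window:i] for i in spike_idx if i >= window]
    return np.mean(segs, axis=0)
```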

As expected, the voltage and gating variables follow highly stereotyped trajectories during the ~5 msec surrounding a spike. First, the rapid opening of the sodium channels causes a sharp membrane depolarization (or rise in V); the slower potassium channels then open and repolarize the membrane, leaving it at a slightly lower potential than rest. The potassium channels close gradually, but meanwhile the membrane remains hyperpolarized and, due to its increased permeability to potassium ions, at lower resistance. These effects make it difficult to induce a second spike during this ~15 msec "refractory period." Away from spikes, the resting levels and fluctuations of the voltage and gating variables are quite small. The larger values evident in Figure 1 (middle and bottom) by ±15 msec are due to the summed contributions of nearby spikes.

The spike-triggered average current has a largely transient form, so that spikes are, on average, preceded by an upward swing in current. On the other hand, there is no obvious bottleneck in the current trajectories, so that the current variance is almost constant throughout the spike. This is qualitatively consistent with the idea of dimensionality reduction: if the neuron ignores most of the dimensions along which the current can vary, then the variance, which is shared almost equally among all dimensions for this near-white noise, can change by only a small amount.

Figure 1: Spike-triggered averages, with standard deviations, for (top) the input current I, (middle) the fraction of open K⁺ and Na⁺ channels, and (bottom) the membrane voltage V, for the parameter regime I₀ = 0 and S = 6.5 × 10⁻⁴ nA² sec.


4.2 Interspike Interaction. Although the STA has the form of a differentiating kernel, suggesting that the neuron detects edge-like events in the current versus time, there must be a DC component to the cell's response. We recall that for constant inputs, the HH model undergoes a bifurcation to constant-frequency spiking, where the frequency is a function of the value of the input above onset. Correspondingly, the STA does not sum precisely to zero; one might think of it as having a small integrating component that allows the system to spike under DC stimulation, albeit only above a threshold.

The system's tendency to periodic spiking under DC current input also is felt under dynamic stimulus conditions and can be thought of as a strong interaction between successive spikes. We illustrate this by considering a different parameter regime, with a small DC current and some added noise (I₀ = 0.11 nA and S = 0.8 × 10⁻⁴ nA² sec). Note that the DC component puts the neuron in the metastable region of its f-I curve (see Figure 2). In this regime, the neuron tends to fire quasi-regular trains of spikes intermittently, as shown in Figure 3. We will refer to these quasi-regular spike sequences as "bursts" (note that this term is often used to refer to compound spikes in neurons with additional channels; such events do not occur in the HH model).

Spikes can be classified into three types: those initiating a spike burst, those within a burst, and those ending a burst. The minimum length of the silence between bursts is taken in this case to be 70 msec. Taking these three categories of spike as different "symbols" (de Ruyter van Steveninck & Bialek, 1988), we can determine the average stimulus for each. These are shown in Figure 4, with the spike at t = 0.

Figure 2: Firing rate of the HH neuron as a function of injected DC current. The empty circles at moderate currents denote the metastable region, where the neuron may be either spiking or silent.

Figure 3: Segment of a typical spike train in a "bursting" regime.

Figure 4: Spike-triggered averages derived from spikes leading ("on"), inside ("burst"), and ending ("off") a burst. The parameters of this bursting regime are I₀ = 0.11 nA and S = 0.8 × 10⁻⁴ nA² sec. Note that the burst-ending spike average is, by construction, identical to that of any other within-burst spike for t < 0.

In this regime, the initial spike of a burst is preceded by a rapid oscillation in the current. Spikes within a burst are affected much less by the current; the feature immediately preceding such spikes is similar in shape to a single "wavelength" of the leading spike feature, but is of much smaller amplitude and is temporally compressed into the interspike interval. Hence, although it is clear that the timing of a spike within a burst is determined largely by the timing of the previous spike, the current plays some role in affecting the precise placement. This also demonstrates that the shape of the STA is not the same for all spikes; it depends strongly and nontrivially on the time to the previous spike, and this is related to the observation that subtly different patterns of two or three spikes correspond to very different average stimuli (de Ruyter van Steveninck & Bialek, 1988). For a reader of the spike code, a spike within a burst conveys a different message about the input than the spike at the onset of the burst. Finally, the feature ending a burst has a very similar form to the onset feature, but reversed in time. Thus, to a good approximation, the absence of a spike at the end of a burst can be read as the opposite of the onset of the burst.

In summary, this regime of the HH neuron is similar to a "flip-flop," or 1-bit memory. Like its electronic analog, the neuron's memory is preserved by a feedback loop, here implemented by the interspike interaction. Large fluctuations in the input current at a certain frequency "flip" or "flop" the neuron between its silent and spiking states. However, while the neuron is spiking, further details of the input signal are transmitted by precise spike timing within a burst. If we calculate the spike-triggered average of all spikes for this regime, without regard to their position within a burst, then, as shown in Figure 5, the relatively well-localized leading spike oscillation of Figure 4 is replaced by a long-lived oscillating function resulting from the spike periodicity. This is shown explicitly by comparing the overall STA with the spike autocorrelation, also shown in Figure 5. This same effect is seen in the STA of the burst spikes, which in fact dominates the overall average. Prediction of spike timing using such an STA would be computationally difficult, due to its extension in time, but, more seriously, unsuccessful, as most of the function is an artifact of the spike history rather than the effect of the stimulus.

Figure 5: Overall spike-triggered average in the bursty regime, showing the ringing due to the tendency to periodic firing. Plotted in gray is the spike autocorrelation, showing the same oscillations.

Figure 6: Hodgkin-Huxley interspike interval histogram for the parameters I₀ = 0 and S = 6.5 × 10⁻⁴ nA² sec, showing a peak at a preferred firing frequency and the long Poisson tail. The total number of spikes is N = 5.18 × 10⁶. The plot to the right is a closeup in linear scale.

While the effects of spike interaction are interesting and should be included in a complete model for spike generation, we wish here to consider only the current's role in initiating spikes. Therefore, as we have argued elsewhere, we limit ourselves initially to the cases in which interspike interaction plays no role (Agüera y Arcas et al., 2001; Agüera y Arcas & Fairhall, 2003). These "isolated" spikes can be defined as spikes preceded by a silent period t_silence long enough to ensure decoupling from the timing of the previous spike. A reasonable choice for t_silence can be inferred directly from the interspike interval distribution P(Δt), illustrated in Figure 6. For the HH model, as in simpler models and many real neurons (Brenner, Agam, Bialek, & de Ruyter van Steveninck, 1998), the form of P(Δt) has three noteworthy features: a refractory "hole" during which another spike is unlikely to occur, a strong mode at the preferred firing frequency, and an exponentially decaying, or Poisson, tail. The details of all three of these features are functions of the parameters of the stimulus (Tiesinga, Jose, & Sejnowski, 2000), and certain regimes may be dominated by only one or two features. The emergence of Poisson statistics in the tail of the distribution implies that these events are independent, so we can infer that the system has lost memory of the previous spike. We will therefore take isolated spikes to be those preceded by a silent interval Δt ≥ t_silence, where t_silence is well into the Poisson regime. The burst-onset spikes of Figure 4 are isolated spikes by this definition.
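The isolation criterion Δt ≥ t_silence reduces to a one-line filter on the interspike intervals; a sketch (our own naming; the first spike of a record is dropped, since its preceding interval is unknown):

```python
import numpy as np

def isolated_spikes(spike_times, t_silence):
    """Keep spikes preceded by at least t_silence of silence
    (Delta t >= t_silence)."""
    st = np.asarray(spike_times, dtype=float)
    return st[1:][np.diff(st) >= t_silence]
```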

Note that the bursty behavior evident in Figure 3 is characteristic of "type II" neurons, which begin firing at a well-defined frequency under DC stimulus; "type I" neurons, by contrast, can fire at arbitrarily low frequency under constant input. Nonetheless, the analysis that follows is equally applicable to either type of neuron: spikes still interact at close range and become independent at sufficient separation. In what follows, we will be probing the HH neuron with current noise that remains below the DC threshold for periodic firing.

5 Isolated Spike Analysis

Focusing now on isolated spikes, we proceed to a second-order analysis of the current fluctuations around the isolated spike-triggered average (see Figure 7). We consider the response of the HH neuron to currents I(t) with mean I₀ = 0 and spectral density S = 6.5 × 10⁻⁴ nA² sec. Isolated spikes in this regime are defined by t_silence = 60 msec.

Figure 7: Spike-triggered average stimulus for isolated spikes.

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, roughly corresponding to sampling at the timescale sufficiently long to capture the relevant features. Thus, we begin in a 200-dimensional space. We emphasize that the theorem that connects eigenvalues of the matrix ΔC to the number of relevant dimensions is valid only for truly gaussian distributions of inputs, and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble, namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide to our search for lower-dimensional descriptions, but we should proceed with caution.

The "raw" isolated spike-triggered covariance C_iso spike and the corresponding covariance difference ΔC, equation 2.7, are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to ±15 msec. In Figure 9, we show the spectrum of eigenvalues of ΔC, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply ΔC by C⁻¹_prior. This has the effect of giving us eigenvalues scaled in units of the input standard deviation along each dimension. Because the correlation time is short, C_prior is nearly diagonal.

Figure 8: The isolated spike-triggered covariance C_iso (left) and covariance difference ΔC (right), for times −30 < t < 5 msec. The plots are in units of nA².

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance after accumulating 80,000 spikes.
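The computation just described, forming ΔC (equation 2.7) and then diagonalizing C⁻¹_prior ΔC, can be sketched as follows. The function name and the subsampled estimate of the prior covariance are our own choices, not the paper's:

```python
import numpy as np

def stc_eigensystem(stim, spike_idx, window, prior_stride=7):
    """Spike-triggered covariance analysis: build dC = C_spike - C_prior
    and diagonalize C_prior^{-1} dC, returning eigenvalues sorted by
    magnitude, matching eigenvectors, and the STA."""
    segs = np.array([stim[i - window:i] for i in spike_idx if i >= window])
    sta = segs.mean(axis=0)
    d = segs - sta
    c_spike = d.T @ d / len(segs)

    # prior covariance from (subsampled) stimulus windows
    idx = np.arange(window, len(stim), prior_stride)
    prior = np.array([stim[i - window:i] for i in idx])
    d0 = prior - prior.mean(axis=0)
    c_prior = d0.T @ d0 / len(prior)

    dc = c_spike - c_prior
    w, v = np.linalg.eig(np.linalg.solve(c_prior, dc))
    order = np.argsort(-np.abs(w))
    return w[order].real, v[:, order].real, sta
```

As a sanity check, a toy "neuron" that fires whenever the last stimulus sample exceeds a threshold reduces the variance along exactly one dimension, and that dimension emerges as the leading (negative) eigenvalue.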

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero, up to the noise floor determined by sampling, decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is 1/√n_spikes. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

These results do not however imply that a low-dimensional approxima-tion cannot be identied The extended structure in the covariance matrixinduced by the silence requirement is responsible for the apparent high di-mensionality In fact as has been shown in Aguera y Arcas and Fairhall(2003) the covariance eigensystem includes modes that are local and spikeassociated and others that are extended and silence associated and thusirrelevant to a causal model of spike timing prediction Fortunately becauseextended silences and spikes are (by denition) statistically independentthere is no mixing between the two types of modes To identify the spike-associated modes we follow the diagnostic of Aguera y Arcas and Fairhall(2003) computing the fraction of the energy of each mode concentratedin the period of silence which we take to be iexcl60 middot t middot iexcl40 msec Theenergy of a spike-associated mode in the silent period is due entirely tonoise and will therefore decrease like 1=nspikes with increasing sample sizewhile this energy remains of order unity for silence modes Carrying outthe test on the covariance modes we obtain Figure 11 which shows thatthe rst and fourth modes rapidly emerge as spike associated Two furtherspike-associated modes appear over the sample shown with the suggestion

1736 B Aguera y Arcas A Fairhall and W Bialek

Figure 11: For the leading 64 modes, the fraction of the mode energy over the interval −40 < t < −30 msec, as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.

of other, weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is close to time-translationally invariant.

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s1 and s2 (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crossed threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions s1 and s2. By construction, the distribution of these signals over the whole experiment, P(s1, s2), is gaussian. The appropriate prior for the isolation condition, P(s1, s2 | silence), differs only subtly from the gaussian prior. On the other hand, for each spike we obtain a sample from the distribution P(s1, s2 | iso spike at t0), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC that we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold-crossing interpretation.

Figure 14: 10^4 spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation on the prior gaussian distribution. The circles, from the inside out, enclose all but 10^−1, 10^−2, ..., 10^−8 of the prior.
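Obtaining a picture like Figure 14 requires only projecting each stimulus history onto the two spike-associated modes. A toy sketch (the modes and the conditional ensemble here are synthetic stand-ins, not HH output) in which the spike-conditional cloud is shifted along s1 and has increased variance along s2, as in the text:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 100
s1 = rng.standard_normal(D); s1 /= np.linalg.norm(s1)        # hypothetical mode 1
s2 = rng.standard_normal(D)                                   # hypothetical mode 2,
s2 -= s1 * (s1 @ s2); s2 /= np.linalg.norm(s2)                # orthonormalized

prior = rng.standard_normal((5000, D))                        # whole-experiment ensemble
# synthetic spike-conditional ensemble: shifted along s1, extra variance along s2
spike = rng.standard_normal((5000, D)) + 2.0 * s1
spike += np.outer(rng.standard_normal(5000), s2)

def project(X):
    """Coordinates of each stimulus history in the (s1, s2) plane."""
    return np.stack([X @ s1, X @ s2], axis=1)

p_prior, p_spike = project(prior), project(spike)
mean_shift = p_spike[:, 0].mean() - p_prior[:, 0].mean()      # separation along s1
var_ratio = p_spike[:, 1].var() / p_prior[:, 1].var()         # increased variance along s2
```

A scatter plot of `p_spike` over `p_prior` would reproduce the qualitative geometry of Figure 14: a displaced cloud whose variance along the second mode exceeds that of the prior.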


Combining equations 2.2 and 2.3, for isolated spikes we have

    g(s1, s2) = P(s1, s2 | iso spike at t0) / P(s1, s2 | silence),        (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure, but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
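In practice, g can be estimated by histogramming the projections and taking the ratio of equation 5.1 bin by bin. A one-dimensional sketch with synthetic gaussian samples standing in for the projections:

```python
import numpy as np

rng = np.random.default_rng(2)
# synthetic projections standing in for the data behind equation 5.1
s_silence = rng.normal(0.0, 1.0, 100000)      # prior-like (silence) projections
s_spike = rng.normal(1.5, 0.5, 100000)        # spike-conditional projections

bins = np.linspace(-4, 4, 41)
p_spike, _ = np.histogram(s_spike, bins=bins, density=True)
p_silence, _ = np.histogram(s_silence, bins=bins, density=True)
with np.errstate(divide="ignore", invalid="ignore"):
    # equation 5.1, bin by bin; undefined where the prior has no samples
    g = np.where(p_silence > 0, p_spike / p_silence, np.nan)
centers = 0.5 * (bins[:-1] + bins[1:])
```

For these toy distributions, g rises steeply with the projection, the signature of a nonlinear decision function; in two dimensions the same ratio is taken over a 2D histogram.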

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation, according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I(s1, s2; iso spike) = I(iso spike); more generally, one finds I(s1, s2; iso spike) ≤ I(iso spike), and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.
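The information captured by a projection can be estimated as the discrete analog of equation 3.15, summing P(x | spike) log2[P(x | spike)/P(x | prior)] over bins. In the following synthetic sketch (toy distributions, not HH output), a spike shifts one projection and rescales a second, and the 2D estimate exceeds the 1D one, as expected:

```python
import numpy as np

rng = np.random.default_rng(3)

def info_bits(p_cond, p_prior):
    """Discrete analog of equation 3.15: sum of P(x|spike) log2[P(x|spike)/P(x|prior)]."""
    mask = (p_cond > 0) & (p_prior > 0)
    return float(np.sum(p_cond[mask] * np.log2(p_cond[mask] / p_prior[mask])))

# toy projections: a spike shifts dimension 0 and rescales dimension 1
prior = rng.standard_normal((200000, 2))
spike = rng.standard_normal((200000, 2)) * [1.0, 0.5] + [1.0, 0.0]

edges = np.linspace(-5, 5, 26)
hist1d = lambda x, d: np.histogram(x[:, d], bins=edges)[0] / len(x)
hist2d = lambda x: np.histogram2d(x[:, 0], x[:, 1], bins=[edges, edges])[0] / len(x)

I_1d = info_bits(hist1d(spike, 0), hist1d(prior, 0))   # STA-like, one projection
I_2d = info_bits(hist2d(spike).ravel(), hist2d(prior).ravel())   # both projections
```

Dividing such estimates by the total information per spike at each time resolution gives the fractions plotted in Figure 15.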

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec,

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and by projection onto ΔC covariance modes 1 and 2 (circles).


reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at ≈ 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time-resolution limitation, and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ~3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing?

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that, in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low-dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that, along the direction of the STA, we allow the second, orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data set: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
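The tiling construction can be summarized in a few lines: sort the spike-triggered histories by their STA projection, split them into equal-population bins, average within each bin, remove the STA component, and examine the singular values of the resulting family of second directions. The sketch below applies this to synthetic histories with a planted twist (all shapes and parameters hypothetical); for a twist confined to a two-dimensional span, the SVD recovers two significant directions:

```python
import numpy as np

rng = np.random.default_rng(4)
D, N, n_bins = 64, 80000, 8
t = np.linspace(-1.0, 0.0, D)
sta = np.exp(-0.5 * ((t + 0.2) / 0.1) ** 2)       # hypothetical STA shape
sta /= np.linalg.norm(sta)

# two fixed directions spanning the plane in which the second feature rotates
u = np.sin(2 * np.pi * t); u -= sta * (sta @ u); u /= np.linalg.norm(u)
v = np.cos(2 * np.pi * t); v -= sta * (sta @ v)
v -= u * (u @ v); v /= np.linalg.norm(v)

a = rng.standard_normal(N)                        # STA projections of the histories
theta = 0.5 * a                                   # twist angle grows with projection
second = np.cos(theta)[:, None] * u + np.sin(theta)[:, None] * v
histories = np.outer(a, sta) + second + 0.1 * rng.standard_normal((N, D))

order = np.argsort(histories @ sta)               # sort by STA projection
tiles = []
for chunk in np.array_split(order, n_bins):       # equal-population bins
    avg = histories[chunk].mean(axis=0)           # conditional average
    avg -= sta * (sta @ avg)                      # component orthogonal to the STA
    tiles.append(avg / np.linalg.norm(avg))
tiles = np.array(tiles)

sing = np.linalg.svd(tiles, compute_uv=False)
n_directions = int(np.sum(sing > 0.1 * sing[0]))  # significant independent directions
```

Here the planted twist lives in the span of u and v, so two singular values dominate; for the HH data, the same analysis reveals at least four directions beyond the STA.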

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ~90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshikawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction, in the sense pioneered by FitzHugh and by Nagumo et al., has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input


currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work of Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ~200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations, such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace, using data sets of fewer than 10^4 spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one that is sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance; for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore, in general, involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Agüera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Agüera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Agüera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available online: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibers. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshikawa, Z. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available online: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available online: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



of the stimulus. Again, since the HH model is deterministic, this conditional entropy should be zero if we keep a complete description of the stimulus. In fact, we are not interested in describing those features of the stimulus that lead to silence, and it is not fair (as we will see) to judge the success of dimensionality reduction by looking at the prediction of silence, which necessarily involves multiple dimensions. To make a meaningful comparison, then, we will assume that there is a perfect description of the stimulus conditions leading to silence and focus on the stimulus features that trigger the isolated spike. When we approximate these features by the K-dimensional space of projections s, we capture an amount of information

    ΔI(s; iso spike) = ∫ ds P(s | iso spike at t) log2 [ P(s | iso spike at t) / P(s | silence) ].        (3.15)

This is the information that we can compare with ΔI(iso spike) in equation 3.13 to determine the efficiency of our dimensionality reduction.

4 Characterizing the Hodgkin-Huxley Neuron

For completeness, we begin with a brief review of the dynamics of the space-clamped HH neuron (Hodgkin & Huxley, 1952). Hodgkin and Huxley modeled the dynamics of the current through a patch of membrane with ion-specific conductances:

    C dV/dt = I(t) − ḡ_K n^4 (V − V_K) − ḡ_Na m^3 h (V − V_Na) − ḡ_l (V − V_l),        (4.1)

where I(t) is the injected current, the subscripts K and Na denote potassium- and sodium-related variables, respectively, and the l (for "leakage") terms include all other ion conductances with slower dynamics. C is the membrane capacitance; V_K and V_Na are ion-specific reversal potentials; and V_l is defined such that the total voltage V is exactly zero when the membrane is at rest. ḡ_K, ḡ_Na, and ḡ_l are empirically determined maximal conductances for the different ion species, and the gating variables n, m, and h (on the interval [0, 1]) have their own voltage-dependent dynamics:

$$\frac{dn}{dt} = \frac{0.01\,(10 - V)}{\exp[(10 - V)/10] - 1}\,(1 - n) - 0.125\,\exp(-V/80)\, n,$$
$$\frac{dm}{dt} = \frac{0.1\,(25 - V)}{\exp[(25 - V)/10] - 1}\,(1 - m) - 4\,\exp(-V/18)\, m,$$
$$\frac{dh}{dt} = 0.07\,\exp(-V/20)\,(1 - h) - \frac{h}{\exp[(30 - V)/10] + 1}. \qquad (4.2)$$

We have used the original values for these parameters, except for changing the signs of the voltages to correspond to the modern sign convention: C = 1 µF/cm², ḡ_K = 36 mS/cm², ḡ_Na = 120 mS/cm², ḡ_l = 0.3 mS/cm², V_K = −12 mV, V_Na = +115 mV, V_l = +10.613 mV. We have taken our system to be a π × 30² µm² patch of membrane. We solve these equations numerically using fourth-order Runge-Kutta integration.
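As a concrete illustration, the model above can be integrated directly. The sketch below (in Python, with function names of our choosing) implements equations 4.1 and 4.2 with a fourth-order Runge-Kutta step; for simplicity it works per unit membrane area, so the injected current is a density in µA/cm² (converting an injected current in nA to a density requires dividing by the patch area, π × 30² µm² ≈ 2.8 × 10⁻⁵ cm²).

```python
import numpy as np

# Gating rate functions, modern sign convention (V = 0 at rest,
# depolarization positive), as in equation 4.2.
def rates(V):
    an = 0.01 * (10 - V) / (np.exp((10 - V) / 10) - 1)
    bn = 0.125 * np.exp(-V / 80)
    am = 0.1 * (25 - V) / (np.exp((25 - V) / 10) - 1)
    bm = 4.0 * np.exp(-V / 18)
    ah = 0.07 * np.exp(-V / 20)
    bh = 1.0 / (np.exp((30 - V) / 10) + 1)
    return an, bn, am, bm, ah, bh

def hh_deriv(state, I):
    """Right-hand side of equations 4.1-4.2; V in mV, t in msec,
    I a current density in uA/cm^2, C = 1 uF/cm^2."""
    V, n, m, h = state
    an, bn, am, bm, ah, bh = rates(V)
    dV = I - 36.0 * n**4 * (V + 12) - 120.0 * m**3 * h * (V - 115) \
         - 0.3 * (V - 10.613)
    return np.array([dV, an * (1 - n) - bn * n,
                     am * (1 - m) - bm * m,
                     ah * (1 - h) - bh * h])

def rk4_step(state, I, dt=0.05):
    """One fourth-order Runge-Kutta step of length dt (msec)."""
    k1 = hh_deriv(state, I)
    k2 = hh_deriv(state + 0.5 * dt * k1, I)
    k3 = hh_deriv(state + 0.5 * dt * k2, I)
    k4 = hh_deriv(state + dt * k3, I)
    return state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# resting state: V = 0 with gating variables at their steady-state values
rest = np.array([0.0, 0.3177, 0.0529, 0.5961])
```

With these parameters, a sustained suprathreshold current density drives repetitive spiking, while the resting state is stable at I = 0.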

The system is driven with a gaussian random noise current I(t), generated by smoothing a gaussian random number stream with an exponential filter to generate a correlation time τ. It is convenient to choose τ to be longer than the time steps of numerical integration, since this guarantees that all functions are smooth on the scale of single time steps. Here we will always use τ = 0.2 msec, a value that is both less than the timescale over which we discretize the stimulus for analysis and far less than the neuron's capacitative smoothing timescale RC ∼ 3 msec. The current has a standard deviation σ, but since the correlation time is short, the relevant parameter usually is the spectral density S = σ²τ; we also add a DC offset I₀. In the following, we will consider two parameter regimes: I₀ = 0, and I₀ at a finite value, which leads to more periodic firing.
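Such a stimulus can be generated as a first-order autoregressive (exponentially correlated) gaussian process; a minimal sketch, with a function name and illustrative parameter values of our choosing:

```python
import numpy as np

def filtered_noise(n_steps, dt=0.05, tau=0.2, sigma=0.1, I0=0.0, seed=0):
    """Gaussian noise current with exponential correlation time tau
    (msec), standard deviation sigma (nA), and DC offset I0; the
    spectral density is S = sigma**2 * tau.  Implemented as an AR(1)
    process whose stationary variance is held at sigma**2."""
    rng = np.random.default_rng(seed)
    rho = np.exp(-dt / tau)            # one-step correlation
    I = np.empty(n_steps)
    x = 0.0
    for i in range(n_steps):
        x = rho * x + sigma * np.sqrt(1 - rho ** 2) * rng.normal()
        I[i] = I0 + x
    return I

I_noise = filtered_noise(100000)
```

The sqrt(1 − ρ²) factor keeps the stationary variance equal to σ² regardless of the integration step.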

The integration step size is fixed at 0.05 msec. The key numerical experiments were repeated at a step size of 0.01 msec, with identical results. The time of a spike is defined as the moment of maximum voltage, for voltages exceeding a threshold (see Figure 1), estimated to subsample precision by quadratic interpolation. As spikes are both very stereotyped and very large compared to subspiking fluctuations, the precise value of this threshold is unimportant; we have used +20 mV.
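The peak-interpolation step can be made concrete with a short sketch (function name ours): fit a parabola through the peak sample and its two neighbors, and take the vertex as the spike time.

```python
import numpy as np

def spike_times(V, dt=0.05, threshold=20.0):
    """Spike time = moment of maximum voltage, for voltages above
    threshold, refined to subsample precision with a quadratic fit
    through the peak sample and its two neighbors."""
    times = []
    for i in range(1, len(V) - 1):
        if V[i] > threshold and V[i - 1] <= V[i] > V[i + 1]:
            y0, y1, y2 = V[i - 1], V[i], V[i + 1]
            denom = y0 - 2 * y1 + y2           # < 0 at a strict maximum
            shift = 0.5 * (y0 - y2) / denom if denom != 0 else 0.0
            times.append((i + shift) * dt)
    return np.array(times)
```

For an exactly parabolic peak, the interpolation recovers the vertex to machine precision, which is why the precise threshold value does not matter.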

4.1 Qualitative Description of Spiking. The first step in our analysis is to use reverse correlation, equation 2.4, to determine the average stimulus feature preceding a spike, the STA. In Figure 1 (top), we display the STA in a regime where the spectral density of the input current is 6.5 × 10⁻⁴ nA² msec. The spike-triggered averages of the gating terms n⁴ (proportion of open potassium channels) and m³h (proportion of open sodium channels) and the membrane voltage V are plotted in Figure 1 (middle and bottom). The error bars mark the standard deviation of the trajectories of these variables.
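The reverse correlation estimate itself is just an average of stimulus histories preceding spikes; in code (a sketch, with names of our choosing):

```python
import numpy as np

def spike_triggered_average(stim, spike_idx, window=200):
    """Reverse correlation: average the `window` stimulus samples
    preceding each spike."""
    segs = np.stack([stim[i - window:i] for i in spike_idx if i >= window])
    return segs.mean(axis=0)

# a planted-feature check: every "spike" follows a 5-sample pulse
stim = np.zeros(1000)
spikes = [300, 600, 900]
for j in spikes:
    stim[j - 5:j] = 1.0
sta = spike_triggered_average(stim, spikes, window=50)
```

The toy check at the end plants a known feature before each spike and verifies that the STA recovers it exactly.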

As expected, the voltage and gating variables follow highly stereotyped trajectories during the ∼5 msec surrounding a spike. First, the rapid opening of the sodium channels causes a sharp membrane depolarization (or rise in V); the slower potassium channels then open and repolarize the membrane, leaving it at a slightly lower potential than rest. The potassium channels close gradually, but meanwhile the membrane remains hyperpolarized and, due to its increased permeability to potassium ions, at lower resistance. These effects make it difficult to induce a second spike during this ∼15 msec "refractory period." Away from spikes, the resting levels and fluctuations of the voltage and gating variables are quite small. The larger values evident in Figure 1 (middle and bottom) by ±15 msec are due to the summed contributions of nearby spikes.

The spike-triggered average current has a largely transient form, so that spikes are on average preceded by an upward swing in current. On the other hand, there is no obvious bottleneck in the current trajectories: the current variance is almost constant throughout the spike. This is qualitatively consistent with the idea of dimensionality reduction: if the neuron ignores most of the dimensions along which the current can vary, then the variance, which is shared almost equally among all dimensions for this near white noise, can change by only a small amount.

Figure 1: Spike-triggered averages, with standard deviations, for (top) the input current I, (middle) the fraction of open K⁺ and Na⁺ channels, and (bottom) the membrane voltage V, for the parameter regime I₀ = 0 and S = 6.50 × 10⁻⁴ nA² msec.


4.2 Interspike Interaction. Although the STA has the form of a differentiating kernel, suggesting that the neuron detects edge-like events in the current versus time, there must be a DC component to the cell's response. We recall that for constant inputs, the HH model undergoes a bifurcation to constant frequency spiking, where the frequency is a function of the value of the input above onset. Correspondingly, the STA does not sum precisely to zero; one might think of it as having a small integrating component that allows the system to spike under DC stimulation, albeit only above a threshold.

The system's tendency to periodic spiking under DC current input also is felt under dynamic stimulus conditions and can be thought of as a strong interaction between successive spikes. We illustrate this by considering a different parameter regime, with a small DC current and some added noise (I₀ = 0.11 nA and S = 0.8 × 10⁻⁴ nA² msec). Note that the DC component puts the neuron in the metastable region of its f-I curve (see Figure 2). In this regime, the neuron tends to fire quasi-regular trains of spikes intermittently, as shown in Figure 3. We will refer to these quasi-regular spike sequences as "bursts" (note that this term is often used to refer to compound spikes in neurons with additional channels; such events do not occur in the HH model).

Spikes can be classified into three types: those initiating a spike burst, those within a burst, and those ending a burst. The minimum length of the silence between bursts is taken in this case to be 70 msec. Taking these three categories of spike as different "symbols" (de Ruyter van Steveninck & Bialek, 1988), we can determine the average stimulus for each. These are shown in Figure 4, with the spike at t = 0.

Figure 2: Firing rate of the HH neuron as a function of injected DC current. The empty circles at moderate currents denote the metastable region, where the neuron may be either spiking or silent.

Figure 3: Segment of a typical spike train in a "bursting" regime.

Figure 4: Spike-triggered averages derived from spikes leading ("on"), inside ("burst"), and ending ("off") a burst. The parameters of this bursting regime are I₀ = 0.11 nA and S = 0.8 × 10⁻⁴ nA² msec. Note that the burst-ending spike average is, by construction, identical to that of any other within-burst spike for t < 0.

In this regime, the initial spike of a burst is preceded by a rapid oscillation in the current. Spikes within a burst are affected much less by the current; the feature immediately preceding such spikes is similar in shape to a single "wavelength" of the leading spike feature, but is of much smaller amplitude and is temporally compressed into the interspike interval. Hence, although it is clear that the timing of a spike within a burst is determined largely by the timing of the previous spike, the current plays some role in affecting the precise placement. This also demonstrates that the shape of the STA is not the same for all spikes: it depends strongly and nontrivially on the time to the previous spike, and this is related to the observation that subtly different patterns of two or three spikes correspond to very different average stimuli (de Ruyter van Steveninck & Bialek, 1988). For a reader of the spike code, a spike within a burst conveys a different message about the input than the spike at the onset of the burst. Finally, the feature ending a burst has a very similar form to the onset feature, but reversed in time. Thus, to a good approximation, the absence of a spike at the end of a burst can be read as the opposite of the onset of the burst.

In summary, this regime of the HH neuron is similar to a "flip-flop," or 1-bit memory. Like its electronic analog, the neuron's memory is preserved by a feedback loop, here implemented by the interspike interaction. Large fluctuations in the input current at a certain frequency "flip" or "flop" the neuron between its silent and spiking states. However, while the neuron is spiking, further details of the input signal are transmitted by precise spike timing within a burst. If we calculate the spike-triggered average of all spikes for this regime, without regard to their position within a burst, then, as shown in Figure 5, the relatively well-localized leading spike oscillation of Figure 4 is replaced by a long-lived oscillating function resulting from the spike periodicity. This is shown explicitly by comparing the overall STA with the spike autocorrelation, also shown in Figure 5. This same effect is seen in the STA of the burst spikes, which in fact dominates the overall average. Prediction of spike timing using such an STA would be computationally difficult, due to its extension in time, but, more seriously, unsuccessful, as most of the function is an artifact of the spike history rather than the effect of the stimulus.

Figure 5: Overall spike-triggered average in the bursty regime, showing the ringing due to the tendency to periodic firing. Plotted in gray is the spike autocorrelation, showing the same oscillations.

While the effects of spike interaction are interesting and should be included in a complete model for spike generation, we wish here to consider only the current's role in initiating spikes. Therefore, as we have argued elsewhere, we limit ourselves initially to the cases in which interspike interaction plays no role (Agüera y Arcas et al., 2001; Agüera y Arcas & Fairhall, 2003). These "isolated" spikes can be defined as spikes preceded by a silent period t_silence long enough to ensure decoupling from the timing of the previous spike. A reasonable choice for t_silence can be inferred directly from the interspike interval distribution P(Δt), illustrated in Figure 6. For the HH model, as in simpler models and many real neurons (Brenner, Agam, Bialek, & de Ruyter van Steveninck, 1998), the form of P(Δt) has three noteworthy features: a refractory "hole," during which another spike is unlikely to occur; a strong mode at the preferred firing frequency; and an exponentially decaying, or Poisson, tail. The details of all three of these features are functions of the parameters of the stimulus (Tiesinga, José, & Sejnowski, 2000), and certain regimes may be dominated by only one or two features. The emergence of Poisson statistics in the tail of the distribution implies that these events are independent, so we can infer that the system has lost memory of the previous spike. We will therefore take isolated spikes to be those preceded by a silent interval Δt ≥ t_silence, where t_silence is well into the Poisson regime. The burst-onset spikes of Figure 4 are isolated spikes by this definition.

Figure 6: Hodgkin-Huxley interspike interval histogram for the parameters I₀ = 0 and S = 6.5 × 10⁻⁴ nA² msec, showing a peak at a preferred firing frequency and the long Poisson tail. The total number of spikes is N = 5.18 × 10⁶. The plot to the right is a closeup in linear scale.
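Operationally, the isolated-spike criterion is a one-line filter on the spike train; a minimal sketch (function name ours):

```python
import numpy as np

def isolated_spikes(spike_times, t_silence=60.0):
    """Keep spikes preceded by at least t_silence msec of silence, so
    that interaction with the previous spike can be neglected.  The
    first spike of the train has no predecessor and is dropped here
    for definiteness."""
    st = np.asarray(spike_times, dtype=float)
    isi = np.diff(st)
    return st[1:][isi >= t_silence]
```

For example, in a train with spikes at 0, 10, 100, 105, and 300 msec, only the spikes at 100 and 300 msec qualify as isolated with t_silence = 60 msec.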

Note that the bursty behavior evident in Figure 3 is characteristic of "type II" neurons, which begin firing at a well-defined frequency under DC stimulus; "type I" neurons, by contrast, can fire at arbitrarily low frequency under constant input. Nonetheless, the analysis that follows is equally applicable to either type of neuron: spikes still interact at close range and become independent at sufficient separation. In what follows, we will be probing the HH neuron with current noise that remains below the DC threshold for periodic firing.

5 Isolated Spike Analysis

Focusing now on isolated spikes, we proceed to a second-order analysis of the current fluctuations around the isolated spike-triggered average (see Figure 7). We consider the response of the HH neuron to currents I(t) with mean I₀ = 0 and spectral density S = 6.5 × 10⁻⁴ nA² msec. Isolated spikes in this regime are defined by t_silence = 60 msec.

Figure 7: Spike-triggered average stimulus for isolated spikes.

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, roughly corresponding to sampling at the stimulus correlation timescale over a window sufficiently long to capture the relevant features. Thus, we begin in a 200-dimensional space. We emphasize that the theorem that connects eigenvalues of the matrix ΔC to the number of relevant dimensions is valid only for truly gaussian distributions of inputs, and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble, namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide to our search for lower-dimensional descriptions, but we should proceed with caution.

The "raw" isolated spike-triggered covariance C_iso spike and the corresponding covariance difference ΔC, equation 2.7, are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to ±15 msec. In Figure 9, we show the spectrum of eigenvalues of ΔC, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply ΔC by C_prior⁻¹. This has the effect of giving us eigenvalues scaled in units of the input standard deviation along each dimension. Because the correlation time is short, C_prior is nearly diagonal.

Figure 8: The isolated spike-triggered covariance C_iso (left) and covariance difference ΔC (right) for times −30 < t < 5 msec. The plots are in units of nA².

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance after accumulating 80,000 spikes.
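The second-order pipeline just described (spike-triggered covariance, subtraction of the prior, multiplication by C_prior⁻¹, and diagonalization) fits in a few lines. The sketch below is ours, not the paper's code: the function names, the crude strided prior estimate, and the filtered threshold-crossing toy "neuron" used to exercise it are all illustrative assumptions.

```python
import numpy as np

def covariance_analysis(stim, spike_idx, window=200, stride=50):
    """Accumulate the stimulus histories preceding each spike, form the
    spike-triggered covariance, subtract the prior covariance, and
    diagonalize C_prior^{-1} @ dC so that eigenvalues are measured in
    units of the input variance along each direction."""
    segs = np.stack([stim[i - window:i] for i in spike_idx if i >= window])
    C_spike = np.cov(segs, rowvar=False)
    prior_idx = np.arange(window, len(stim) + 1, stride)   # crude prior sample
    prior = np.stack([stim[i - window:i] for i in prior_idx])
    C_prior = np.cov(prior, rowvar=False)
    dC = C_spike - C_prior
    w, v = np.linalg.eig(np.linalg.inv(C_prior) @ dC)
    order = np.argsort(-np.abs(w))
    return w[order].real, v[:, order].real

# sanity check on a filtered threshold-crossing "neuron": the leading
# mode should recover the filter, with a negative (reduced-variance)
# eigenvalue, since the filter output is pinned near threshold at a spike
rng = np.random.default_rng(1)
stim = rng.normal(size=300000)
f = np.sin(np.linspace(0.0, np.pi, 20))
f /= np.linalg.norm(f)
proj = np.convolve(stim, f[::-1], mode='valid')   # proj[k] = stim[k:k+20] @ f
cross = np.nonzero((proj[1:] > 2.0) & (proj[:-1] <= 2.0))[0] + 1
spike_idx = cross + 20
w, v = covariance_analysis(stim, spike_idx, window=20)
```

For this toy model, the leading eigenvalue is strongly negative and its eigenvector is well aligned with the filter, which is exactly the signature that a linear filter-and-threshold neuron would leave in ΔC.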

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero up to the noise floor determined by sampling decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is 1/√n_spikes. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Agüera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended, silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Agüera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be −60 ≤ t ≤ −40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/n_spikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out the test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion of other, weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is close to time translationally invariant.

Figure 11: For the leading 64 modes, fraction of the mode energy over the interval −40 < t < −30 msec as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.
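The energy-fraction diagnostic itself is simple; a sketch (the function name and the two toy modes are ours), with the silent interval taken as −60 ≤ t ≤ −40 msec as in the text:

```python
import numpy as np

def silence_energy_fraction(mode, t, t0=-60.0, t1=-40.0):
    """Fraction of a (unit-normalized) covariance mode's energy that
    falls in the silent interval [t0, t1] msec.  For spike-associated
    modes this fraction decreases like 1/n_spikes with sample size;
    for silence modes it stays of order unity."""
    mode = np.asarray(mode, dtype=float)
    mode = mode / np.linalg.norm(mode)
    mask = (t >= t0) & (t <= t1)
    return float(np.sum(mode[mask] ** 2))

t = np.linspace(-60.0, 0.0, 200)
local_mode = np.exp(-0.5 * (t + 5.0) ** 2)      # localized near the spike
extended_mode = np.sin(2 * np.pi * t / 60.0)    # Fourier-like silence mode
```

A mode localized near the spike puts essentially no energy in the silent interval, while a Fourier-like extended mode keeps a fraction of order the interval's share of the window.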

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s₁ and s₂ (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crosses threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions s₁ and s₂. By construction, the distribution of these signals over the whole experiment, P(s₁, s₂), is gaussian. The appropriate prior for the isolation condition, P(s₁, s₂ | silence), differs only subtly from the gaussian prior. On the other hand, for each spike, we obtain a sample from the distribution P(s₁, s₂ | iso spike at t₀), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC that we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold crossing interpretation.

Figure 14: 10⁴ spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation on the prior gaussian distribution. The circles, from the inside out, enclose all but 10⁻¹, 10⁻², ..., 10⁻⁸ of the prior.


Combining equations 2.2 and 2.3, for isolated spikes we have

$$g(s_1, s_2) = \frac{P(s_1, s_2 \mid {\rm iso\ spike\ at\ } t_0)}{P(s_1, s_2 \mid {\rm silence})}, \qquad (5.1)$$

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure, but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
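With binned samples, g can be estimated directly as a ratio of histograms; a minimal sketch, where the function name and binning are ours and a plain gaussian prior stands in for the silence-conditional distribution:

```python
import numpy as np

def decision_function(s_spike, s_prior, bins=25, lim=4.0):
    """Estimate g(s1, s2) = P(s1, s2 | spike) / P(s1, s2 | prior) as a
    ratio of binned densities over the 2D feature plane (cf. eq. 5.1)."""
    edges = np.linspace(-lim, lim, bins + 1)
    H_spike, _, _ = np.histogram2d(s_spike[:, 0], s_spike[:, 1],
                                   bins=[edges, edges], density=True)
    H_prior, _, _ = np.histogram2d(s_prior[:, 0], s_prior[:, 1],
                                   bins=[edges, edges], density=True)
    g = np.where(H_prior > 0, H_spike / np.maximum(H_prior, 1e-12), 0.0)
    return g, edges

rng = np.random.default_rng(2)
prior = rng.normal(size=(100000, 2))
# toy spike-conditional cloud centered at (2, 0)
spk = rng.normal(loc=(2.0, 0.0), scale=0.3, size=(20000, 2))
g, edges = decision_function(spk, prior)
```

In the toy example, g is large in the bins where the spike-conditional cloud sits and zero in well-sampled prior regions far from it.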

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation, according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I^{s₁,s₂}_{iso spike} = I_{iso spike}; more generally, one finds I^{s₁,s₂}_{iso spike} ≤ I_{iso spike}, and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec, reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at approximately 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation, and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ∼3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing?

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low-dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that, along the direction of the STA, we allow the second, orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along the STA and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).
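The tiling algorithm just described reduces to a few lines of linear algebra; a sketch with our own function name, exercised here on random stand-in data rather than actual spike-triggered histories:

```python
import numpy as np

def twist_model(segs, sta, n_bins=8):
    """Locally linear tiling: sort the spike-triggered stimulus
    histories by their projection onto the overall STA, split them into
    n_bins equal-occupancy tiles, and in each tile take the conditional
    average history, orthogonalized to the STA and normalized, as the
    second local basis vector."""
    u = sta / np.linalg.norm(sta)
    order = np.argsort(segs @ u)
    tiles = []
    for chunk in np.array_split(order, n_bins):
        avg = segs[chunk].mean(axis=0)
        w = avg - (avg @ u) * u          # component orthogonal to the STA
        tiles.append(w / np.linalg.norm(w))
    return np.array(tiles)               # n_bins x window

rng = np.random.default_rng(3)
segs = rng.normal(size=(4000, 50))             # stand-in spike histories
sta = segs.mean(axis=0) + rng.normal(size=50)  # stand-in overall STA
tiles = twist_model(segs, sta)
```

By construction, each tile's second basis vector is unit length and exactly orthogonal to the STA direction, so each tile spans a well-defined local 2D plane.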

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ∼90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshizawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work by Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ≈200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations, such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace using data sets of fewer than 10⁴ spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again, the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one that is sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore in general involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Agüera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Agüera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Agüera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibres. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



a π × 30² µm² patch of membrane. We solve these equations numerically using fourth-order Runge-Kutta integration.

The system is driven with a gaussian random noise current I(t), generated by smoothing a gaussian random number stream with an exponential filter to generate a correlation time τ. It is convenient to choose τ to be longer than the time steps of numerical integration, since this guarantees that all functions are smooth on the scale of single time steps. Here we will always use τ = 0.2 msec, a value that is both less than the timescale over which we discretize the stimulus for analysis and far less than the neuron's capacitative smoothing timescale RC ≈ 3 msec. The current has a standard deviation σ, but since the correlation time is short, the relevant parameter usually is the spectral density S = σ²τ; we also add a DC offset I0. In the following, we will consider two parameter regimes: I0 = 0, and I0 at a finite value, which leads to more periodic firing.
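One simple way to realize such a stimulus is a discrete Ornstein-Uhlenbeck (first-order autoregressive) update, which is equivalent to exponential filtering of white gaussian noise; the paper does not specify the exact filter implementation, so the sketch below is one plausible choice with the stated parameters:

```python
import numpy as np

def correlated_noise_current(n_steps, dt, tau, sigma, I0=0.0, seed=0):
    """Gaussian noise current with exponential correlation function.

    A white gaussian stream is passed through an exponential filter with
    time constant tau (an AR(1) recursion), then rescaled to standard
    deviation sigma and offset by the DC level I0.
    """
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(n_steps)
    rho = np.exp(-dt / tau)              # one-step correlation coefficient
    I = np.empty(n_steps)
    I[0] = white[0]
    for t in range(1, n_steps):
        # discrete Ornstein-Uhlenbeck update; stationary variance is 1
        I[t] = rho * I[t - 1] + np.sqrt(1.0 - rho**2) * white[t]
    return I0 + sigma * I

dt, tau = 0.05, 0.2                      # msec, as in the text
S = 6.5e-4                               # spectral density, nA^2 sec
sigma = np.sqrt(S / (tau * 1e-3))        # from S = sigma^2 * tau (tau in sec)
I = correlated_noise_current(200_000, dt, tau, sigma)

# correlation between samples one tau apart should be close to exp(-1)
r_tau = np.corrcoef(I[:-4], I[4:])[0, 1]
```

With τ = 0.2 msec and dt = 0.05 msec, four integration steps span one correlation time, so the smoothness condition in the text is satisfied.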

The integration step size is fixed at 0.05 msec. The key numerical experiments were repeated at a step size of 0.01 msec with identical results. The time of a spike is defined as the moment of maximum voltage, for voltages exceeding a threshold (see Figure 1), estimated to subsample precision by quadratic interpolation. As spikes are both very stereotyped and very large compared to subspiking fluctuations, the precise value of this threshold is unimportant; we have used +20 mV.
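The spike-timing convention above (suprathreshold voltage maximum, refined by quadratic interpolation) can be sketched as follows; the gaussian test bump at the end is a synthetic stand-in for an action potential, not HH output:

```python
import numpy as np

def spike_times(V, dt, threshold=20.0):
    """Spike times as voltage maxima above threshold, refined to subsample
    precision by fitting a parabola through the three samples around each
    local maximum (quadratic interpolation), as described in the text."""
    times = []
    for i in range(1, len(V) - 1):
        if V[i] >= threshold and V[i] >= V[i - 1] and V[i] > V[i + 1]:
            # vertex of the parabola through samples i-1, i, i+1
            denom = V[i - 1] - 2.0 * V[i] + V[i + 1]
            shift = 0.0 if denom == 0 else 0.5 * (V[i - 1] - V[i + 1]) / denom
            times.append((i + shift) * dt)
    return np.array(times)

# Synthetic check: a sampled bump whose true peak (t = 5.013 msec) falls
# between grid points; interpolation should recover it to ~microseconds.
dt = 0.05
t = np.arange(0, 10, dt)
V = -65.0 + 105.0 * np.exp(-((t - 5.013) / 0.3) ** 2)
st = spike_times(V, dt)
```

Because the parabola vertex is invariant under shifting and scaling of V, the result is insensitive to the exact threshold value, consistent with the remark in the text.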

4.1 Qualitative Description of Spiking. The first step in our analysis is to use reverse correlation, equation 2.4, to determine the average stimulus feature preceding a spike, the STA. In Figure 1 (top), we display the STA in a regime where the spectral density of the input current is 6.5 × 10⁻⁴ nA² sec. The spike-triggered averages of the gating terms n⁴ (proportion of open potassium channels) and m³h (proportion of open sodium channels) and of the membrane voltage V are plotted in Figure 1 (middle and bottom). The error bars mark the standard deviation of the trajectories of these variables.

As expected, the voltage and gating variables follow highly stereotyped trajectories during the ≈5 msec surrounding a spike. First, the rapid opening of the sodium channels causes a sharp membrane depolarization (or rise in V); the slower potassium channels then open and repolarize the membrane, leaving it at a slightly lower potential than rest. The potassium channels close gradually, but meanwhile the membrane remains hyperpolarized and, due to its increased permeability to potassium ions, at lower resistance. These effects make it difficult to induce a second spike during this ≈15 msec "refractory period." Away from spikes, the resting levels and fluctuations of the voltage and gating variables are quite small. The larger values evident in Figure 1 (middle and bottom) by ±15 msec are due to the summed contributions of nearby spikes.

The spike-triggered average current has a largely transient form, so that spikes are, on average, preceded by an upward swing in current. On the


Figure 1: Spike-triggered averages, with standard deviations, for (top) the input current I, (middle) the fraction of open K⁺ and Na⁺ channels, and (bottom) the membrane voltage V, for the parameter regime I0 = 0 and S = 6.5 × 10⁻⁴ nA² sec.

other hand, there is no obvious bottleneck in the current trajectories, so that the current variance is almost constant throughout the spike. This is qualitatively consistent with the idea of dimensionality reduction: if the neuron ignores most of the dimensions along which the current can vary, then the variance, which is shared almost equally among all dimensions for this near-white noise, can change by only a small amount.


4.2 Interspike Interaction. Although the STA has the form of a differentiating kernel, suggesting that the neuron detects edge-like events in the current versus time, there must be a DC component to the cell's response. We recall that for constant inputs, the HH model undergoes a bifurcation to constant-frequency spiking, where the frequency is a function of the value of the input above onset. Correspondingly, the STA does not sum precisely to zero; one might think of it as having a small integrating component that allows the system to spike under DC stimulation, albeit only above a threshold.

The system's tendency to periodic spiking under DC current input also is felt under dynamic stimulus conditions and can be thought of as a strong interaction between successive spikes. We illustrate this by considering a different parameter regime, with a small DC current and some added noise (I0 = 0.11 nA and S = 0.8 × 10⁻⁴ nA² sec). Note that the DC component puts the neuron in the metastable region of its f−I curve (see Figure 2). In this regime, the neuron tends to fire quasi-regular trains of spikes intermittently, as shown in Figure 3. We will refer to these quasi-regular spike sequences as "bursts" (note that this term is often used to refer to compound spikes in neurons with additional channels; such events do not occur in the HH model).

Spikes can be classified into three types: those initiating a spike burst, those within a burst, and those ending a burst. The minimum length of

Figure 2: Firing rate of the HH neuron as a function of injected DC current. The empty circles at moderate currents denote the metastable region, where the neuron may be either spiking or silent.


Figure 3: Segment of a typical spike train in a "bursting" regime.

Figure 4: Spike-triggered averages derived from spikes leading ("on"), inside ("burst"), and ending ("off") a burst. The parameters of this bursting regime are I0 = 0.11 nA and S = 0.8 × 10⁻⁴ nA² sec. Note that the burst-ending spike average is, by construction, identical to that of any other within-burst spike for t < 0.

the silence between bursts is taken in this case to be 70 msec. Taking these three categories of spike as different "symbols" (de Ruyter van Steveninck & Bialek, 1988), we can determine the average stimulus for each. These are shown in Figure 4, with the spike at t = 0.

In this regime, the initial spike of a burst is preceded by a rapid oscillation in the current. Spikes within a burst are affected much less by the current; the feature immediately preceding such spikes is similar in shape to a single "wavelength" of the leading spike feature, but is of much smaller amplitude and is temporally compressed into the interspike interval. Hence, although it is clear that the timing of a spike within a burst is determined largely by the timing of the previous spike, the current plays some role in affecting the precise placement. This also demonstrates that the shape of the STA is not the same for all spikes; it depends strongly and nontrivially on the time to the previous spike, and this is related to the observation that subtly different patterns of two or three spikes correspond to very different average stimuli (de Ruyter van Steveninck & Bialek, 1988). For a reader of the spike code, a spike within a burst conveys a different message about the input than the spike at the onset of the burst. Finally, the feature ending a burst has a very similar form to the onset feature, but reversed in time. Thus, to a good approximation, the absence of a spike at the end of a burst can be read as the opposite of the onset of the burst.

In summary, this regime of the HH neuron is similar to a "flip-flop," or 1-bit memory. Like its electronic analog, the neuron's memory is preserved


Figure 5: Overall spike-triggered average in the bursty regime, showing the ringing due to the tendency to periodic firing. Plotted in gray is the spike autocorrelation, showing the same oscillations.

by a feedback loop, here implemented by the interspike interaction. Large fluctuations in the input current at a certain frequency "flip" or "flop" the neuron between its silent and spiking states. However, while the neuron is spiking, further details of the input signal are transmitted by precise spike timing within a burst. If we calculate the spike-triggered average of all spikes for this regime, without regard to their position within a burst, then, as shown in Figure 5, the relatively well-localized leading spike oscillation of Figure 4 is replaced by a long-lived oscillating function resulting from the spike periodicity. This is shown explicitly by comparing the overall STA with the spike autocorrelation, also shown in Figure 5. This same effect is seen in the STA of the burst spikes, which in fact dominates the overall average. Prediction of spike timing using such an STA would be computationally difficult due to its extension in time but, more seriously, unsuccessful, as most of the function is an artifact of the spike history rather than the effect of the stimulus.

While the effects of spike interaction are interesting and should be included in a complete model for spike generation, we wish here to consider only the current's role in initiating spikes. Therefore, as we have argued elsewhere, we limit ourselves initially to the cases in which interspike interaction plays no role (Agüera y Arcas et al., 2001; Agüera y Arcas & Fairhall, 2003). These "isolated" spikes can be defined as spikes preceded by a silent period t_silence long enough to ensure decoupling from the timing of the previous spike. A reasonable choice for t_silence can be inferred directly from the inter-


Figure 6: Hodgkin-Huxley interspike interval histogram for the parameters I0 = 0 and S = 6.5 × 10⁻⁴ nA² sec, showing a peak at a preferred firing frequency and the long Poisson tail. The total number of spikes is N = 5.18 × 10⁶. The plot to the right is a closeup in linear scale.

spike interval distribution P(Δt), illustrated in Figure 6. For the HH model, as in simpler models and many real neurons (Brenner, Agam, Bialek, & de Ruyter van Steveninck, 1998), the form of P(Δt) has three noteworthy features: a refractory "hole," during which another spike is unlikely to occur; a strong mode at the preferred firing frequency; and an exponentially decaying, or Poisson, tail. The details of all three of these features are functions of the parameters of the stimulus (Tiesinga, Jose, & Sejnowski, 2000), and certain regimes may be dominated by only one or two features. The emergence of Poisson statistics in the tail of the distribution implies that these events are independent, so we can infer that the system has lost memory of the previous spike. We will therefore take isolated spikes to be those preceded by a silent interval Δt ≥ t_silence, where t_silence is well into the Poisson regime. The burst-onset spikes of Figure 4 are isolated spikes by this definition.
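Operationally, the isolation criterion is just a filter on interspike intervals; a minimal sketch (the toy spike train is illustrative, not simulation output):

```python
import numpy as np

def isolated_spikes(spike_times, t_silence):
    """Select spikes preceded by at least t_silence of silence, following
    the text's definition of isolated spikes. The first spike of a record
    has no known preceding interval, so it is conservatively skipped."""
    spike_times = np.asarray(spike_times)
    isi = np.diff(spike_times)
    return spike_times[1:][isi >= t_silence]

# Toy spike train (msec): a short burst followed by two well-separated
# spikes; with t_silence = 60 msec, only the last two are isolated.
spikes = np.array([10.0, 18.0, 26.0, 120.0, 300.0])
iso = isolated_spikes(spikes, t_silence=60.0)
```

Applied to the bursty regime of Figure 4, this criterion keeps exactly the burst-onset spikes.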

Note that the bursty behavior evident in Figure 3 is characteristic of "type II" neurons, which begin firing at a well-defined frequency under DC stimulus; "type I" neurons, by contrast, can fire at arbitrarily low frequency under constant input. Nonetheless, the analysis that follows is equally applicable to either type of neuron: spikes still interact at close range and become independent at sufficient separation. In what follows, we will be probing the HH neuron with current noise that remains below the DC threshold for periodic firing.

5 Isolated Spike Analysis

Focusing now on isolated spikes, we proceed to a second-order analysis of the current fluctuations around the isolated spike-triggered average (see


Figure 7: Spike-triggered average stimulus for isolated spikes.

Figure 7). We consider the response of the HH neuron to currents I(t) with mean I0 = 0 and spectral density S = 6.5 × 10⁻⁴ nA² sec. Isolated spikes in this regime are defined by t_silence = 60 msec.

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, corresponding to a sampling window sufficiently long to capture the relevant features. Thus, we begin in a 200-dimensional space. We emphasize that the theorem connecting eigenvalues of the matrix ΔC to the number of relevant dimensions is valid only for truly gaussian distributions of inputs, and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble, namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide to our search for lower-dimensional descriptions, but we should proceed with caution.

The "raw" isolated spike-triggered covariance C_iso spike and the corresponding covariance difference ΔC, equation 2.7, are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to ±15 msec. In Figure 9, we show the spectrum of eigenvalues of ΔC, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply ΔC by C⁻¹_prior. This has the effect of giving us eigenvalues scaled in units of the input standard deviation


Figure 8: The isolated spike-triggered covariance C_iso (left) and covariance difference ΔC (right), for times −30 < t < 5 msec. The plots are in units of nA².

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance after accumulating 80,000 spikes.

along each dimension. Because the correlation time is short, C_prior is nearly diagonal.
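The covariance computation just described can be sketched as follows. This is a minimal, self-contained illustration on a toy neuron that spikes on a single linear feature of white noise (the Hanning-window feature and the threshold rule are hypothetical stand-ins, and the equation numbers refer to the text), so exactly one eigenvalue should stand out from the noise:

```python
import numpy as np

def covariance_analysis(stim, spike_idx, n_dim):
    """Spike-triggered covariance in the spirit of equation 2.7: covariance
    of stimulus windows preceding spikes (about the STA), minus the prior
    covariance, diagonalized after multiplying by the inverse prior so
    that eigenvalues come out in units of the input variance."""
    windows = np.array([stim[i - n_dim : i] for i in spike_idx if i >= n_dim])
    sta = windows.mean(axis=0)
    C_spike = np.cov(windows, rowvar=False)

    # prior covariance estimated from windows drawn at random times
    rng = np.random.default_rng(0)
    rand_idx = rng.integers(n_dim, len(stim), size=len(windows) * 4)
    prior = np.array([stim[i - n_dim : i] for i in rand_idx])
    C_prior = np.cov(prior, rowvar=False)

    dC = C_spike - C_prior
    w, v = np.linalg.eig(np.linalg.inv(C_prior) @ dC)
    order = np.argsort(-np.abs(w))        # sort by eigenvalue magnitude
    return sta, np.real(w[order]), np.real(v[:, order])

# Toy example: spikes occur when the projection of the last 20 samples
# onto a fixed unit-norm feature exceeds a threshold.
rng = np.random.default_rng(2)
stim = rng.standard_normal(200_000)
feature = np.hanning(20)
feature /= np.linalg.norm(feature)
drive = np.convolve(stim, feature[::-1], mode="valid")
spike_idx = np.where(drive > 2.5)[0] + 20
sta, eigvals, modes = covariance_analysis(stim, spike_idx, n_dim=20)
overlap = abs(modes[:, 0] @ feature)      # leading mode vs. true feature
```

Because the trigger here is a threshold crossed from below along one direction, the variance along that direction collapses and the leading eigenvalue is large and negative; the positive leading eigenvalue found for the HH model (section 5.2) is precisely what rules out this simple picture there.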

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero up to the noise floor determined by


Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is 1/√n_spikes. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

sampling decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Agüera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended, silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Agüera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be −60 ≤ t ≤ −40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/n_spikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out this test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion


Figure 11: For the leading 64 modes, fraction of the mode energy over the interval −40 < t < −30 msec, as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.

of other, weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is close to time translationally invariant.
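The energy-fraction diagnostic amounts to integrating the squared mode over the silent window; a minimal sketch, with synthetic "spike" and "silence" modes standing in for actual covariance eigenvectors:

```python
import numpy as np

def silence_energy_fraction(mode, t_axis, t_lo, t_hi):
    """Fraction of a mode's energy falling in the silent interval
    [t_lo, t_hi]. For spike-associated modes this is pure noise and
    shrinks as 1/n_spikes with sample size; for silence-associated modes
    it stays of order unity (the diagnostic described in the text)."""
    mode = mode / np.linalg.norm(mode)
    mask = (t_axis >= t_lo) & (t_axis <= t_hi)
    return float(np.sum(mode[mask] ** 2))

# Toy modes on a window -60 <= t < 0 msec: a "spike" mode localized near
# t = 0 and a "silence" mode that is an extended Fourier-like oscillation.
t = np.arange(-60.0, 0.0, 0.5)
spike_mode = np.exp(-((t + 2.0) / 3.0) ** 2)     # local, near the spike
silence_mode = np.sin(2 * np.pi * t / 30.0)      # extended oscillation
f_spike = silence_energy_fraction(spike_mode, t, -60.0, -40.0)
f_silence = silence_energy_fraction(silence_mode, t, -60.0, -40.0)
```

The localized mode puts essentially no energy in the silent window, while the extended mode distributes its energy roughly uniformly, so a fixed threshold on this fraction (tracked against sample size) separates the two classes.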

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s1 and s2 (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crossed threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions, s1 and s2. By construction, the distribution of these signals over the whole experiment, P(s1, s2), is gaussian. The appropriate prior for the isolation condition, P(s1, s2 | silence), differs only subtly from the gaussian prior. On the other hand, for each spike we obtain a sample from the distribution P(s1, s2 | iso spike at t0), leading to the picture in Figure 14.

Figure 14: 10^4 spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation of the prior gaussian distribution. The circles, from the inside out, enclose all but 10^−1, 10^−2, …, 10^−8 of the prior.

The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC which we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold-crossing interpretation.


Combining equations 2.2 and 2.3, for isolated spikes we have

g(s1, s2) = P(s1, s2 | iso spike at t0) / P(s1, s2 | silence),   (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation, according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I_{s1,s2}^{iso spike} = I^{iso spike}; more generally, one finds I_{s1,s2}^{iso spike} ≤ I^{iso spike}, and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.
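The information carried by a projection is the KL divergence between the spike-conditional and prior distributions of the projected stimulus, I = Σ P(s|spike) log2[P(s|spike)/P(s)]. A naive binned estimator is sketched below; a careful analysis would add a finite-sampling bias correction, which we omit.

```python
import numpy as np

def info_per_spike(proj_spike, proj_prior, bins=32):
    """Bits per spike captured by a d-dimensional projection:
    I = sum_s P(s | spike) log2 [ P(s | spike) / P(s) ],
    with both densities estimated on a shared grid."""
    d = proj_spike.shape[1]
    lo, hi = proj_prior.min(axis=0), proj_prior.max(axis=0)
    edges = [np.linspace(lo[k], hi[k], bins + 1) for k in range(d)]
    p_spk, _ = np.histogramdd(proj_spike, bins=edges)
    p_pri, _ = np.histogramdd(proj_prior, bins=edges)
    p_spk /= p_spk.sum()
    p_pri /= p_pri.sum()
    ok = (p_spk > 0) & (p_pri > 0)
    return float(np.sum(p_spk[ok] * np.log2(p_spk[ok] / p_pri[ok])))
```

Adding an informative second dimension increases the estimate, which is how the 1D (STA-only) and 2D curves of Figure 15 are compared.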

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec, reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and by projection onto ΔC covariance modes 1 and 2 (circles).

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates, for both the 1D and 2D cases, at ≈ 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation, and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ∼3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing?

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that along the direction of the STA, we allow the second orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
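The tiling procedure just described, sort the spike-triggered windows by their projection onto the STA, bin them with equal occupancy, and orthogonalize each bin's conditional average against the STA, can be sketched as follows (a simplified stand-in for the authors' procedure; the bin count is a free parameter):

```python
import numpy as np

def twist_basis(windows, n_bins=8):
    """Locally linear 'twisting ribbon' tiling: the first axis is the
    global STA; within each of n_bins equal-occupancy bins of the STA
    projection, the second axis is the normalized component of the
    bin's conditional average orthogonal to the STA."""
    sta = windows.mean(axis=0)
    sta_hat = sta / np.linalg.norm(sta)
    proj = windows @ sta_hat
    order = np.argsort(proj)
    second = []
    for chunk in np.array_split(order, n_bins):   # equal spikes per bin
        avg = windows[chunk].mean(axis=0)
        avg -= (avg @ sta_hat) * sta_hat          # remove the STA component
        second.append(avg / np.linalg.norm(avg))
    return sta_hat, np.array(second)
```

Applied to synthetic windows whose second relevant direction rotates along the STA coordinate, the recovered per-bin axes twist accordingly: neighboring tiles nearly agree, while distant tiles differ.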

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ∼90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshizawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work of Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ∼200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations, such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation is, we believe, novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace, using data sets of fewer than 10^4 spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus, or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore in general involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges, and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Agüera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Agüera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Agüera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available online: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Coding and information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophys. J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibers. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available online: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available online: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



Figure 1: Spike-triggered averages, with standard deviations, for (top) the input current I, (middle) the fraction of open K+ and Na+ channels, and (bottom) the membrane voltage V, for the parameter regime I0 = 0 and S = 6.50 × 10^−4 nA^2 sec.

On the other hand, there is no obvious bottleneck in the current trajectories, so that the current variance is almost constant throughout the spike. This is qualitatively consistent with the idea of dimensionality reduction: if the neuron ignores most of the dimensions along which the current can vary, then the variance, which is shared almost equally among all dimensions for this near white noise, can change by only a small amount.
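This argument can be made quantitative: for near-white noise in D dimensions, constraining K dimensions at the spike rescales the total standard deviation only by sqrt((D − K)/D). A one-line check with illustrative numbers (D = 200, as in the stimulus discretization used here, and K = 2):

```python
import numpy as np

# If spiking constrains K of D stimulus dimensions, the remaining
# variance is (D - K)/D of the prior (illustrative numbers, not fits).
D, K = 200, 2
ratio = np.sqrt((D - K) / D)   # change in total standard deviation
# ratio is about 0.995: the current's standard deviation through the
# spike is nearly constant, consistent with Figure 1.
```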


4.2 Interspike Interaction. Although the STA has the form of a differentiating kernel, suggesting that the neuron detects edge-like events in the current versus time, there must be a DC component to the cell's response. We recall that for constant inputs, the HH model undergoes a bifurcation to constant-frequency spiking, where the frequency is a function of the value of the input above onset. Correspondingly, the STA does not sum precisely to zero; one might think of it as having a small integrating component that allows the system to spike under DC stimulation, albeit only above a threshold.

The system's tendency to periodic spiking under DC current input also is felt under dynamic stimulus conditions and can be thought of as a strong interaction between successive spikes. We illustrate this by considering a different parameter regime, with a small DC current and some added noise (I0 = 0.11 nA and S = 0.8 × 10^−4 nA^2 sec). Note that the DC component puts the neuron in the metastable region of its f−I curve (see Figure 2). In this regime, the neuron tends to fire quasi-regular trains of spikes intermittently, as shown in Figure 3. We will refer to these quasi-regular spike sequences as "bursts" (note that this term is often used to refer to compound spikes in neurons with additional channels; such events do not occur in the HH model).

Spikes can be classified into three types: those initiating a spike burst, those within a burst, and those ending a burst. The minimum length of the silence between bursts is taken in this case to be 70 msec. Taking these three categories of spike as different "symbols" (de Ruyter van Steveninck & Bialek, 1988), we can determine the average stimulus for each. These are shown in Figure 4, with the spike at t = 0.

Figure 2: Firing rate of the HH neuron as a function of injected DC current. The empty circles at moderate currents denote the metastable region, where the neuron may be either spiking or silent.

Figure 3: Segment of a typical spike train in a "bursting" regime.

Figure 4: Spike-triggered averages derived from spikes leading ("on"), inside ("burst"), and ending ("off") a burst. The parameters of this bursting regime are I0 = 0.11 nA and S = 0.8 × 10^−4 nA^2 sec. Note that the burst-ending spike average is, by construction, identical to that of any other within-burst spike for t < 0.

In this regime, the initial spike of a burst is preceded by a rapid oscillation in the current. Spikes within a burst are affected much less by the current; the feature immediately preceding such spikes is similar in shape to a single "wavelength" of the leading spike feature, but is of much smaller amplitude and is temporally compressed into the interspike interval. Hence, although it is clear that the timing of a spike within a burst is determined largely by the timing of the previous spike, the current plays some role in affecting the precise placement. This also demonstrates that the shape of the STA is not the same for all spikes; it depends strongly and nontrivially on the time to the previous spike, and this is related to the observation that subtly different patterns of two or three spikes correspond to very different average stimuli (de Ruyter van Steveninck & Bialek, 1988). For a reader of the spike code, a spike within a burst conveys a different message about the input than the spike at the onset of the burst. Finally, the feature ending a burst has a very similar form to the onset feature, but reversed in time. Thus, to a good approximation, the absence of a spike at the end of a burst can be read as the opposite of the onset of the burst.
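The three spike categories can be assigned mechanically from interspike intervals. A sketch using the 70 msec inter-burst silence criterion from the text (the label names, and the convention that a spike silent on both sides counts as burst-initiating, are our assumptions):

```python
import numpy as np

def classify_spikes(spike_times, gap=70.0):
    """Label each spike as burst 'on' (preceded by >= gap of silence),
    'off' (followed by >= gap), or 'burst' (inside a burst).  A spike
    silent on both sides is labeled 'on' here by convention."""
    t = np.asarray(spike_times, dtype=float)
    prev = np.diff(t, prepend=-np.inf)   # interval to the previous spike
    nxt = np.diff(t, append=np.inf)      # interval to the next spike
    return np.where(prev >= gap, "on",
                    np.where(nxt >= gap, "off", "burst"))
```

Averaging the stimulus separately over each label reproduces the three conditional averages of Figure 4.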

In summary, this regime of the HH neuron is similar to a "flip-flop," or 1-bit memory. Like its electronic analog, the neuron's memory is preserved

Computation in a Single Neuron 1731

Figure 5: Overall spike-triggered average in the bursty regime, showing the ringing due to the tendency to periodic firing. Plotted in gray is the spike autocorrelation, showing the same oscillations.

by a feedback loop, here implemented by the interspike interaction. Large fluctuations in the input current at a certain frequency "flip" or "flop" the neuron between its silent and spiking states. However, while the neuron is spiking, further details of the input signal are transmitted by precise spike timing within a burst. If we calculate the spike-triggered average of all spikes for this regime, without regard to their position within a burst, then, as shown in Figure 5, the relatively well-localized leading spike oscillation of Figure 4 is replaced by a long-lived oscillating function resulting from the spike periodicity. This is shown explicitly by comparing the overall STA with the spike autocorrelation, also shown in Figure 5. The same effect is seen in the STA of the burst spikes, which in fact dominates the overall average. Prediction of spike timing using such an STA would be computationally difficult, due to its extension in time, but, more seriously, unsuccessful, as most of the function is an artifact of the spike history rather than the effect of the stimulus.

While the effects of spike interaction are interesting and should be included in a complete model for spike generation, we wish here to consider only the current's role in initiating spikes. Therefore, as we have argued elsewhere, we limit ourselves initially to the cases in which interspike interaction plays no role (Agüera y Arcas et al., 2001; Agüera y Arcas & Fairhall, 2003). These "isolated" spikes can be defined as spikes preceded by a silent period t_silence long enough to ensure decoupling from the timing of the previous spike. A reasonable choice for t_silence can be inferred directly from the interspike interval distribution P(Δt), illustrated in Figure 6. For the HH model, as in simpler models and many real neurons (Brenner, Agam, Bialek, & de Ruyter van Steveninck, 1998), the form of P(Δt) has three noteworthy features: a refractory "hole" during which another spike is unlikely to occur, a strong mode at the preferred firing frequency, and an exponentially decaying, or Poisson, tail. The details of all three of these features are functions of the parameters of the stimulus (Tiesinga, José, & Sejnowski, 2000), and certain regimes may be dominated by only one or two features. The emergence of Poisson statistics in the tail of the distribution implies that these events are independent, so we can infer that the system has lost memory of the previous spike. We will therefore take isolated spikes to be those preceded by a silent interval Δt ≥ t_silence, where t_silence is well into the Poisson regime. The burst-onset spikes of Figure 4 are isolated spikes by this definition.

Figure 6: Hodgkin-Huxley interspike interval histogram for the parameters I0 = 0 and S = 6.5 × 10⁻⁴ nA² sec, showing a peak at a preferred firing frequency and the long Poisson tail. The total number of spikes is N = 5.18 × 10⁶. The plot to the right is a closeup in linear scale.
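The isolation criterion is straightforward to apply to a recorded spike train. A minimal sketch, assuming spike times are available as a sorted array and that `t_silence` has been chosen well into the Poisson tail of P(Δt) (the function name is ours, not the paper's):

```python
import numpy as np

def isolated_spikes(spike_times, t_silence):
    """Keep spikes preceded by a silent interval of at least t_silence.

    spike_times : sorted 1D array of spike times (msec)
    t_silence   : silence threshold (msec), chosen from the Poisson tail
                  of the interspike-interval distribution
    """
    spike_times = np.asarray(spike_times)
    isi = np.diff(spike_times)  # interval preceding each spike after the first
    # the first recorded spike has no measured predecessor; we keep it here
    keep = np.concatenate(([True], isi >= t_silence))
    return spike_times[keep]
```

For the regime analyzed below, t_silence = 60 msec would be the value passed in.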

Note that the bursty behavior evident in Figure 3 is characteristic of "type II" neurons, which begin firing at a well-defined frequency under DC stimulus; "type I" neurons, by contrast, can fire at arbitrarily low frequency under constant input. Nonetheless, the analysis that follows is equally applicable to either type of neuron: spikes still interact at close range and become independent at sufficient separation. In what follows, we will be probing the HH neuron with current noise that remains below the DC threshold for periodic firing.

5 Isolated Spike Analysis

Focusing now on isolated spikes, we proceed to a second-order analysis of the current fluctuations around the isolated spike-triggered average (see


Figure 7: Spike-triggered average stimulus for isolated spikes.

Figure 7). We consider the response of the HH neuron to currents I(t) with mean I0 = 0 and spectral density S = 6.5 × 10⁻⁴ nA² sec. Isolated spikes in this regime are defined by t_silence = 60 msec.

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, corresponding to a sampling window sufficiently long to capture the relevant features. Thus, we begin in a 200-dimensional space. We emphasize that the theorem connecting eigenvalues of the matrix ΔC to the number of relevant dimensions is valid only for truly gaussian distributions of inputs, and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble: namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide in our search for lower-dimensional descriptions, but we should proceed with caution.

The "raw" isolated spike-triggered covariance C_iso spike and the corresponding covariance difference ΔC (equation 2.7) are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to ±15 msec. In Figure 9, we show the spectrum of eigenvalues of ΔC, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply ΔC by C⁻¹_prior. This has the effect of giving us eigenvalues scaled in units of the input standard deviation


Figure 8: The isolated spike-triggered covariance C_iso (left) and covariance difference ΔC (right), for times −30 < t < 5 msec. The plots are in units of nA².

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance after accumulating 80,000 spikes.

along each dimension. Because the correlation time is short, C_prior is nearly diagonal.
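The construction of ΔC and its eigenvalues in units of the input standard deviation can be sketched as follows. Here we use a symmetric whitening, diagonalizing C_prior^(−1/2) ΔC C_prior^(−1/2), which has the same spectrum as C_prior^(−1) ΔC but keeps the eigenproblem Hermitian; the function name, the subsampling of the prior ensemble, and the toy dimensions are our own assumptions, not the paper's code:

```python
import numpy as np

def covariance_analysis(stimulus, spike_idx, D=200):
    """Spike-triggered covariance difference and its whitened eigenmodes.

    Accumulates D-sample stimulus histories ending at each spike, forms
    Delta C = C_spike - C_prior, and diagonalizes the whitened matrix
    C_prior^(-1/2) Delta C C_prior^(-1/2), so that eigenvalues are in
    units of the input variance along each dimension.
    """
    # spike-conditional ensemble of stimulus histories
    X = np.stack([stimulus[i - D:i] for i in spike_idx if i >= D])
    sta = X.mean(axis=0)
    C_spike = np.cov(X, rowvar=False)
    # prior ensemble: all histories (subsampled here for speed)
    starts = np.arange(D, len(stimulus))
    Xp = np.stack([stimulus[i - D:i] for i in starts[::10]])
    C_prior = np.cov(Xp, rowvar=False)
    dC = C_spike - C_prior
    # symmetric whitening keeps the problem Hermitian: eigenvalues are real
    w, U = np.linalg.eigh(C_prior)
    W = U @ np.diag(1.0 / np.sqrt(w)) @ U.T  # C_prior^(-1/2)
    evals, evecs = np.linalg.eigh(W @ dC @ W)
    order = np.argsort(-np.abs(evals))       # sort by magnitude
    return sta, evals[order], W @ evecs[:, order]
```

For near-white input noise, C_prior is close to diagonal and this whitening step changes little, as noted in the text.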

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero up to the noise floor determined by sampling decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is 1/√n_spikes. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Agüera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended, silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Agüera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be −60 ≤ t ≤ −40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/n_spikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out this test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion


Figure 11: For the leading 64 modes, the fraction of the mode energy over the interval −40 < t < −30 msec, as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.

of other, weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is close to time-translationally invariant.
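The energy-fraction diagnostic itself is a one-line statistic per mode. A sketch under our own naming (the time window would be the silent interval quoted in the text):

```python
import numpy as np

def silence_energy_fraction(modes, t_axis, t_window):
    """Fraction of each mode's energy inside a silent time window.

    modes    : (D, K) array, one eigenvector per column, unit norm
    t_axis   : (D,) times (msec) relative to the spike at t = 0
    t_window : (t_lo, t_hi) interval deep inside the enforced silence
    Spike-associated modes carry only noise there, so this fraction falls
    like 1/n_spikes with sample size; silence modes stay of order unity.
    """
    lo, hi = t_window
    mask = (t_axis >= lo) & (t_axis <= hi)
    energy = np.sum(modes ** 2, axis=0)
    return np.sum(modes[mask] ** 2, axis=0) / energy
```

Tracking this fraction as spikes accumulate reproduces the logic of Figure 11: modes whose fraction keeps dropping are spike associated.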

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s1 and s2 (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crossed threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.
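The sign argument can be checked directly on a toy filter-and-threshold model (the exponential filter and threshold value below are arbitrary assumptions, not the HH dynamics): at every upward threshold crossing, the discrete derivative of the filtered signal is necessarily positive, which is what reduces the variance along the derivative direction.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=20000)                   # white noise input
f = np.exp(-np.arange(30) / 5.0)             # assumed exponential filter
y = np.convolve(x, f, mode='full')[:len(x)]  # causal filtered signal
theta = 2.0                                  # assumed threshold

# upward crossings: below threshold at t-1, at or above at t
crossings = np.where((y[1:] >= theta) & (y[:-1] < theta))[0] + 1
dy = y[crossings] - y[crossings - 1]         # projection onto filter derivative
# every crossing from below has a strictly positive derivative
```

In the HH data of Figure 14, by contrast, projections onto the derivative-like mode take both signs, which is what rules out this interpretation.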


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions s1 and s2. By construction, the distribution of these signals over the whole experiment, P(s1, s2), is gaussian. The appropriate prior for the isolation condition, P(s1, s2 | silence), differs only subtly from the gaussian prior. On the other hand, for each spike, we obtain a sample from the distribution P(s1, s2 | iso spike at t0), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC which we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased, rather than decreased, variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold crossing interpretation.

Figure 14: 10⁴ spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation on the prior gaussian distribution. The circles, from the inside out, enclose all but 10⁻¹, 10⁻², …, 10⁻⁸ of the prior.


Combining equations 2.2 and 2.3, for isolated spikes we have

g(s1, s2) = P(s1, s2 | iso spike at t0) / P(s1, s2 | silence),   (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure, but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation, according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I_{s1,s2}^{iso spike} = I_{iso spike}; more generally, one finds I_{s1,s2}^{iso spike} ≤ I_{iso spike}, and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec,

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).


reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at approximately 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation, and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at approximately 3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing?

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The next two spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low-dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that, along the direction of the STA, we allow the second, orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
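Because the tiling uses only first-order statistics, it is simple to implement: bin histories by their STA projection into equal-population tiles, average within each tile, and orthogonalize against the STA. A sketch (the function name and arguments are our own):

```python
import numpy as np

def twist_model(histories, sta, n_bins=8):
    """Locally linear ('twisting ribbon') approximation of the feature surface.

    histories : (N, D) spike-triggered stimulus histories
    sta       : (D,) overall spike-triggered average (the global direction)
    n_bins    : number of tiles along the STA projection (the text uses 8)
    For each tile, the second basis vector is the conditional mean of the
    histories in that tile, orthogonalized against the STA and normalized.
    """
    u = sta / np.linalg.norm(sta)
    proj = histories @ u
    # equal-population bins along the STA direction
    edges = np.quantile(proj, np.linspace(0, 1, n_bins + 1))
    second = []
    for k in range(n_bins):
        if k == n_bins - 1:
            sel = (proj >= edges[k]) & (proj <= edges[k + 1])
        else:
            sel = (proj >= edges[k]) & (proj < edges[k + 1])
        mean_k = histories[sel].mean(axis=0)
        v = mean_k - (mean_k @ u) * u  # component orthogonal to the STA
        second.append(v / np.linalg.norm(v))
    return u, np.stack(second), edges
```

An SVD of the returned family of second directions then estimates the embedding dimension of the curved subspace, as in Figure 17.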

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The new model captures a maximum of 4.8 bits, recovering approximately 90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling ("twist") model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshizawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input


currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work by Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generates either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally roughly 200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description, but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations, such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation, or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace, using data sets of fewer than 10⁴ spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again, the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus, or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore, in general, involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges, and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott L F amp Kepler T (1990) Model neurons From Hodgkin-Huxleyto Hopeld In L Garrido (Ed) Statistical mechanisms of neural networks(pp 5ndash18) Berlin Springer-Verlag

Aguera y Arcas B (1998)Reducing the neuron A computational approach Unpub-lished masterrsquos thesis Princeton University

Aguera y Arcas B Bialek W amp Fairhall A L (2001) What can a single neu-ron compute In T Leen T Dietterich amp V Tresp (Eds) Advances in neuralinformation processing systems 13 (pp 75ndash81) Cambridge MA MIT Press

Aguera y Arcas B amp Fairhall A (2003) What causes a neuron to spike NeuralComputation 15 1789ndash1807

Barlow H B (1953) Summation and inhibition in the frogrsquos retina J Physiol119 69ndash88

Computation in a Single Neuron 1747

Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

FitzHugh, R. (1961). Impulses and physiological states in models of nerve membrane. Biophysics J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.

1748 B Aguera y Arcas A Fairhall and W Bialek

Hartline, H. K. (1940). The receptive fields of optic nerve fibres. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshikawa, Z. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.


4.2 Interspike Interaction. Although the STA has the form of a differentiating kernel, suggesting that the neuron detects edge-like events in the current versus time, there must be a DC component to the cell's response. We recall that for constant inputs, the HH model undergoes a bifurcation to constant frequency spiking, where the frequency is a function of the value of the input above onset. Correspondingly, the STA does not sum precisely to zero; one might think of it as having a small integrating component that allows the system to spike under DC stimulation, albeit only above a threshold.

The system's tendency to periodic spiking under DC current input also is felt under dynamic stimulus conditions and can be thought of as a strong interaction between successive spikes. We illustrate this by considering a different parameter regime, with a small DC current and some added noise (I0 = 0.11 nA and S = 0.8 × 10⁻⁴ nA² sec). Note that the DC component puts the neuron in the metastable region of its f−I curve (see Figure 2). In this regime, the neuron tends to fire quasi-regular trains of spikes intermittently, as shown in Figure 3. We will refer to these quasi-regular spike sequences as "bursts" (note that this term is often used to refer to compound spikes in neurons with additional channels; such events do not occur in the HH model).

Spikes can be classified into three types: those initiating a spike burst, those within a burst, and those ending a burst. The minimum length of the silence between bursts is taken in this case to be 70 msec. Taking these three categories of spike as different "symbols" (de Ruyter van Steveninck & Bialek, 1988), we can determine the average stimulus for each. These are shown in Figure 4, with the spike at t = 0.

Figure 2: Firing rate of the HH neuron as a function of injected DC current. The empty circles at moderate currents denote the metastable region, where the neuron may be either spiking or silent.

Figure 3: Segment of a typical spike train in a "bursting" regime.

Figure 4: Spike-triggered averages derived from spikes leading ("on"), inside ("burst"), and ending ("off") a burst. The parameters of this bursting regime are I0 = 0.11 nA and S = 0.8 × 10⁻⁴ nA² sec. Note that the burst-ending spike average is, by construction, identical to that of any other within-burst spike for t < 0.

In this regime, the initial spike of a burst is preceded by a rapid oscillation in the current. Spikes within a burst are affected much less by the current: the feature immediately preceding such spikes is similar in shape to a single "wavelength" of the leading spike feature, but is of much smaller amplitude and is temporally compressed into the interspike interval. Hence, although it is clear that the timing of a spike within a burst is determined largely by the timing of the previous spike, the current plays some role in affecting the precise placement. This also demonstrates that the shape of the STA is not the same for all spikes; it depends strongly and nontrivially on the time to the previous spike, and this is related to the observation that subtly different patterns of two or three spikes correspond to very different average stimuli (de Ruyter van Steveninck & Bialek, 1988). For a reader of the spike code, a spike within a burst conveys a different message about the input than the spike at the onset of the burst. Finally, the feature ending a burst has a very similar form to the onset feature, but reversed in time. Thus, to a good approximation, the absence of a spike at the end of a burst can be read as the opposite of the onset of the burst.
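The classification of spikes into the three "symbols" and the computation of a conditional average stimulus for each can be sketched as follows. This is a minimal illustration, not the paper's code: the function names, array layout, and the use of the 70 msec silence criterion on both sides of a spike are our own conventions.

```python
import numpy as np

def classify_spikes(spike_times, min_silence=70.0):
    """Label each spike 'on' (starts a burst), 'burst' (inside a burst),
    or 'off' (last spike before a long silence), using the gaps to its
    neighbors. Times and min_silence are in the same units (msec)."""
    labels = []
    for i, t in enumerate(spike_times):
        gap_before = t - spike_times[i - 1] if i > 0 else np.inf
        gap_after = spike_times[i + 1] - t if i + 1 < len(spike_times) else np.inf
        if gap_before >= min_silence:
            labels.append("on")
        elif gap_after >= min_silence:
            labels.append("off")
        else:
            labels.append("burst")
    return labels

def conditional_sta(current, dt, spike_times, labels, symbol, window=15.0):
    """Average the stimulus over a window preceding spikes of one symbol."""
    n = int(window / dt)
    segs = [current[int(t / dt) - n:int(t / dt)]
            for t, lab in zip(spike_times, labels)
            if lab == symbol and int(t / dt) >= n]
    return np.mean(segs, axis=0)
```

With the three conditional averages in hand, the "on," "burst," and "off" features of Figure 4 correspond to `conditional_sta(..., symbol)` for each label.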

In summary, this regime of the HH neuron is similar to a "flip-flop," or 1-bit memory. Like its electronic analog, the neuron's memory is preserved by a feedback loop, here implemented by the interspike interaction. Large fluctuations in the input current at a certain frequency "flip" or "flop" the neuron between its silent and spiking states. However, while the neuron is spiking, further details of the input signal are transmitted by precise spike timing within a burst. If we calculate the spike-triggered average of all spikes for this regime, without regard to their position within a burst, then as shown in Figure 5, the relatively well-localized leading spike oscillation of Figure 4 is replaced by a long-lived oscillating function resulting from the spike periodicity. This is shown explicitly by comparing the overall STA with the spike autocorrelation, also shown in Figure 5. This same effect is seen in the STA of the burst spikes, which in fact dominates the overall average. Prediction of spike timing using such an STA would be computationally difficult due to its extension in time but, more seriously, unsuccessful, as most of the function is an artifact of the spike history rather than the effect of the stimulus.

Figure 5: Overall spike-triggered average in the bursty regime, showing the ringing due to the tendency to periodic firing. Plotted in gray is the spike autocorrelation, showing the same oscillations.

While the effects of spike interaction are interesting and should be included in a complete model for spike generation, we wish here to consider only the current's role in initiating spikes. Therefore, as we have argued elsewhere, we limit ourselves initially to the cases in which interspike interaction plays no role (Agüera y Arcas et al., 2001; Agüera y Arcas & Fairhall, 2003). These "isolated" spikes can be defined as spikes preceded by a silent period t_silence long enough to ensure decoupling from the timing of the previous spike. A reasonable choice for t_silence can be inferred directly from the interspike interval distribution P(Δt), illustrated in Figure 6. For the HH model, as in simpler models and many real neurons (Brenner, Agam, Bialek, & de Ruyter van Steveninck, 1998), the form of P(Δt) has three noteworthy features: a refractory "hole" during which another spike is unlikely to occur, a strong mode at the preferred firing frequency, and an exponentially decaying, or Poisson, tail. The details of all three of these features are functions of the parameters of the stimulus (Tiesinga, Jose, & Sejnowski, 2000), and certain regimes may be dominated by only one or two features. The emergence of Poisson statistics in the tail of the distribution implies that these events are independent, so we can infer that the system has lost memory of the previous spike. We will therefore take isolated spikes to be those preceded by a silent interval Δt ≥ t_silence, where t_silence is well into the Poisson regime. The burst-onset spikes of Figure 4 are isolated spikes by this definition.

Figure 6: Hodgkin-Huxley interspike interval histogram for the parameters I0 = 0 and S = 6.5 × 10⁻⁴ nA² sec, showing a peak at a preferred firing frequency and the long Poisson tail. The total number of spikes is N = 5.18 × 10⁶. The plot to the right is a closeup in linear scale.
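Selecting isolated spikes by the criterion Δt ≥ t_silence is a one-line filter on the interspike intervals; a minimal sketch (our own naming, treating the first spike, which has no predecessor, as isolated):

```python
import numpy as np

def isolated_spikes(spike_times, t_silence=60.0):
    """Keep spikes preceded by a silent interval of at least t_silence
    (same units as spike_times, here msec), i.e. spikes whose timing is
    decoupled from the previous spike. The first spike has no
    predecessor and is treated as isolated."""
    spike_times = np.asarray(spike_times)
    isi = np.diff(spike_times)
    keep = np.concatenate(([True], isi >= t_silence))
    return spike_times[keep]
```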

Note that the bursty behavior evident in Figure 3 is characteristic of "type II" neurons, which begin firing at a well-defined frequency under DC stimulus; "type I" neurons, by contrast, can fire at arbitrarily low frequency under constant input. Nonetheless, the analysis that follows is equally applicable to either type of neuron: spikes still interact at close range and become independent at sufficient separation. In what follows, we will be probing the HH neuron with current noise that remains below the DC threshold for periodic firing.

5 Isolated Spike Analysis

Focusing now on isolated spikes, we proceed to a second-order analysis of the current fluctuations around the isolated spike-triggered average (see Figure 7). We consider the response of the HH neuron to currents I(t) with mean I0 = 0 and spectral density S = 6.5 × 10⁻⁴ nA² sec. Isolated spikes in this regime are defined by t_silence = 60 msec.

Figure 7: Spike-triggered average stimulus for isolated spikes.

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, corresponding to sampling on a timescale sufficiently long to capture the relevant features; thus, we begin in a 200-dimensional space. We emphasize that the theorem connecting eigenvalues of the matrix ΔC to the number of relevant dimensions is valid only for truly gaussian distributions of inputs, and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble, namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide to our search for lower-dimensional descriptions, but we should proceed with caution.

The "raw" isolated spike-triggered covariance C_iso spike and the corresponding covariance difference ΔC, equation 2.7, are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to ±15 msec. In Figure 9, we show the spectrum of eigenvalues of ΔC, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply ΔC by C_prior⁻¹. This has the effect of giving us eigenvalues scaled in units of the input standard deviation along each dimension. Because the correlation time is short, C_prior is nearly diagonal.

Figure 8: The isolated spike-triggered covariance C_iso (left) and covariance difference ΔC (right) for times −30 < t < 5 msec. The plots are in units of nA².

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance after accumulating 80,000 spikes.
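The covariance analysis just described can be sketched in a few lines. This is an illustrative reconstruction under our own assumptions: equation 2.7 is not reproduced in this excerpt, so we take ΔC = C_spike − C_prior, and the window length, prior-window sampling, and function names are ours.

```python
import numpy as np

def spike_triggered_covariance(current, dt, spike_times, n_dim=200):
    """Accumulate the covariance of the n_dim-sample stimulus history
    preceding each spike, subtract the prior stimulus covariance, and
    diagonalize C_prior^{-1} @ dC so that eigenvalues are measured in
    units of the input variance along each dimension."""
    idx = (np.asarray(spike_times) / dt).astype(int)
    segs = np.array([current[i - n_dim:i] for i in idx if i >= n_dim])
    sta = segs.mean(axis=0)                  # spike-triggered average
    c_spike = np.cov(segs, rowvar=False)     # fluctuations about the STA
    # Prior: covariance of history windows drawn without reference to spikes.
    starts = np.arange(n_dim, len(current) + 1, n_dim)
    prior = np.array([current[s - n_dim:s] for s in starts])
    c_prior = np.cov(prior, rowvar=False)
    d_c = c_spike - c_prior
    w, v = np.linalg.eig(np.linalg.solve(c_prior, d_c))
    order = np.argsort(-np.abs(w))           # sort by eigenvalue magnitude
    return sta, w[order].real, v[:, order].real
```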

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero up to the noise floor determined by sampling decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is 1/√n_spikes. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Agüera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended and silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Agüera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be −60 ≤ t ≤ −40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/n_spikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out the test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion of other weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is close to time translationally invariant.

Figure 11: For the leading 64 modes, fraction of the mode energy over the interval −40 < t < −30 msec as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.
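The silence-energy diagnostic is simple to state in code: for each eigenmode, measure the fraction of its squared amplitude that falls in the silent interval. A minimal sketch, with our own time-axis convention (the window boundaries are parameters, matching the −60 to −40 msec interval used in the text):

```python
import numpy as np

def silence_energy_fraction(mode, t, t_lo=-60.0, t_hi=-40.0):
    """Fraction of a mode's energy (squared amplitude) lying in the
    silent interval [t_lo, t_hi] msec before the spike. For spike-
    associated modes this fraction is pure noise and falls like
    1/n_spikes; for silence-associated modes it stays of order unity."""
    mode = np.asarray(mode, dtype=float)
    energy = mode ** 2
    mask = (t >= t_lo) & (t <= t_hi)
    return energy[mask].sum() / energy.sum()
```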

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s1 and s2 (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crossed threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions s1 and s2. By construction, the distribution of these signals over the whole experiment, P(s1, s2), is gaussian. The appropriate prior for the isolation condition, P(s1, s2 | silence), differs only subtly from the gaussian prior. On the other hand, for each spike, we obtain a sample from the distribution P(s1, s2 | iso spike at t0), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC that we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold crossing interpretation.

Figure 14: 10⁴ spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation on the prior gaussian distribution. The circles, from the inside out, enclose all but 10⁻¹, 10⁻², ..., 10⁻⁸ of the prior.


Combining equations 2.2 and 2.3, for isolated spikes we have

g(s1, s2) = P(s1, s2 | iso spike at t0) / P(s1, s2 | silence),   (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
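In practice, the ratio of distributions in equation 5.1 can be estimated by binning the projections; a minimal sketch using ratios of normalized 2D histograms (the binning choices and names are ours, and a careful estimate would also need to handle sampling noise in sparsely occupied bins):

```python
import numpy as np

def estimate_nonlinearity(s_spike, s_prior, bins=25, lim=4.0):
    """Estimate g(s1, s2) = P(s1, s2 | spike) / P(s1, s2 | prior) from
    samples of the two projections: s_spike are (n, 2) spike-conditional
    projections, s_prior are (m, 2) prior projections, both in units of
    the prior standard deviation."""
    edges = np.linspace(-lim, lim, bins + 1)
    p_spike, _, _ = np.histogram2d(s_spike[:, 0], s_spike[:, 1],
                                   bins=(edges, edges), density=True)
    p_prior, _, _ = np.histogram2d(s_prior[:, 0], s_prior[:, 1],
                                   bins=(edges, edges), density=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        g = np.where(p_prior > 0, p_spike / p_prior, 0.0)
    return g
```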

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation, according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I_{s1,s2}^{iso spike} = I_{iso spike}; more generally, one finds I_{s1,s2}^{iso spike} ≤ I_{iso spike}, and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.
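The information carried by a projection can be estimated from the spike-conditional and prior distributions of that projection. A minimal one-dimensional sketch (section 3's exact estimator is not reproduced in this excerpt; this plug-in histogram estimate, with our binning and names, illustrates the idea and omits the finite-sampling corrections a careful calculation would require):

```python
import numpy as np

def projection_information(s_spike, s_prior, bins=50):
    """Information (bits per spike) that a scalar projection carries
    about spiking: I = sum_s P(s | spike) log2[ P(s | spike) / P(s) ],
    estimated from histograms of spike-conditional and prior samples."""
    edges = np.histogram_bin_edges(np.concatenate([s_spike, s_prior]), bins)
    p_spk, _ = np.histogram(s_spike, bins=edges)
    p_pri, _ = np.histogram(s_prior, bins=edges)
    p_spk = p_spk / p_spk.sum()
    p_pri = p_pri / p_pri.sum()
    mask = (p_spk > 0) & (p_pri > 0)   # skip empty bins
    return np.sum(p_spk[mask] * np.log2(p_spk[mask] / p_pri[mask]))
```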

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec, reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at ≈ 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ~3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing?

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that along the direction of the STA, we allow the second, orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).
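The locally linear tiling construction can be sketched as follows: sort the spike-triggered histories by their projection onto the STA, split them into equal-occupancy bins, and in each bin take the conditional-average component orthogonal to the STA as the local second basis vector. This is a minimal illustration under our own naming and array conventions, not the authors' code.

```python
import numpy as np

def twist_model(histories, sta, n_bins=8):
    """Locally linear tiling of a curved 2D feature subspace.
    histories: (n_spikes, n_dim) spike-triggered stimulus segments.
    sta: (n_dim,) overall spike-triggered average (global direction).
    Returns (n_bins, n_dim) unit vectors: the local second direction
    in each equal-occupancy bin of the projection onto the STA."""
    sta = sta / np.linalg.norm(sta)
    proj = histories @ sta
    order = np.argsort(proj)                 # sort histories by projection
    tiles = []
    for chunk in np.array_split(order, n_bins):
        mean = histories[chunk].mean(axis=0)     # conditional average
        ortho = mean - (mean @ sta) * sta        # remove the STA component
        norm = np.linalg.norm(ortho)
        if norm > 0:
            ortho = ortho / norm
        tiles.append(ortho)
    return np.array(tiles)
```

Applying SVD to the returned family of vectors, as in the text, then estimates the embedding dimension of the curved subspace.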

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ~90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a two-dimensional system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshizawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

1744 B Aguera y Arcas A Fairhall and W Bialek

Closer in spirit to our approach is the work by Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors (a system described by a potentially large number of variables), there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ≈ 200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations, such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation is, we believe, novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace using data sets of fewer than 10^4 spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore in general involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Agüera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Agüera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Agüera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibers. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Li, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



Figure 3: Segment of a typical spike train in a "bursting" regime.

Figure 4: Spike-triggered averages derived from spikes leading ("on"), inside ("burst"), and ending ("off") a burst. The parameters of this bursting regime are I_0 = 0.11 nA and S = 0.8 × 10^-4 nA^2 sec. Note that the burst-ending spike average is, by construction, identical to that of any other within-burst spike for t < 0.

the silence between bursts is taken in this case to be 70 msec. Taking these three categories of spike as different "symbols" (de Ruyter van Steveninck & Bialek, 1988), we can determine the average stimulus for each. These are shown in Figure 4, with the spike at t = 0.

In this regime, the initial spike of a burst is preceded by a rapid oscillation in the current. Spikes within a burst are affected much less by the current: the feature immediately preceding such spikes is similar in shape to a single "wavelength" of the leading spike feature, but is of much smaller amplitude and is temporally compressed into the interspike interval. Hence, although it is clear that the timing of a spike within a burst is determined largely by the timing of the previous spike, the current plays some role in affecting the precise placement. This also demonstrates that the shape of the STA is not the same for all spikes: it depends strongly and nontrivially on the time to the previous spike, and this is related to the observation that subtly different patterns of two or three spikes correspond to very different average stimuli (de Ruyter van Steveninck & Bialek, 1988). For a reader of the spike code, a spike within a burst conveys a different message about the input than the spike at the onset of the burst. Finally, the feature ending a burst has a very similar form to the onset feature, but reversed in time. Thus, to a good approximation, the absence of a spike at the end of a burst can be read as the opposite of the onset of the burst.
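The classification of spikes into "on," "burst," and "off" symbols and the per-symbol averages can be sketched as follows. This is a hedged illustration: the function name, the preceding/following-interval test, and the window handling are our own assumptions, not the paper's code:

```python
import numpy as np

def symbol_averages(current, spike_idx, window, silence_samples):
    """Average stimulus preceding spikes classified as burst-onset ('on'),
    within-burst ('burst'), or burst-ending ('off').

    current: 1D stimulus array (samples).
    spike_idx: sorted integer sample indices of spikes.
    window: number of samples of stimulus history to average.
    silence_samples: gap defining the silence between bursts.
    """
    # gap to the previous spike (first spike counts as isolated)
    isi_before = np.diff(spike_idx, prepend=-silence_samples - 1)
    # gap to the next spike (last spike counts as burst-ending)
    isi_after = np.diff(spike_idx, append=spike_idx[-1] + silence_samples + 1)
    labels = np.where(isi_before > silence_samples, 'on',
             np.where(isi_after > silence_samples, 'off', 'burst'))
    stas = {}
    for lab in ('on', 'burst', 'off'):
        idx = spike_idx[(labels == lab) & (spike_idx >= window)]
        if len(idx):
            stas[lab] = np.mean([current[i - window:i] for i in idx], axis=0)
    return stas
```

Each returned average corresponds to one of the three traces of Figure 4.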

In summary, this regime of the HH neuron is similar to a "flip-flop," or 1-bit memory. Like its electronic analog, the neuron's memory is preserved by a feedback loop, here implemented by the interspike interaction. Large fluctuations in the input current at a certain frequency "flip" or "flop" the neuron between its silent and spiking states. However, while the neuron is spiking, further details of the input signal are transmitted by precise spike timing within a burst. If we calculate the spike-triggered average of all spikes for this regime, without regard to their position within a burst, then, as shown in Figure 5, the relatively well-localized leading spike oscillation of Figure 4 is replaced by a long-lived oscillating function resulting from the spike periodicity. This is shown explicitly by comparing the overall STA with the spike autocorrelation, also shown in Figure 5. This same effect is seen in the STA of the burst spikes, which in fact dominates the overall average. Prediction of spike timing using such an STA would be computationally difficult due to its extension in time but, more seriously, unsuccessful, as most of the function is an artifact of the spike history rather than the effect of the stimulus.

Figure 5: Overall spike-triggered average in the bursty regime, showing the ringing due to the tendency to periodic firing. Plotted in gray is the spike autocorrelation, showing the same oscillations.

While the effects of spike interaction are interesting and should be included in a complete model for spike generation, we wish here to consider only the current's role in initiating spikes. Therefore, as we have argued elsewhere, we limit ourselves initially to the cases in which interspike interaction plays no role (Agüera y Arcas et al., 2001; Agüera y Arcas & Fairhall, 2003). These "isolated" spikes can be defined as spikes preceded by a silent period t_silence long enough to ensure decoupling from the timing of the previous spike. A reasonable choice for t_silence can be inferred directly from the interspike interval distribution P(Δt), illustrated in Figure 6. For the HH model, as in simpler models and many real neurons (Brenner, Agam, Bialek, & de Ruyter van Steveninck, 1998), the form of P(Δt) has three noteworthy features: a refractory "hole" during which another spike is unlikely to occur, a strong mode at the preferred firing frequency, and an exponentially decaying, or Poisson, tail. The details of all three of these features are functions of the parameters of the stimulus (Tiesinga, Jose, & Sejnowski, 2000), and certain regimes may be dominated by only one or two features. The emergence of Poisson statistics in the tail of the distribution implies that these events are independent, so we can infer that the system has lost memory of the previous spike. We will therefore take isolated spikes to be those preceded by a silent interval Δt ≥ t_silence, where t_silence is well into the Poisson regime. The burst-onset spikes of Figure 4 are isolated spikes by this definition.

Figure 6: Hodgkin-Huxley interspike interval histogram for the parameters I_0 = 0 and S = 6.5 × 10^-4 nA^2 sec, showing a peak at a preferred firing frequency and the long Poisson tail. The total number of spikes is N = 5.18 × 10^6. The plot to the right is a close-up in linear scale.
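Selecting isolated spikes by the silence criterion, and sanity-checking that t_silence sits in the Poisson regime, can be sketched as below. The tail-fitting procedure (a straight-line fit to log histogram counts) is our own illustrative choice, not the paper's:

```python
import numpy as np

def isolated_spikes(spike_times, t_silence):
    """Keep spikes preceded by at least t_silence of silence."""
    spike_times = np.asarray(spike_times, dtype=float)
    gaps = np.diff(spike_times, prepend=-np.inf)
    return spike_times[gaps >= t_silence]

def tail_rate(isi, t_min, bin_width=1.0):
    """Crude check that the ISI tail beyond t_min is exponential
    (Poisson): fit log histogram counts with a line and return the
    implied rate.  A sketch, not the authors' procedure."""
    tail = isi[isi >= t_min]
    edges = np.arange(t_min, tail.max() + bin_width, bin_width)
    counts, _ = np.histogram(tail, bins=edges)
    keep = counts >= 5                      # only well-sampled bins
    centers = 0.5 * (edges[:-1] + edges[1:])
    slope, _ = np.polyfit(centers[keep], np.log(counts[keep]), 1)
    return -slope
```

If the recovered rate is stable as t_min grows, intervals beyond t_min are consistent with independent (memoryless) events, justifying the choice of t_silence.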

Note that the bursty behavior evident in Figure 3 is characteristic of "type II" neurons, which begin firing at a well-defined frequency under DC stimulus; "type I" neurons, by contrast, can fire at arbitrarily low frequency under constant input. Nonetheless, the analysis that follows is equally applicable to either type of neuron: spikes still interact at close range and become independent at sufficient separation. In what follows, we will be probing the HH neuron with current noise that remains below the DC threshold for periodic firing.

5 Isolated Spike Analysis

Focusing now on isolated spikes, we proceed to a second-order analysis of the current fluctuations around the isolated spike-triggered average (see Figure 7). We consider the response of the HH neuron to currents I(t) with mean I_0 = 0 and spectral density S = 6.5 × 10^-4 nA^2 sec. Isolated spikes in this regime are defined by t_silence = 60 msec.

Figure 7: Spike-triggered average stimulus for isolated spikes.

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, roughly corresponding to sampling at a timescale sufficiently long to capture the relevant features. Thus, we begin in a 200-dimensional space. We emphasize that the theorem that connects eigenvalues of the matrix ΔC to the number of relevant dimensions is valid only for truly gaussian distributions of inputs, and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble, namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide to our search for lower-dimensional descriptions, but we should proceed with caution.
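The accumulation of the spike-triggered covariance and the difference ΔC = C_spike − C_prior can be sketched generically as follows (a minimal illustration assuming a one-dimensional stimulus array and integer spike indices; not the authors' code):

```python
import numpy as np

def covariance_difference(stimulus, spike_idx, T):
    """Spike-triggered covariance analysis sketch.

    Accumulates the T-sample windows preceding each spike, computes the
    covariance of fluctuations around the STA, and subtracts the prior
    covariance estimated from all length-T windows of the stimulus.
    Returns (C_spike, delta_C).
    """
    spike_idx = spike_idx[spike_idx >= T]
    windows = np.stack([stimulus[i - T:i] for i in spike_idx])
    sta = windows.mean(axis=0)
    centered = windows - sta                      # fluctuations around STA
    c_spike = centered.T @ centered / len(windows)
    # prior covariance from all length-T windows
    n = len(stimulus) - T
    all_w = np.lib.stride_tricks.sliding_window_view(stimulus, T)[:n]
    all_c = all_w - all_w.mean(axis=0)
    c_prior = all_c.T @ all_c / n
    return c_spike, c_spike - c_prior
```

With T = 200 this reproduces the 200-dimensional starting point described in the text.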

The "raw" isolated spike-triggered covariance C_iso spike and the corresponding covariance difference ΔC, equation 2.7, are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to ±15 msec. In Figure 9, we show the spectrum of eigenvalues of ΔC, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply ΔC by C^-1_prior; this has the effect of giving us eigenvalues scaled in units of the input standard deviation along each dimension. Because the correlation time is short, C_prior is nearly diagonal.

Figure 8: The isolated spike-triggered covariance C_iso (left) and covariance difference ΔC (right) for times -30 < t < 5 msec. The plots are in units of nA^2.

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance after accumulating 80,000 spikes.
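The scaling step, diagonalizing C_prior^-1 ΔC rather than ΔC itself, can be sketched directly. Note the product is not symmetric in general, so a general eigensolver is used (an illustrative sketch):

```python
import numpy as np

def scaled_spectrum(delta_c, c_prior):
    """Eigenvalues of C_prior^{-1} @ delta_C: variance changes measured
    in units of the input standard deviation along each dimension.
    Returns eigenvalues and eigenvectors sorted by magnitude."""
    vals, vecs = np.linalg.eig(np.linalg.solve(c_prior, delta_c))
    order = np.argsort(-np.abs(vals))
    return vals[order].real, vecs[:, order].real
```

For white noise with short correlation time, C_prior is close to a multiple of the identity, so this rescaling mostly changes units rather than directions.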

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero up to the noise floor determined by sampling decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is 1/√n_spikes. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Agüera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended and silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Agüera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be -60 ≤ t ≤ -40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/n_spikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out the test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion of other, weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is close to time translationally invariant.

Figure 11: For the leading 64 modes, the fraction of the mode energy over the interval -40 < t < -30 msec as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s1 and s2 (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crossed threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.
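For contrast, the filtered threshold-crossing picture that the analysis goes on to rule out can be written as a toy model in a few lines (kernel and threshold below are placeholders, not fitted quantities):

```python
import numpy as np

def filtered_threshold_spikes(stimulus, kernel, theta):
    """Toy filtered threshold-crossing model: convolve the stimulus
    with a causal kernel and emit a spike at each upward crossing of
    the threshold theta.  Returns crossing indices and the filtered
    signal."""
    filtered = np.convolve(stimulus, kernel)[:len(stimulus)]
    above = filtered >= theta
    # upward crossings: below at t-1, at-or-above at t
    crossings = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    return crossings, filtered
```

In such a model, every spike-conditional stimulus has a positive projection on the kernel's derivative, which is exactly the property the HH spike-conditional distribution of Figure 14 fails to show.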


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions s1 and s2. By construction, the distribution of these signals over the whole experiment, P(s1, s2), is gaussian. The appropriate prior for the isolation condition, P(s1, s2 | silence), differs only subtly from the gaussian prior. On the other hand, for each spike, we obtain a sample from the distribution P(s1, s2 | iso spike at t0), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC which we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold crossing interpretation.

Figure 14: 10^4 spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation of the prior gaussian distribution. The circles, from the inside out, enclose all but 10^-1, 10^-2, ..., 10^-8 of the prior.


Combining equations 2.2 and 2.3 for isolated spikes, we have

    g(s1, s2) = P(s1, s2 | iso spike at t0) / P(s1, s2 | silence),    (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure, but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
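Equation 5.1 suggests a direct histogram estimator for g over the (s1, s2) plane; the bin settings below are illustrative choices, not the paper's:

```python
import numpy as np

def estimate_g(proj_spike, proj_prior, bins=25, lim=4.0):
    """Histogram estimate of g(s1, s2) = P(s1, s2 | spike) /
    P(s1, s2 | silence), per equation 5.1.

    proj_spike, proj_prior: (n, 2) arrays of stimulus projections onto
    the two spike-associated modes, for spike-conditional and prior
    (silence-conditional) ensembles respectively."""
    edges = np.linspace(-lim, lim, bins + 1)
    h_spike, _, _ = np.histogram2d(proj_spike[:, 0], proj_spike[:, 1],
                                   bins=(edges, edges), density=True)
    h_prior, _, _ = np.histogram2d(proj_prior[:, 0], proj_prior[:, 1],
                                   bins=(edges, edges), density=True)
    # leave g = 0 where the prior is unsampled
    g = np.divide(h_spike, h_prior,
                  out=np.zeros_like(h_spike), where=h_prior > 0)
    return g, edges
```

Because g is a ratio of densities, any nonlinearity (including the curvature seen in Figure 14) is visible directly in the estimated surface, even though the two projection axes are linear.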

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I(s1, s2; iso spike) = I(iso spike); more generally, one finds I(s1, s2; iso spike) ≤ I(iso spike), and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.
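The information carried by a set of projections can be estimated, following Brenner, Strong, et al. (2000), as I = Σ_s P(s | spike) log2 [P(s | spike) / P(s)]. A minimal histogram-based sketch on synthetic projections (names and data are ours; the actual calculation uses projections from the HH simulation):

```python
import numpy as np

def info_per_spike(s_prior, s_spike, bins=20, lim=4.0):
    """I = sum_s P(s | spike) log2 [ P(s | spike) / P(s) ], in bits,
    estimated from histograms of projections over a common grid."""
    d = s_prior.shape[1]
    edges = [np.linspace(-lim, lim, bins + 1)] * d
    p_prior, _ = np.histogramdd(s_prior, bins=edges)
    p_spike, _ = np.histogramdd(s_spike, bins=edges)
    p_prior = p_prior / p_prior.sum()
    p_spike = p_spike / p_spike.sum()
    ok = (p_spike > 0) & (p_prior > 0)
    return np.sum(p_spike[ok] * np.log2(p_spike[ok] / p_prior[ok]))

# synthetic stand-ins: prior gaussian; spike-conditional ensemble shifted
# and sharpened along the first projection
rng = np.random.default_rng(1)
prior = rng.standard_normal((200_000, 2))
spike = rng.standard_normal((200_000, 2)) * 0.5 + np.array([2.0, 0.0])

I2 = info_per_spike(prior, spike)                # both dimensions
I1 = info_per_spike(prior[:, :1], spike[:, :1])  # projection on mode 1 only
```

The ratio of such an estimate to the model-independent I(iso spike) is the "fraction captured" plotted in Figure 15; adding an informative second dimension can only increase the estimate, as in the text.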

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec,

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

1740 B Aguera y Arcas A Fairhall and W Bialek

reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at ≈ 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ~3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing?

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The next two spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that along the direction of the STA, we allow the second, orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
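The tiling procedure just described, binning histories by their STA projection with equal occupancy and computing the orthogonalized conditional mean in each bin, can be sketched as follows (a toy version on synthetic gaussian histories; function and variable names are ours, not the authors' code):

```python
import numpy as np

def twist_model(histories, sta, n_bins=8):
    """Locally linear tiling: bin stimulus histories by their projection
    onto the STA (equal occupancy), then in each bin take the conditional
    mean and keep its normalized component orthogonal to the STA."""
    sta = sta / np.linalg.norm(sta)
    proj = histories @ sta
    # equal-population bins along the STA direction
    edges = np.quantile(proj, np.linspace(0.0, 1.0, n_bins + 1))
    which = np.clip(np.searchsorted(edges, proj, side="right") - 1,
                    0, n_bins - 1)
    second = []
    for b in range(n_bins):
        mean_b = histories[which == b].mean(axis=0)
        ortho = mean_b - (mean_b @ sta) * sta   # remove the STA component
        second.append(ortho / np.linalg.norm(ortho))
    return np.array(second)

# synthetic stand-ins: D-sample histories and an arbitrary unit-norm STA
rng = np.random.default_rng(2)
D, N = 50, 40_000
sta = np.zeros(D); sta[-10:] = 1.0; sta /= np.linalg.norm(sta)
histories = rng.standard_normal((N, D))
second_dirs = twist_model(histories, sta)

# an SVD of the family of conditional directions estimates the
# embedding dimension of the curved ("twisting ribbon") subspace
svals = np.linalg.svd(second_dirs, compute_uv=False)
```

On real spike-triggered histories, the rows of second_dirs are the curves of Figure 17, and the number of large singular values plays the role of the four significant directions found in the text.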

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ~90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling ("twist") model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (Fitzhugh, 1961; Nagumo, Arimoto, & Yoshikawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input


currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work by Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics which is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ~200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace, using data sets of fewer than 10^4 spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but is sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus, or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore in general involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Aguera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Aguera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Aguera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Fitzhugh, R. (1961). Impulse and physiological states in models of nerve membrane. Biophysics J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibres. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshikawa, Z. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



Figure 5: Overall spike-triggered average in the bursty regime, showing the ringing due to the tendency to periodic firing. Plotted in gray is the spike autocorrelation, showing the same oscillations.

by a feedback loop, here implemented by the interspike interaction. Large fluctuations in the input current at a certain frequency "flip" or "flop" the neuron between its silent and spiking states. However, while the neuron is spiking, further details of the input signal are transmitted by precise spike timing within a burst. If we calculate the spike-triggered average of all spikes for this regime, without regard to their position within a burst, then, as shown in Figure 5, the relatively well-localized leading spike oscillation of Figure 4 is replaced by a long-lived oscillating function resulting from the spike periodicity. This is shown explicitly by comparing the overall STA with the spike autocorrelation, also shown in Figure 5. This same effect is seen in the STA of the burst spikes, which in fact dominates the overall average. Prediction of spike timing using such an STA would be computationally difficult due to its extension in time but, more seriously, unsuccessful, as most of the function is an artifact of the spike history rather than the effect of the stimulus.

While the effects of spike interaction are interesting and should be included in a complete model for spike generation, we wish here to consider only the current's role in initiating spikes. Therefore, as we have argued elsewhere, we limit ourselves initially to the cases in which interspike interaction plays no role (Aguera y Arcas et al., 2001; Aguera y Arcas & Fairhall, 2003). These "isolated" spikes can be defined as spikes preceded by a silent period t_silence long enough to ensure decoupling from the timing of the previous spike. A reasonable choice for t_silence can be inferred directly from the interspike interval distribution P(Δt), illustrated in Figure 6. For the HH model, as in simpler models and many real neurons (Brenner, Agam, Bialek, & de Ruyter van Steveninck, 1998), the form of P(Δt) has three noteworthy features: a refractory "hole" during which another spike is unlikely to occur, a strong mode at the preferred firing frequency, and an exponentially decaying, or Poisson, tail. The details of all three of these features are functions of the parameters of the stimulus (Tiesinga, Jose, & Sejnowski, 2000), and certain regimes may be dominated by only one or two features. The emergence of Poisson statistics in the tail of the distribution implies that these events are independent, so we can infer that the system has lost memory of the previous spike. We will therefore take isolated spikes to be those preceded by a silent interval Δt ≥ t_silence, where t_silence is well into the Poisson regime. The burst-onset spikes of Figure 4 are isolated spikes by this definition.

Figure 6: Hodgkin-Huxley interspike interval histogram for the parameters I0 = 0 and S = 6.5 × 10^{-4} nA^2 sec, showing a peak at a preferred firing frequency and the long Poisson tail. The total number of spikes is N = 5.18 × 10^6. The plot to the right is a closeup in linear scale.
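The isolated-spike criterion Δt ≥ t_silence reduces to a simple filter on the interspike intervals; a minimal sketch (function name ours):

```python
import numpy as np

def isolated_spikes(spike_times, t_silence=60.0):
    """Return spikes preceded by a silent interval of at least t_silence
    (msec).  The first spike has no predecessor and is excluded."""
    spike_times = np.asarray(spike_times)
    isi = np.diff(spike_times)          # interval preceding spikes 1..N-1
    return spike_times[1:][isi >= t_silence]

# toy spike train (msec): two bursts and one late lone spike
times = np.array([10.0, 12.0, 90.0, 93.0, 95.0, 200.0])
iso = isolated_spikes(times)            # only the spikes at 90.0 and 200.0
```

Only burst-onset and lone spikes survive the filter, matching the definition in the text.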

Note that the bursty behavior evident in Figure 3 is characteristic of "type II" neurons, which begin firing at a well-defined frequency under DC stimulus; "type I" neurons, by contrast, can fire at arbitrarily low frequency under constant input. Nonetheless, the analysis that follows is equally applicable to either type of neuron: spikes still interact at close range and become independent at sufficient separation. In what follows, we will be probing the HH neuron with current noise that remains below the DC threshold for periodic firing.

5 Isolated Spike Analysis

Focusing now on isolated spikes, we proceed to a second-order analysis of the current fluctuations around the isolated spike-triggered average (see Figure 7). We consider the response of the HH neuron to currents I(t) with mean I0 = 0 and spectral density S = 6.5 × 10^{-4} nA^2 sec. Isolated spikes in this regime are defined by t_silence = 60 msec.

Figure 7: Spike-triggered average stimulus for isolated spikes.

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, a timescale sufficiently long to capture the relevant features. Thus, we begin in a 200-dimensional space. We emphasize that the theorem that connects eigenvalues of the matrix ΔC to the number of relevant dimensions is valid only for truly gaussian distributions of inputs, and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble: namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide to our search for lower-dimensional descriptions, but we should proceed with caution.

The "raw" isolated spike-triggered covariance C_{iso spike} and the corresponding covariance difference ΔC (equation 2.7) are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to ±15 msec. In Figure 9, we show the spectrum of eigenvalues of ΔC, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply ΔC by C_prior^{-1}. This has the effect of giving us eigenvalues scaled in units of the input standard deviation along each dimension. Because the correlation time is short, C_prior is nearly diagonal.

Figure 8: The isolated spike-triggered covariance C_iso (left) and covariance difference ΔC (right) for times −30 < t < 5 msec. The plots are in units of nA².

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance after accumulating 80,000 spikes.
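The rescaling step, multiplying ΔC by C_prior^{-1} before diagonalizing, can be sketched as follows. Since C_prior^{-1} ΔC is not symmetric in general, this sketch diagonalizes the equivalent symmetrized product C_prior^{-1/2} ΔC C_prior^{-1/2}, which has the same eigenvalues; that implementation choice is ours, not something specified in the text.

```python
import numpy as np

def scaled_eigensystem(delta_c, c_prior):
    """Eigenvalues of C_prior^{-1} Delta-C, i.e., eigenvalues of Delta-C in
    units of the prior variance along each direction. Assumes c_prior is
    symmetric positive definite."""
    evals_p, evecs_p = np.linalg.eigh(c_prior)
    w = evecs_p @ np.diag(evals_p ** -0.5) @ evecs_p.T  # C_prior^{-1/2}
    m = w @ delta_c @ w                                 # symmetrized product
    evals, u = np.linalg.eigh(m)
    order = np.argsort(-np.abs(evals))                  # rank by magnitude
    return evals[order], (w @ u)[:, order]              # modes in stimulus space
```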

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero up to the noise floor determined by sampling decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is 1/√n_spikes. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.
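The 1/√N behavior of the noise floor is easy to demonstrate on surrogate data in which the true ΔC is zero, so that every eigenvalue is pure sampling noise. The function below is an illustrative sketch assuming an identity prior covariance.

```python
import numpy as np

def noise_floor(dim, n_samples, n_trials=20, seed=0):
    """Mean magnitude of the largest eigenvalue of a covariance-difference
    matrix estimated from pure gaussian noise (true Delta-C = 0). This
    noise floor shrinks roughly like 1/sqrt(n_samples)."""
    rng = np.random.default_rng(seed)
    tops = []
    for _ in range(n_trials):
        x = rng.normal(size=(n_samples, dim))
        dc = np.cov(x, rowvar=False) - np.eye(dim)  # deviation from truth
        tops.append(np.max(np.abs(np.linalg.eigvalsh(dc))))
    return float(np.mean(tops))
```

With a 16-fold increase in sample size, the floor should drop roughly fourfold.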

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Agüera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended and silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Agüera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be −60 ≤ t ≤ −40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/n_spikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out the test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion of other weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is close to time translationally invariant.

Figure 11: For the leading 64 modes, fraction of the mode energy over the interval −40 < t < −30 msec as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.
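The silent-period energy test amounts to integrating the squared (unit-normalized) mode over the silent window; a minimal sketch, with the window boundaries taken from the text:

```python
import numpy as np

def silence_energy_fraction(mode, t, t_lo=-60.0, t_hi=-40.0):
    """Fraction of a mode's energy lying in the silent window [t_lo, t_hi]
    (msec). For spike-associated modes this fraction is noise and shrinks
    with sample size; for silence modes it stays of order unity."""
    mode = np.asarray(mode, dtype=float)
    mode = mode / np.linalg.norm(mode)
    mask = (t >= t_lo) & (t <= t_hi)
    return float(np.sum(mode[mask] ** 2))
```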

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s1 and s2 (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crosses threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions s1 and s2. By construction, the distribution of these signals over the whole experiment, P(s1, s2), is gaussian. The appropriate prior for the isolation condition, P(s1, s2 | silence), differs only subtly from the gaussian prior. On the other hand, for each spike we obtain a sample from the distribution P(s1, s2 | iso spike at t0), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC which we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold crossing interpretation.

Figure 14: 10⁴ spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation on the prior gaussian distribution. The circles, from the inside out, enclose all but 10⁻¹, 10⁻², ..., 10⁻⁸ of the prior.


Combining equations 2.2 and 2.3 for isolated spikes, we have

    g(s1, s2) = P(s1, s2 | iso spike at t0) / P(s1, s2 | silence),    (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
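Equation 5.1 can be estimated directly from samples of the two projections by taking the ratio of binned density estimates. The sketch below uses a simple gaussian prior in place of the silence-conditioned prior, and the bin settings are illustrative choices of ours:

```python
import numpy as np

def nonlinearity_2d(s_spike, s_prior, bins=25, lim=4.0):
    """Estimate g(s1, s2) = P(s1, s2 | spike) / P(s1, s2 | prior) as a ratio
    of normalized 2D histograms. The subspace is linear, but g itself may
    be arbitrarily nonlinear."""
    edges = np.linspace(-lim, lim, bins + 1)
    h_spike, _, _ = np.histogram2d(s_spike[:, 0], s_spike[:, 1],
                                   bins=[edges, edges], density=True)
    h_prior, _, _ = np.histogram2d(s_prior[:, 0], s_prior[:, 1],
                                   bins=[edges, edges], density=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        g = np.where(h_prior > 0, h_spike / h_prior, 0.0)
    return g, edges
```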

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation, according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I_{s1,s2}(iso spike) = I(iso spike); more generally, one finds I_{s1,s2}(iso spike) ≤ I(iso spike), and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.
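The information captured by a projection (section 3) is the Kullback-Leibler divergence between the spike-conditional and prior distributions of that projection, in bits per spike. A one-dimensional sketch using histogram density estimates, with bin settings that are our illustrative choices:

```python
import numpy as np

def projected_information(s_spike, s_prior, bins=50, lim=5.0):
    """Bits per spike carried by a scalar projection s: the KL divergence
    D[P(s|spike) || P(s)] estimated from binned samples. For a sufficient
    projection this equals the full information in isolated spike times."""
    edges = np.linspace(-lim, lim, bins + 1)
    p_spike, _ = np.histogram(s_spike, bins=edges, density=True)
    p_prior, _ = np.histogram(s_prior, bins=edges, density=True)
    dx = edges[1] - edges[0]
    ok = (p_spike > 0) & (p_prior > 0)   # skip empty bins
    return float(np.sum(p_spike[ok] * np.log2(p_spike[ok] / p_prior[ok])) * dx)
```

For test distributions N(2, 1) against N(0, 1), the true value is 2/ln 2 ≈ 2.9 bits.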

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec, reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at ≈ 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ∼3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing?

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The next two spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that along the direction of the STA, we allow the second, orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
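The tiling algorithm described above, sorting histories by their STA projection, binning with equal occupancy, and orthogonalizing each bin's conditional average against the STA, can be sketched as follows (illustrative names; the STA is assumed given):

```python
import numpy as np

def local_second_features(histories, sta, n_bins=8):
    """Locally linear tiling of a curved 2D feature subspace: in each
    equal-occupancy bin of the STA projection, the conditional average's
    component orthogonal to the STA gives the local second direction."""
    sta = sta / np.linalg.norm(sta)
    order = np.argsort(histories @ sta)        # sort by STA projection
    features = []
    for chunk in np.array_split(order, n_bins):
        mean = histories[chunk].mean(axis=0)   # conditional average
        orth = mean - (mean @ sta) * sta       # remove the STA component
        features.append(orth / np.linalg.norm(orth))
    return np.array(features)                  # one local direction per tile
```

Stacking these directions and applying a singular value decomposition then estimates the embedding dimension, as in the text.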

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ∼90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshikawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work by Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ∼200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations, such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace using data sets of fewer than 10⁴ spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one that is sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore in general involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Agüera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Agüera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Agüera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

FitzHugh, R. (1961). Impulses and physiological states in models of nerve membrane. Biophysics J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibres. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshikawa, Z. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., José, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9 2003 accepted January 28 2003


1732 B Aguera y Arcas A Fairhall and W Bialek

Figure 6: Hodgkin-Huxley interspike interval histogram for the parameters I0 = 0 and S = 6.5 × 10^-4 nA^2 sec, showing a peak at a preferred firing frequency and the long Poisson tail. The total number of spikes is N = 5.18 × 10^6. The plot to the right is a closeup in linear scale.

spike interval distribution P(Δt), illustrated in Figure 6. For the HH model, as in simpler models and many real neurons (Brenner, Agam, Bialek, & de Ruyter van Steveninck, 1998), the form of P(Δt) has three noteworthy features: a refractory "hole" during which another spike is unlikely to occur, a strong mode at the preferred firing frequency, and an exponentially decaying or Poisson tail. The details of all three of these features are functions of the parameters of the stimulus (Tiesinga, Jose, & Sejnowski, 2000), and certain regimes may be dominated by only one or two features. The emergence of Poisson statistics in the tail of the distribution implies that these events are independent, so we can infer that the system has lost memory of the previous spike. We will therefore take isolated spikes to be those preceded by a silent interval Δt ≥ t_silence, where t_silence is well into the Poisson regime. The burst-onset spikes of Figure 4 are isolated spikes by this definition.
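As an illustration of this selection rule (not part of the original analysis), the definition of isolated spikes can be sketched in a few lines; the function name `isolated_spikes` and the convention that the first spike of a record counts as isolated are our own assumptions.

```python
import numpy as np

def isolated_spikes(spike_times, t_silence):
    """Keep only spikes preceded by a silent interval of at least t_silence.

    spike_times : sorted 1D array of spike times (sec)
    t_silence   : minimum preceding silence (sec), chosen well into the
                  Poisson tail of the interspike interval distribution
    """
    spike_times = np.asarray(spike_times, dtype=float)
    isi = np.diff(spike_times)          # interval preceding spikes 1..N-1
    # assumption: the first spike of the record has no predecessor and is kept
    keep = np.concatenate(([True], isi >= t_silence))
    return spike_times[keep]
```

With t_silence = 60 msec, a spike 5 msec after its predecessor is discarded, while one 95 msec after its predecessor is retained.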

Note that the bursty behavior evident in Figure 3 is characteristic of "type II" neurons, which begin firing at a well-defined frequency under DC stimulus; "type I" neurons, by contrast, can fire at arbitrarily low frequency under constant input. Nonetheless, the analysis that follows is equally applicable to either type of neuron: spikes still interact at close range and become independent at sufficient separation. In what follows, we will be probing the HH neuron with current noise that remains below the DC threshold for periodic firing.

5 Isolated Spike Analysis

Focusing now on isolated spikes, we proceed to a second-order analysis of the current fluctuations around the isolated spike-triggered average (see Figure 7). We consider the response of the HH neuron to currents I(t) with mean I0 = 0 and spectral density S = 6.5 × 10^-4 nA^2 sec. Isolated spikes in this regime are defined by t_silence = 60 msec.

Figure 7: Spike-triggered average stimulus for isolated spikes.

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, a window long enough to capture the relevant features; thus, we begin in a 200-dimensional space. We emphasize that the theorem that connects eigenvalues of the matrix ΔC to the number of relevant dimensions is valid only for truly gaussian distributions of inputs, and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble: namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide to our search for lower-dimensional descriptions, but we should proceed with caution.

The "raw" isolated spike-triggered covariance C_iso spike and the corresponding covariance difference ΔC (equation 2.7) are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to ±15 msec. In Figure 9, we show the spectrum of eigenvalues of ΔC, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply ΔC by C_prior^-1. This has the effect of giving us eigenvalues scaled in units of the input standard deviation along each dimension. Because the correlation time is short, C_prior is nearly diagonal.

Figure 8: The isolated spike-triggered covariance C_iso (left) and covariance difference ΔC (right) for times -30 < t < 5 msec. The plots are in units of nA^2.

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance after accumulating 80,000 spikes.
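The steps above can be sketched compactly; this is an illustrative implementation under our own naming conventions (`delta_C_eigensystem`, the subsampling of prior windows, and the default D = 200 are assumptions, and `sliding_window_view` requires NumPy 1.20 or later), not the authors' code.

```python
import numpy as np

def delta_C_eigensystem(stimulus, spike_indices, D=200):
    """Spike-triggered covariance difference and its whitened eigensystem.

    stimulus      : 1D array of injected current samples
    spike_indices : sample indices of (isolated) spikes
    D             : number of stimulus samples preceding each spike
    """
    # stimulus histories preceding each spike
    windows = np.array([stimulus[i - D:i] for i in spike_indices if i >= D])
    sta = windows.mean(axis=0)                    # spike-triggered average
    C_spike = np.cov(windows.T)                   # spike-conditional covariance
    # prior covariance from the full stimulus; subsampled to keep it cheap
    all_windows = np.lib.stride_tricks.sliding_window_view(stimulus, D)
    C_prior = np.cov(all_windows[::D].T)
    dC = C_spike - C_prior
    # multiply by the inverse prior so eigenvalues come out in units of the
    # input standard deviation along each dimension
    evals, evecs = np.linalg.eig(np.linalg.inv(C_prior) @ dC)
    order = np.argsort(-np.abs(evals))            # largest magnitude first
    return sta, evals[order].real, evecs[:, order].real
```

Because ΔC scaled by the inverse prior is not symmetric, `np.linalg.eig` is used rather than `eigh`; for the nearly diagonal C_prior of a short-correlation stimulus, the two approaches give similar spectra.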

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero up to the noise floor determined by sampling decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is that of 1/√n_spikes. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Aguera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended and silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Aguera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be -60 ≤ t ≤ -40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/n_spikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out the test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion of other, weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is close to time translationally invariant.

Figure 11: For the leading 64 modes, fraction of the mode energy over the interval -40 < t < -30 msec as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.
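The energy-fraction diagnostic amounts to a one-line computation per mode; a minimal sketch, with the window boundaries and the name `silence_energy_fraction` as our own choices:

```python
import numpy as np

def silence_energy_fraction(mode, t_axis, t_lo=-0.060, t_hi=-0.040):
    """Fraction of a mode's energy falling in the silent window.

    mode   : eigenvector of the covariance difference (one sample per time bin)
    t_axis : time of each sample relative to the spike (sec), negative = past
    Spike-associated modes carry only noise energy here, which decreases
    like 1/n_spikes with sample size; silence modes keep energy of order unity.
    """
    mode = np.asarray(mode, dtype=float)
    in_window = (t_axis >= t_lo) & (t_axis <= t_hi)
    return np.sum(mode[in_window] ** 2) / np.sum(mode ** 2)
```

Tracking this fraction as a function of sample size, as in Figure 11, separates modes whose silent-window energy keeps shrinking (spike associated) from those whose energy stays of order unity (silence associated).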

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s1 and s2 (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crosses threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.
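For contrast, the filter-and-threshold picture just described is easy to state in code. This toy model is our own construction for illustration, not the HH dynamics: it pins the projection onto the filter at threshold and forces the projection onto its derivative to be positive.

```python
import numpy as np

def threshold_crossing_spikes(stimulus, kernel, theta):
    """Toy filtered threshold-crossing neuron: spike at each upward crossing.

    stimulus : 1D array of input samples
    kernel   : linear filter applied to the stimulus
    theta    : threshold on the filtered signal
    """
    filtered = np.convolve(stimulus, kernel, mode="full")[: len(stimulus)]
    # a spike occurs where the filtered signal crosses theta from below
    upward = (filtered[1:] >= theta) & (filtered[:-1] < theta)
    return np.flatnonzero(upward) + 1
```

Spike-triggered analysis of such a model would show exactly the two reduced-variance dimensions described above, which is what makes the HH results that follow surprising.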


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions s1 and s2. By construction, the distribution of these signals over the whole experiment, P(s1, s2), is gaussian. The appropriate prior for the isolation condition, P(s1, s2 | silence), differs only subtly from the gaussian prior. On the other hand, for each spike, we obtain a sample from the distribution P(s1, s2 | iso spike at t0), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC which we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold crossing interpretation.

Figure 14: 10^4 spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation on the prior gaussian distribution. The circles from the inside out enclose all but 10^-1, 10^-2, ..., 10^-8 of the prior.


Combining equations 2.2 and 2.3 for isolated spikes, we have

    g(s1, s2) = P(s1, s2 | iso spike at t0) / P(s1, s2 | silence),    (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I_{s1,s2}^{iso spike} = I^{iso spike}; more generally, one finds I_{s1,s2}^{iso spike} ≤ I^{iso spike}, and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.
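The information fraction can be estimated with a simple binned (plug-in) estimator of the divergence between the spike-conditional and prior projection distributions. This sketch stands in for the estimator of section 3; the name `projection_information` and the binning scheme are our own, and the estimate is biased upward for small samples.

```python
import numpy as np

def projection_information(spike_proj, prior_proj, bins=16):
    """Information (bits per spike) carried by K stimulus projections.

    spike_proj, prior_proj : arrays of shape (N, K), projections of
    spike-conditional and prior stimuli onto K features (here K = 2).
    """
    # common bin edges taken from the prior along each feature axis
    edges = [np.linspace(prior_proj[:, k].min(), prior_proj[:, k].max(), bins + 1)
             for k in range(prior_proj.shape[1])]
    p_spike, _ = np.histogramdd(spike_proj, bins=edges)
    p_prior, _ = np.histogramdd(prior_proj, bins=edges)
    p_spike /= p_spike.sum()
    p_prior /= p_prior.sum()
    ok = (p_spike > 0) & (p_prior > 0)   # drop empty bins from the sum
    return np.sum(p_spike[ok] * np.log2(p_spike[ok] / p_prior[ok]))
```

Dividing this quantity by the total information available in a single spike time at the chosen resolution gives the fraction plotted in Figure 15.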

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec, reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at ≈ 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation, and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ~3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles". At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that along the direction of the STA, we allow the second, orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
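Because the tiling uses only conditional averages, it can be sketched compactly; `twist_model`, the equal-population quantile bins, and the array conventions are our own illustrative choices, not the authors' implementation.

```python
import numpy as np

def twist_model(windows, sta, n_tiles=8):
    """Locally linear ("twisting ribbon") basis for the feature surface.

    windows : (N, D) array of spike-triggered stimulus histories
    sta     : (D,) overall spike-triggered average (normalized below)
    Returns an (n_tiles, D) array: in each tile along the STA direction,
    the second, locally defined basis vector orthogonal to the STA.
    """
    sta = sta / np.linalg.norm(sta)
    proj = windows @ sta                           # projection onto the STA
    # equal-population tiles along the STA direction
    edges = np.quantile(proj, np.linspace(0.0, 1.0, n_tiles + 1))
    second_axes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (proj >= lo) & (proj <= hi)
        cond_sta = windows[sel].mean(axis=0)       # conditional average in the tile
        ortho = cond_sta - (cond_sta @ sta) * sta  # remove the STA component
        second_axes.append(ortho / np.linalg.norm(ortho))
    return np.array(second_axes)
```

Each row is one of the curves of Figure 17; stacking them and applying a singular value decomposition gives the estimate of the embedding dimension described above.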

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ~90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximate the HH model (Fitzhugh, 1961; Nagumo, Arimoto, & Yoshikawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work by Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics which is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ~200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace using data sets of fewer than 10^4 spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but is sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus, or conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore in general involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges, and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Aguera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Aguera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Aguera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Fitzhugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysics J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.

1748 B Aguera y Arcas A Fairhall and W Bialek

Hartline, H. K. (1940). The receptive fields of optic nerve fibres. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin–Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshikawa, Z. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin–Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



Figure 7: Spike-triggered average stimulus for isolated spikes.

Figure 7). We consider the response of the HH neuron to currents I(t) with mean I0 = 0 and spectral density S = 6.5 × 10⁻⁴ nA² sec. Isolated spikes in this regime are defined by t_silence = 60 msec.
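The isolation criterion is just a cut on the preceding interspike interval. A minimal NumPy sketch (the function name and time units are our own; the text works at t_silence = 60 msec):

```python
import numpy as np

def isolated_spikes(spike_times, t_silence=0.060):
    """Keep only spikes preceded by at least t_silence of silence.

    spike_times : 1D array of spike times in seconds, sorted ascending.
    The first spike has no measurable preceding interval here, so it
    is dropped for safety.
    """
    spike_times = np.asarray(spike_times)
    isi = np.diff(spike_times)                # preceding interspike intervals
    keep = np.concatenate(([False], isi >= t_silence))
    return spike_times[keep]

# toy usage: a burst followed by two well-separated spikes
times = np.array([0.100, 0.105, 0.110, 0.200, 0.400])
print(isolated_spikes(times))                 # -> [0.2 0.4]
```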

5.1 How Many Dimensions? As explained in section 2, our path to dimensionality reduction begins with the computation of covariance matrices for stimulus fluctuations surrounding a spike. The matrices are accumulated from stimulus segments 200 samples in length, a window long enough at this sampling timescale to capture the relevant features. Thus, we begin in a 200-dimensional space. We emphasize that the theorem that connects eigenvalues of the matrix ΔC to the number of relevant dimensions is valid only for truly gaussian distributions of inputs, and that by focusing on isolated spikes, we are essentially creating a nongaussian stimulus ensemble, namely, those stimuli that generate the silence out of which the isolated spike can appear. Thus, we expect that the covariance matrix approach will give us a heuristic guide to our search for lower-dimensional descriptions, but we should proceed with caution.
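The covariance accumulation described above can be sketched as follows. This is a generic spike-triggered covariance estimate in NumPy, not the authors' code; the prior here is estimated from randomly placed windows rather than from the silence-conditioned ensemble the paper uses, and the function and parameter names are our own:

```python
import numpy as np

def spike_triggered_covariance(stimulus, spike_idx, D=200, n_prior=5000):
    """Covariance difference Delta C = C_spike - C_prior (cf. equation 2.7).

    stimulus : 1D array of injected-current samples.
    spike_idx: array of indices into `stimulus` at which spikes occurred.
    D        : samples of stimulus history kept before each spike.
    """
    spike_idx = np.asarray(spike_idx)
    idx = spike_idx[spike_idx >= D]
    # spike-triggered ensemble: the D samples preceding each spike
    ensemble = np.stack([stimulus[i - D:i] for i in idx])
    sta = ensemble.mean(axis=0)                  # spike-triggered average
    dev = ensemble - sta
    C_spike = dev.T @ dev / len(ensemble)
    # prior covariance, estimated from randomly placed windows
    starts = np.random.randint(D, len(stimulus) + 1, size=n_prior)
    prior = np.stack([stimulus[i - D:i] for i in starts])
    C_prior = np.cov(prior, rowvar=False, bias=True)
    return C_spike - C_prior, sta
```

The eigenvectors of the returned matrix (after multiplying by the inverse prior covariance, as in the text) are the candidate relevant dimensions.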

The "raw" isolated spike-triggered covariance C_iso spike and the corresponding covariance difference ΔC, equation 2.7, are shown in Figure 8. The matrix shows the effect of the silence as an approximately translationally invariant band preceding the spike, the second-order analog of the constant negative bias in the isolated spike STA (see Figure 7). The spike itself is associated with features localized to ±15 msec. In Figure 9, we show the spectrum of eigenvalues of ΔC, computed using a sample of 80,000 spikes. Before calculating the spectrum, we multiply ΔC by C_prior⁻¹. This has the effect of giving us eigenvalues scaled in units of the input standard deviation


Figure 8: The isolated spike-triggered covariance C_iso (left) and covariance difference ΔC (right) for times −30 < t < 5 msec. The plots are in units of nA².

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance after accumulating 80,000 spikes.

along each dimension. Because the correlation time is short, C_prior is nearly diagonal.

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero up to the noise floor determined by


Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is 1/√n_spikes. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

sampling decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Agüera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended, silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Agüera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be −60 ≤ t ≤ −40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/n_spikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out the test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion


Figure 11: For the leading 64 modes, fraction of the mode energy over the interval −40 < t < −30 msec as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.

of other weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is close to time translationally invariant.
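The energy diagnostic is straightforward to implement. A sketch, using the text's silent window of −60 to −40 msec (the demonstration modes below are synthetic, and the function name is our own):

```python
import numpy as np

def silent_energy_fraction(modes, t, t_lo=-0.060, t_hi=-0.040):
    """Fraction of each mode's energy falling in the silent window.

    modes : (n_modes, D) array of covariance eigenvectors.
    t     : (D,) times in seconds relative to the spike at t = 0.
    Modes are normalized to unit energy first.  For spike-associated
    modes, the window energy is pure noise and shrinks as 1/n_spikes;
    silence modes keep order-unity energy there.
    """
    modes = modes / np.linalg.norm(modes, axis=1, keepdims=True)
    window = (t >= t_lo) & (t <= t_hi)
    return np.sum(modes[:, window] ** 2, axis=1)

# synthetic check: a localized "spike" bump vs. a Fourier-like silence mode
t = np.linspace(-0.100, 0.0, 200)
local = np.exp(-((t + 0.005) / 0.003) ** 2)      # bump just before the spike
extended = np.sin(2 * np.pi * 50.0 * t)          # extended oscillatory mode
frac = silent_energy_fraction(np.stack([local, extended]), t)
```

The localized mode has essentially no energy in the window, while the extended mode keeps roughly the window's share (about 20%) of its energy.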

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s1 and s2 (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crosses threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.
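This signature can be checked on a toy filter-and-threshold simulation (not the HH model; the exponential filter shape, threshold, and sample sizes below are arbitrary choices): the spike-conditional variance along the filter collapses, and the projection onto the filter's discrete derivative is positive at upward crossings.

```python
import numpy as np

rng = np.random.default_rng(0)
D, T, theta, tau = 64, 200_000, 2.0, 8.0
stim = rng.standard_normal(T)

# causal exponential filter, most recent sample last; unit norm
f = np.exp(-np.arange(D)[::-1] / tau)
f /= np.linalg.norm(f)

# filter output: out[i] = stim[i:i+D] @ f
out = np.convolve(stim, f[::-1], mode="valid")

# "spikes" = upward threshold crossings
cross = np.flatnonzero((out[1:] >= theta) & (out[:-1] < theta)) + 1
hist = np.stack([stim[i:i + D] for i in cross])

# feature 1: the filter itself; its spike-conditional variance collapses
s1 = hist @ f

# feature 2: a discrete-derivative feature d, chosen so that hist @ d
# equals out[i] - out[i-1] up to a negligible boundary term
d = np.empty(D)
d[:-1] = f[:-1] - f[1:]
d[-1] = f[-1]
s2 = hist @ d

print(len(cross), np.var(s1), np.mean(s2 > 0))
```

Since the prior variance along any unit-norm feature is 1, the collapsed variance of s1 and the near-unanimous positivity of s2 are exactly the two-dimensional signature described above.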


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions s1 and s2. By construction, the distribution of these signals over the whole experiment, P(s1, s2), is gaussian. The appropriate prior for the isolation condition, P(s1, s2 | silence), differs only subtly from the gaussian prior. On the other hand, for each spike, we obtain a sample from the distribution P(s1, s2 | iso spike at t0), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC that we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold crossing interpretation.

Figure 14: 10⁴ spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation on the prior gaussian distribution. The circles from the inside out enclose all but 10⁻¹, 10⁻², ..., 10⁻⁸ of the prior.


Combining equations 2.2 and 2.3 for isolated spikes, we have

g(s1, s2) = P(s1, s2 | iso spike at t0) / P(s1, s2 | silence),    (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
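Equation 5.1 can be estimated by simple binning of the two ensembles on a common grid. A sketch with hypothetical names (a gaussian prior is used in the test below in place of the silence-conditioned prior); in practice, one must handle bins where the prior has no samples:

```python
import numpy as np

def decision_function(proj_spike, proj_prior, bins=25, lim=4.0):
    """Binned estimate of g(s1, s2) = P(s1, s2 | spike) / P(s1, s2 | prior).

    proj_spike, proj_prior : (N, 2) arrays of stimulus projections,
    in units of the prior standard deviation.  Bins with no prior
    samples are returned as NaN rather than a spurious infinity.
    """
    edges = [np.linspace(-lim, lim, bins + 1)] * 2
    H_spike, _, _ = np.histogram2d(*proj_spike.T, bins=edges, density=True)
    H_prior, _, _ = np.histogram2d(*proj_prior.T, bins=edges, density=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(H_prior > 0, H_spike / H_prior, np.nan)
```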

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I^{s1,s2}_isospike = I_isospike; more generally, one finds I^{s1,s2}_isospike ≤ I_isospike, and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.
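The information captured by a set of projections can be estimated from the same binned distributions, following the standard identity I = Σ_s P(s | spike) log2[P(s | spike)/P(s)] (Brenner, Strong, et al., 2000). The sketch below is a generic plug-in estimator, not the authors' code, and applies no finite-sampling bias correction:

```python
import numpy as np

def projection_information(proj_spike, proj_prior, bins=20, lim=4.0):
    """Bits per spike carried by a set of stimulus projections:
    I = sum_s P(s | spike) log2[ P(s | spike) / P(s | prior) ].

    proj_* : (N, K) arrays of projections onto K features.
    Bins missing from either ensemble are skipped.
    """
    K = proj_spike.shape[1]
    edges = [np.linspace(-lim, lim, bins + 1)] * K
    P_spike, _ = np.histogramdd(proj_spike, bins=edges)
    P_prior, _ = np.histogramdd(proj_prior, bins=edges)
    P_spike = P_spike / P_spike.sum()
    P_prior = P_prior / P_prior.sum()
    ok = (P_spike > 0) & (P_prior > 0)
    return float(np.sum(P_spike[ok] * np.log2(P_spike[ok] / P_prior[ok])))
```

Dividing this estimate by the total information in spike timing at a given Δt yields the captured fraction plotted in Figure 15.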

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec,

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).


reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at ≈3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ≈3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing?

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that along the direction of the STA, we allow the second orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
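Because the tiling construction uses only conditional averages, it can be sketched compactly (the function and variable names are ours; equal-count bins as in the text):

```python
import numpy as np

def twist_model(hist, sta, n_bins=8):
    """Locally linear ('twist') approximation to a curved feature plane.

    hist : (N, D) array of spike-triggered stimulus histories.
    sta  : (D,) overall spike-triggered average (normalized internally).
    Returns an (n_bins, D) array: one unit-norm second direction,
    orthogonal to the STA, for each equal-count bin of STA projections.
    """
    sta = sta / np.linalg.norm(sta)
    order = np.argsort(hist @ sta)          # sort histories by STA projection
    second = []
    for chunk in np.array_split(order, n_bins):
        mean = hist[chunk].mean(axis=0)     # conditional average in this bin
        mean = mean - sta * (sta @ mean)    # remove the STA component
        second.append(mean / np.linalg.norm(mean))
    return np.stack(second)

# the embedding dimension of the curved subspace can then be read off
# from the singular values of the family of second directions:
# np.linalg.svd(twist_model(hist, sta), compute_uv=False)
```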

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ≈90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (Fitzhugh, 1961; Nagumo, Arimoto, & Yoshikawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input


currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work by Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ≈200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations, such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation is, we believe, novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace, using data sets of fewer than 10⁴ spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again, the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance; for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore in general involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Agüera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Agüera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Agüera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available online at http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysics J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibres. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence – a dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available online at http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available online at http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



Figure 8: The isolated spike-triggered covariance C_iso (left) and covariance difference ΔC (right) for times −30 < t < 5 msec. The plots are in units of nA².

Figure 9: The leading 64 eigenvalues of the isolated spike-triggered covariance after accumulating 80,000 spikes.

along each dimension. Because the correlation time is short, C_prior is nearly diagonal.
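The covariance quantities discussed here (the STA, the spike-conditional covariance, and the difference matrix ΔC) can be estimated directly from a stimulus record and a list of spike times. The following is a minimal numerical sketch, not the authors' code: the function name is ours, and the prior covariance is crudely estimated from non-overlapping stimulus windows, which suffices for short-correlation (white noise) inputs.

```python
import numpy as np

def spike_triggered_covariance(stimulus, spike_indices, window):
    """Covariance of stimulus histories preceding spikes, and its
    difference from the prior covariance (a sketch of C_spike and ΔC).

    stimulus      : 1D array, the discretized input current
    spike_indices : indices of (isolated) spikes into `stimulus`
    window        : number of samples of history preceding each spike
    """
    # Collect the stimulus history leading up to each spike.
    histories = np.array([stimulus[i - window:i] for i in spike_indices
                          if i >= window])
    sta = histories.mean(axis=0)              # spike-triggered average
    # Spike-conditional covariance (np.cov centers about the STA).
    c_spike = np.cov(histories, rowvar=False)
    # Crude prior covariance from non-overlapping stimulus windows.
    n = len(stimulus) - window
    prior = np.array([stimulus[i:i + window] for i in range(0, n, window)])
    c_prior = np.cov(prior, rowvar=False)
    delta_c = c_spike - c_prior               # the matrix ΔC
    return sta, c_spike, delta_c
```

Eigendecomposing `delta_c` with `np.linalg.eigh` then yields the mode spectrum discussed in the text.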

While the eigenvalues decay rapidly, there is no obvious set of outstanding eigenvalues. To verify that this is not an effect of finite sampling, Figure 10 shows the spectrum of eigenvalue magnitudes as a function of sample size N. Eigenvalues that are truly zero up to the noise floor determined by


Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is 1/√n_spikes. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

sampling decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Agüera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended, silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Agüera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be −60 ≤ t ≤ −40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/n_spikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out the test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion
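The energy-fraction diagnostic has a direct numerical transcription. In this sketch (the function name and arguments are our own), each eigenmode is a unit-norm vector over the analysis window, and we simply measure how much of its squared amplitude falls in the silent period:

```python
import numpy as np

def silence_energy_fraction(modes, times, t_lo=-60.0, t_hi=-40.0):
    """Fraction of each eigenmode's energy lying in the silent period.

    modes : (n_time, n_modes) array of covariance eigenvectors
    times : (n_time,) array of times (msec) relative to the spike
    """
    silent = (times >= t_lo) & (times <= t_hi)
    energy = modes ** 2
    # Energy in the silent window divided by total energy, mode by mode.
    return energy[silent].sum(axis=0) / energy.sum(axis=0)
```

Tracking this fraction as the sample grows separates the two classes: for spike-associated modes it falls off like 1/n_spikes, while for silence modes it stays of order unity.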


Figure 11: For the leading 64 modes, the fraction of the mode energy over the interval −40 < t < −30 msec as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.

of other, weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical; most modes resemble Fourier modes, as the silence condition is close to time-translation invariant.

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s1 and s2 (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crossed threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.
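The variance-reduction argument is easy to check in simulation. The toy model below is our own construction (arbitrary exponential kernel and threshold, not the HH model): it "spikes" whenever filtered white noise crosses threshold from below, and the projection of the spike-triggered histories onto the filter then shows strongly reduced variance relative to the prior, exactly as the argument predicts.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200_000)            # white noise "input current"

# A toy filter-and-fire cell: spike when the filtered input crosses
# threshold from below.  Kernel shape and threshold are illustrative.
W = 40
kernel = np.exp(-np.arange(W) / 10.0)   # kernel[j] weights x[t - j]
kernel /= np.linalg.norm(kernel)        # unit norm, so var(y) = var(x) = 1
y = np.convolve(x, kernel)[:len(x)]
theta = 2.0
crossings = np.where((y[1:] >= theta) & (y[:-1] < theta))[0] + 1
crossings = crossings[crossings >= W]

# Stimulus history preceding each "spike", projected onto the filter;
# by construction this projection equals y at the spike time.
histories = np.array([x[t - W + 1:t + 1][::-1] for t in crossings])
proj = histories @ kernel
var_spike, var_prior = proj.var(), y.var()
```

Because every spike occurs with the filter output pinned near the threshold, `var_spike` is far smaller than `var_prior`; the HH data in Figure 14 violate exactly this signature.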


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions s1 and s2. By construction, the distribution of these signals over the whole experiment, P(s1, s2), is gaussian. The appropriate prior for the isolation condition, P(s1, s2 | silence), differs only subtly from the gaussian prior. On the other hand, for each spike, we obtain a sample from the distribution P(s1, s2 | iso spike at t0), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC which we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold-crossing interpretation.

Figure 14: 10⁴ spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation of the prior gaussian distribution. The circles, from the inside out, enclose all but 10⁻¹, 10⁻², ..., 10⁻⁸ of the prior.


Combining equations 2.2 and 2.3 for isolated spikes, we have

g(s1, s2) = P(s1, s2 | iso spike at t0) / P(s1, s2 | silence),    (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
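Equation 5.1 can be estimated nonparametrically as a ratio of binned densities. A minimal sketch (the function name is ours; the projections are assumed to be given in units of the prior standard deviation, as in Figure 14):

```python
import numpy as np

def estimate_g(proj_spike, proj_prior, bins=25, lim=4.0):
    """Histogram estimate of g = P(s1, s2 | spike) / P(s1, s2 | silence).

    proj_spike, proj_prior : (n, 2) arrays of stimulus projections onto
    the two spike-associated modes s1 and s2.
    """
    edges = np.linspace(-lim, lim, bins + 1)
    h_spike, _, _ = np.histogram2d(proj_spike[:, 0], proj_spike[:, 1],
                                   bins=(edges, edges), density=True)
    h_prior, _, _ = np.histogram2d(proj_prior[:, 0], proj_prior[:, 1],
                                   bins=(edges, edges), density=True)
    # The ratio is defined only where the prior is actually sampled.
    g = np.full_like(h_spike, np.nan)
    ok = h_prior > 0
    g[ok] = h_spike[ok] / h_prior[ok]
    return g, edges
```

In practice the bin count trades sampling noise against resolution of the structure visible in Figure 14, so it should be varied to check robustness.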

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I_{s1,s2}^{iso spike} = I^{iso spike}; more generally, one finds I_{s1,s2}^{iso spike} ≤ I^{iso spike}, and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec,

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).


reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at ≈ 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation, and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ~3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing?

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that along the direction of the STA, we allow the second, orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
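The tiling construction described above requires only sorting, binning, and averaging. A compact sketch (the function name is ours; eight equal-population bins as in the text):

```python
import numpy as np

def locally_linear_tiles(histories, n_bins=8):
    """Sketch of the 'twisting ribbon' tiling described in the text.

    histories : (n_spikes, window) spike-triggered stimulus histories.
    Returns the overall STA and, for each of n_bins equal-population
    bins in STA projection, the normalized conditional-average component
    orthogonal to the STA (the second, locally varying basis vector).
    """
    sta = histories.mean(axis=0)
    sta_hat = sta / np.linalg.norm(sta)
    proj = histories @ sta_hat
    order = np.argsort(proj)                     # sort by STA projection
    second = []
    for chunk in np.array_split(order, n_bins):  # equal spikes per bin
        avg = histories[chunk].mean(axis=0)      # conditional average
        # Remove the component along the STA; keep the orthogonal part.
        ortho = avg - (avg @ sta_hat) * sta_hat
        second.append(ortho / np.linalg.norm(ortho))
    return sta, np.array(second)
```

Stacking the returned second vectors and applying `np.linalg.svd` gives the embedding-dimension estimate quoted in the text.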

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ≈ 90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshizawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system by well-known methods in dynamical systems theory; one could consider periodic input


currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work by Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ~200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace using data sets of fewer than 10⁴ spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes This re-stricted class of spikes nonetheless has biological relevance for example invertebrate retinal ganglion cells (Berry amp Meister 1999) in rat somatosen-sory cortex (Panzeri Petersen Schultz Lebedev amp Diamond 2001) and inLGN (Reinagel Godwin Sherman amp Koch 1999) the rst spike of a bursthas been shown to convey distinct (and the majority of the) information

1746 B Aguera y Arcas A Fairhall and W Bialek

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Agüera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Agüera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Agüera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibers. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2070.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., José, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



Figure 10: Convergence of the leading 64 eigenvalues of the isolated spike-triggered covariance with increasing sample size. The log slope of the diagonal is 1/√n_spikes. Positive eigenvalues are indicated by crosses and negative by dots. The spike-associated modes are labeled with an asterisk.

sampling decrease like 1/√N. We find that a sequence of eigenvalues emerges stably from the noise.

These results do not, however, imply that a low-dimensional approximation cannot be identified. The extended structure in the covariance matrix induced by the silence requirement is responsible for the apparent high dimensionality. In fact, as has been shown in Agüera y Arcas and Fairhall (2003), the covariance eigensystem includes modes that are local and spike associated, and others that are extended and silence associated, and thus irrelevant to a causal model of spike timing prediction. Fortunately, because extended silences and spikes are (by definition) statistically independent, there is no mixing between the two types of modes. To identify the spike-associated modes, we follow the diagnostic of Agüera y Arcas and Fairhall (2003), computing the fraction of the energy of each mode concentrated in the period of silence, which we take to be −60 ≤ t ≤ −40 msec. The energy of a spike-associated mode in the silent period is due entirely to noise and will therefore decrease like 1/n_spikes with increasing sample size, while this energy remains of order unity for silence modes. Carrying out the test on the covariance modes, we obtain Figure 11, which shows that the first and fourth modes rapidly emerge as spike associated. Two further spike-associated modes appear over the sample shown, with the suggestion of other weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical: most modes resemble Fourier modes, as the silence condition is close to time-translationally invariant.

Figure 11: For the leading 64 modes, fraction of the mode energy over the interval −40 < t < −30 msec as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.
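The energy diagnostic is straightforward to implement. The sketch below is a toy illustration, not an HH simulation: it builds a synthetic spike-triggered ensemble with a single localized feature, diagonalizes the covariance difference matrix, and scores each eigenmode by the fraction of its energy in the silent window (all names and parameters here are ours).

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.5                        # time step, ms (illustrative)
t = np.arange(-60.0, 0.0, dt)   # stimulus history window, ms
D = len(t)

# Toy spike-triggered ensemble: white noise plus one localized,
# spike-associated feature concentrated near the spike time
n_spikes = 5000
feature = np.exp(-(t + 5.0) ** 2 / 8.0)
ensemble = rng.normal(size=(n_spikes, D)) + rng.normal(size=(n_spikes, 1)) * feature

C_spike = np.cov(ensemble, rowvar=False)   # spike-triggered covariance
C_prior = np.eye(D)                        # white noise prior covariance
dC = C_spike - C_prior

eigvals, eigvecs = np.linalg.eigh(dC)
order = np.argsort(-np.abs(eigvals))       # rank modes by |eigenvalue|

# Fraction of each mode's energy in the silent period, -60 <= t <= -40 ms
silent = (t >= -60.0) & (t <= -40.0)

def silent_energy_fraction(mode):
    return float(np.sum(mode[silent] ** 2) / np.sum(mode ** 2))

print(round(silent_energy_fraction(eigvecs[:, order[0]]), 4))
```

On this toy ensemble, the leading mode carries almost none of its energy in the silent window, while noise modes spread roughly a third of their energy there, mirroring the separation visible in Figure 11.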

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s1 and s2 (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.

If the neuron filtered its input and generated a spike when the output of the filter crossed threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.
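The threshold-crossing logic above can be checked directly on a toy filter-and-threshold model (an illustrative linear filter and threshold of our choosing, not the HH dynamics): at every upward crossing, the change in filter output over the last time step, which is the stimulus projection onto a discrete approximation of the filter's derivative, is positive by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200_000)                   # white noise input current
tau = np.arange(40)
f = np.exp(-tau / 10.0) * np.sin(tau / 4.0)    # an arbitrary causal filter
y = np.convolve(x, f)[: len(x)]                # filter output

theta = 2.0 * y.std()                          # spiking threshold
# "Spikes": indices where y crosses theta from below
crossings = np.where((y[:-1] < theta) & (y[1:] >= theta))[0] + 1

# Change in filter output at each crossing: this is the projection of the
# stimulus onto the (discrete) derivative of the filter, and it must be
# positive at every upward crossing
dy = y[crossings] - y[crossings - 1]
print(len(crossings), bool((dy > 0).all()))
```

This is exactly the signature described above: variance along the filter direction collapses at threshold, and variance along its derivative is one-sided. The HH model, as shown next, does not behave this way.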

5.2 Evaluating the Nonlinearity

At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions s1 and s2. By construction, the distribution of these signals over the whole experiment, P(s1, s2), is gaussian. The appropriate prior for the isolation condition, P(s1, s2 | silence), differs only subtly from the gaussian prior. On the other hand, for each spike we obtain a sample from the distribution P(s1, s2 | iso spike at t0), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC which we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold-crossing interpretation.

Figure 14: 10^4 spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation on the prior gaussian distribution. The circles, from the inside out, enclose all but 10^-1, 10^-2, ..., 10^-8 of the prior.


Combining equations 2.2 and 2.3, for isolated spikes we have

    g(s1, s2) = P(s1, s2 | iso spike at t0) / P(s1, s2 | silence),          (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure, but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
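In practice, a ratio of distributions like equation 5.1 can be estimated by binning the two sets of projections and dividing the densities. The sketch below uses synthetic gaussian stand-ins for the spike-conditional and silence-conditional projections (the offsets, widths, and sample sizes are illustrative, not the HH data).

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-ins for P(s1, s2 | silence) and P(s1, s2 | iso spike)
proj_prior = rng.normal(size=(100_000, 2))
proj_spike = rng.normal(loc=[1.5, 0.5], scale=0.4, size=(5_000, 2))

edges = np.linspace(-4.0, 4.0, 33)
H_spike, _, _ = np.histogram2d(proj_spike[:, 0], proj_spike[:, 1],
                               bins=[edges, edges], density=True)
H_prior, _, _ = np.histogram2d(proj_prior[:, 0], proj_prior[:, 1],
                               bins=[edges, edges], density=True)

# g is the ratio of spike-conditional to prior density, defined where the
# prior has been sampled; it is largest where spikes are overrepresented
with np.errstate(divide="ignore", invalid="ignore"):
    g = np.where(H_prior > 0, H_spike / np.maximum(H_prior, 1e-12), 0.0)
```

The fuzziness discussed above shows up here as sampling noise in sparsely populated bins; with a deterministic model, sharper structure in g emerges as the bins shrink and the sample grows.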

5.3 Information Captured in Two Dimensions

We will measure the effectiveness of our description by computing the information in the 2D approximation, according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I(s1, s2; iso spike) = I(iso spike); more generally, one finds I(s1, s2; iso spike) ≤ I(iso spike), and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec, reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).
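For a one-dimensional projection, the information per spike can be estimated as the divergence ∫ ds P(s | spike) log2[P(s | spike)/P(s)] between the spike-conditional and prior projection distributions (cf. Brenner, Bialek, et al., 2000). The sketch below applies this estimator to synthetic gaussian projections; the distributions and sample sizes are illustrative, not the HH data.

```python
import numpy as np

rng = np.random.default_rng(3)
s_prior = rng.normal(size=1_000_000)                   # projections, whole experiment
s_spike = rng.normal(loc=2.0, scale=0.5, size=20_000)  # spike-conditional projections

edges = np.linspace(-6.0, 6.0, 61)
p, _ = np.histogram(s_prior, bins=edges, density=True)
q, _ = np.histogram(s_spike, bins=edges, density=True)
widths = np.diff(edges)

# Plug-in estimate of the divergence, restricted to bins sampled in both sets
mask = (p > 0) & (q > 0)
info_bits = float(np.sum(q[mask] * np.log2(q[mask] / p[mask]) * widths[mask]))
print(round(info_bits, 2))
```

Computed over a two-dimensional histogram of (s1, s2) instead of a single projection, the same estimator yields the curves compared in Figure 15.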

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at ≈ 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ≈ 3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that, along the direction of the STA, we allow the second, orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.

Computing the information as a function of 1t using this locally linearmodel we obtain the curve shown in Figure 18 where the results can becompared against the information found from the STA alone and from thecovariance modes The information from the new model captures a maxi-

Computation in a Single Neuron 1743

Figure 18 Bits per spike (left) and fraction of the theoretical limit (right) oftiming information in a single spike at a given temporal resolution captured bythe locally linear tiling ldquotwistrdquo model (diamonds) compared to models usingthe STA alone (triangles) and projection onto 1C covariance modes 1 and 2(circles)

mum of 48 bits recovering raquo 90 of the information at a time resolutionof approximately 1 msec

One of the main strengths of this simple approach is that we have suc-ceeded in extracting additional geometrical information about the featuresubspace using very limited data as we compute only averages Note thata similar number of spikes cannot resolve more than two spike-associatedcovariance modes in the covariance matrix analysis

7 Discussion

The HH equations describe the dynamics of four degrees of freedom andalmost since these equations were rst written down there have been at-tempts to nd simplications or reductions FitzHugh and Nagumo pro-posed a 2D system of equations that approximate the HH model (Fitzhugh1961 Nagumo Arimoto amp Yoshikawa 1962) and this has the advantagethat one can visualize the trajectories directly in the plane and thus achievean intuitive graphical understanding of the dynamics and its dependenceon parameters The need for reduction in the sense pioneered by FitzHughand by Nagumo et al has become only more urgent with the growing use ofincreasingly complexHH-style model neurons with many different channeltypes With this problem in mind Kepler Abbott and Marder have intro-duced reduction methods that are more systematic making use of the dif-ference in timescales among the gating variables (Kepler Abbott amp Marder1992 Abbott amp Kepler 1990)

In the presence of constant current inputs it makes sense to describethe HH equations as a 4D autonomous dynamical system by well-knownmethods in dynamical systems theory one could consider periodic input

1744 B Aguera y Arcas A Fairhall and W Bialek

currents by adding an extra dimension The question asked by FitzHughand Nagumo was whether this 4D or 5D description could be reduced totwo or three dimensions

Closer in spirit to our approach is the work by Kistler Gerstner andvan Hemmen (1997) who focused in particular on the interaction amongsuccessive action potentials They argued that one could approximate theHH model by a nearly linear dynamical system with a threshold identi-fying threshold crossing with spiking provided that each spike generatedeither a change in threshold or an effective input current that inuences thegeneration of subsequent spikes

The notion of model dimensionality considered here is distinct from thedynamical systems perspective in which one simply counts the systemrsquosdegrees of freedom Here we are attempting to nd a description of the dy-namics which is essentially functional or computational We have identiedthe output of the system as spike times and our aim is to construct as com-plete a description as possible of the mapping between input and outputThe dimensionality of our model is that of the space of inputs relevant forthis mapping There is no necessary relationship between these two notionsof dimensionality For example in a neural network with two attractors asystem described by a potentially large number of variables there might bea simple rule (perhaps even a linear lter) that allows us to look at the in-puts to the network and determine the times at which the switching eventswill occur Conversely once we leave the simplied world of constant orperiodic inputs even the small number of differential equations describ-ing a neuronrsquos channel dynamics could in principle be equivalent to a verycomplicated set of rules for mapping inputs into spike times

In our context simplicity is (roughly) feature selectivity the mapping issimple if spiking is determined by a small number of features in the com-plex history of inputs Following the ideas that emerged in the analysis ofmotion-sensitive neurons in the y (de Ruyter van Steveninck amp Bialek1988 Bialek amp de Ruyter van Steveninck 2003) we have identied ldquofea-turesrdquo with ldquodimensionsrdquo and searched for low-dimensional descriptionsof the input history that preserve the mutual information between inputsand outputs (spike times) We have considered only the generation of iso-lated spikes leaving aside the question of how spikes interact with oneanother as considered by Kistler et al (1997) For these isolated spikes webegan by searching for projections onto a low-dimensional linear subspaceof the originally raquo 200-dimensional stimulus space and we found that asubstantial fraction of the mutual informationcould be preserved in a modelwith just two dimensions Searching for the information that is missing fromthis model we found that rather than adding more (Euclidean) dimensionswe could capture approximately 90 of the information at high time reso-lution by keeping a 2D description but allowing these dimensions to varyover the surface so that the neuron is sensitive to stimulus features that liein a curved 2D subspace

Computation in a Single Neuron 1745

The geometrical picture of neurons as being sensitive to features that aredened by a low-dimensional stimulus subspace is attractive and as notedin section 1 corresponds to a widely shared intuition about the nature ofneuronal feature selectivity While curved subspaces often appear as thetargets for learning in complexneural computations such as invariant objectrecognition the idea that such subspaces appear already in the descriptionof single-neuron computation we believe to be novel

While we have exploited the fact that long simulations of the HH modelare quite tractable to generate large amounts of ldquodatardquo for our analysis it isimportant that in the end our construction of a curved relevant stimulussubspace involves a series of computations that are just simple general-izations of the conventional reverse correlation or spike-triggered averageThis suggests that our approach can be applied to real neurons withoutrequiring qualitatively larger data sets than might have been needed fora careful reverse correlation analysis In the same spirit recent work hasshown how covariance matrix analysis of the yrsquos motion-sensitive neu-rons can reveal nonlinear computations in a 4D subspace using data setsof fewer than 104 spikes (Bialek amp de Ruyter van Steveninck 2003) Low-dimensional linear subspaces can be found even in the response of modelneurons to naturalistic inputs if one searches directly for dimensions thatcapture the largest fraction of the mutual information between inputs andspikes (Sharpee et al in press) and again the errors involved in identifyingthe relevant dimensions are comparable to the errors in reverse correla-tion (Sharpee Rust amp Bialek 2003) All of these results point to the prac-tical feasibility of describing real neurons in terms of nonlinear computa-tion on low-dimensional relevant subspaces in a high-dimensional stimulusspace

Our reduced model of the HH neuron both illustrates a novel approachto dimensional reduction and gives new insight into the computation per-formed by the neuron The reduced model is essentially that of an edgedetector for current trajectories but is sensitive to a further stimulus pa-rameter producing a curved manifold An interpretation of this curva-ture will be presented in a forthcoming manuscript This curved repre-sentation is able to capture almost all information that isolated spikes con-vey about the stimulus or conversely allow us to predict isolated spiketimes with high temporal precision from the stimulus The emergence ofa low-dimensional curved manifold in a model as simple as the HH neu-ron suggests that such a description may also be appropriate for biologicalneurons

Our approach is limited in that we address only isolated spikes This re-stricted class of spikes nonetheless has biological relevance for example invertebrate retinal ganglion cells (Berry amp Meister 1999) in rat somatosen-sory cortex (Panzeri Petersen Schultz Lebedev amp Diamond 2001) and inLGN (Reinagel Godwin Sherman amp Koch 1999) the rst spike of a bursthas been shown to convey distinct (and the majority of the) information

1746 B Aguera y Arcas A Fairhall and W Bialek

However a clear next step in this program is to extend our formalism to takeinto account interspike interaction For neurons or models with explicit longtimescales adaptation induces very long-range history dependence whichcomplicates the issue of spike interactions considerably A full understand-ing of the interaction between stimulus and spike history will therefore ingeneral involve understanding the meanings of spike patterns (de Ruytervan Steveninck amp Bialek 1988 Brenner Strong et al 2000) and the inu-ence of the larger statistical context (Fairhall et al 2001) Our results pointto the need for a more parsimonious description of self-excitation even forthe simple case of dependence on only the last spike time

We close by reminding readers of the more ambitious goal of buildingbridges between the burgeoning molecular-level description of neurons andthe functional or computational level Armed with a description of spikegeneration as a nonlinear operation on a low-dimensional curved manifoldin the space of inputs it is natural to ask how the details of this computa-tional picture are related to molecular mechanisms Are neurons with moredifferent types of ion channels sensitive to more stimulus dimensions or dothey implement more complex nonlinearities in a low-dimensional spaceAre adaptation and modulation mechanisms that change the nonlinearityseparable from those that change the dimensions to which the cell is sensi-tive Finally while we have shown how a low-dimensional description canbe constructed numerically from observations of the input-output proper-ties of the neuron one would like to understand analytically why such adescription emerges and whether it emerges universally from the combina-tions of channel dynamics selected by real neurons

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Aguera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Aguera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Aguera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J. II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J. II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Fitzhugh, R. (1961). Impulse and physiological states in models of nerve membrane. Biophysics J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibres. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshikawa, Z. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



Figure 11: For the leading 64 modes, the fraction of the mode energy over the interval −40 < t < −30 msec, as a function of increasing sample size. Modes emerging with low energy are spike associated. The symbols indicate the sign of the eigenvalue.

of other weaker modes yet to emerge. The two leading silence modes are shown in Figure 12. Those shown are typical: most modes resemble Fourier modes, as the silence condition is close to time translationally invariant.

Examining the eigenvectors corresponding to the two leading spike-associated eigenvalues, which for convenience we will denote s1 and s2 (although they are not the leading modes of the matrix), we find (see Figure 13) that the first mode closely resembles the isolated spike STA, and the second is close to the derivative of the first. Both modes approximate differentiating operators: there is no linear combination of these modes that would produce an integrator.
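The covariance analysis described above is straightforward to implement: collect the stimulus histories preceding each spike, compare their covariance to the prior covariance, and rank the eigenmodes of the difference matrix ΔC by the magnitude of their eigenvalues. A minimal NumPy sketch, using a hypothetical threshold-crossing toy in place of the HH simulation:

```python
import numpy as np

def spike_triggered_modes(stimulus, spike_idx, window):
    """STA and eigenmodes of the covariance difference
    dC = C(spike-conditional) - C(prior), ranked by |eigenvalue|."""
    histories = np.stack([stimulus[i - window:i]
                          for i in spike_idx if i >= window])
    sta = histories.mean(axis=0)
    c_spike = np.cov(histories, rowvar=False)
    # Prior covariance over all length-`window` stimulus histories
    all_hist = np.lib.stride_tricks.sliding_window_view(stimulus, window)
    c_prior = np.cov(all_hist, rowvar=False)
    w, v = np.linalg.eigh(c_spike - c_prior)
    order = np.argsort(np.abs(w))[::-1]
    return sta, w[order], v[:, order]           # modes are the columns of v

# Toy stand-in for the HH simulation: "spikes" at upward threshold
# crossings of a smoothed white-noise current.
rng = np.random.default_rng(0)
x = rng.standard_normal(50_000)
f = np.convolve(x, np.ones(20), mode="same") / np.sqrt(20.0)
spikes = np.where((f[1:] > 1.0) & (f[:-1] <= 1.0))[0] + 1
sta, eigvals, modes = spike_triggered_modes(x, spikes, window=100)
```

For such a threshold-crossing toy, the leading modes of ΔC reflect the smoothing filter and its derivative, in the spirit of the discussion above; the signs of the eigenvalues indicate whether variance is reduced or increased along each mode.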

If the neuron filtered its input and generated a spike when the output of the filter crosses threshold, we would find two significant dimensions associated with a spike. The first dimension would correspond simply to the filter, as the variance in this dimension is reduced to zero (for a noiseless system) at the occurrence of a spike. As the threshold is always crossed from below, the stimulus projection onto the filter's derivative must be positive, again resulting in a reduced variance. It is tempting to suggest, then, that filtered threshold crossing is a good approximation to the HH model, but we will see that this is not correct.


Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions s1 and s2. By construction, the distribution of these signals over the whole experiment, P(s1, s2), is gaussian. The appropriate prior for the isolation condition, P(s1, s2 | silence), differs only subtly from the gaussian prior. On the other hand, for each spike we obtain a sample from the distribution P(s1, s2 | iso spike at t0), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC which we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold-crossing interpretation.

Figure 14: 10^4 spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation of the prior gaussian distribution. The circles, from the inside out, enclose all but 10^−1, 10^−2, ..., 10^−8 of the prior.


Combining equations 2.2 and 2.3, for isolated spikes we have

g(s1, s2) = P(s1, s2 | iso spike at t0) / P(s1, s2 | silence),    (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure, but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
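Equation 5.1 can be estimated directly from samples by taking the ratio of binned densities in the (s1, s2) plane. A sketch with hypothetical projected samples standing in for the spike-conditional and prior ensembles:

```python
import numpy as np

def decision_function(proj_spike, proj_prior, bins=25, lim=4.0):
    """Estimate g(s1, s2) = P(s1, s2 | spike) / P(s1, s2 | prior) as the
    ratio of two normalized 2D histograms (equation 5.1); bins where the
    prior has no samples are set to zero."""
    edges = np.linspace(-lim, lim, bins + 1)
    h_spike, _, _ = np.histogram2d(proj_spike[:, 0], proj_spike[:, 1],
                                   bins=[edges, edges], density=True)
    h_prior, _, _ = np.histogram2d(proj_prior[:, 0], proj_prior[:, 1],
                                   bins=[edges, edges], density=True)
    g = np.zeros_like(h_spike)
    np.divide(h_spike, h_prior, out=g, where=h_prior > 0)
    return g

# Hypothetical projected samples: a gaussian prior and a shifted,
# narrowed spike-conditional cloud (qualitatively like Figure 14).
rng = np.random.default_rng(1)
prior = rng.standard_normal((100_000, 2))
spike = 0.3 * rng.standard_normal((20_000, 2)) + 1.5
g = decision_function(spike, prior)
```

The resulting array g is large where spiking is much more likely than chance and near zero elsewhere; the sharpness of this map, at a given bin size, is one way to visualize the fuzziness discussed above.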

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I(s1, s2; iso spike) = I(iso spike); more generally, one finds I(s1, s2; iso spike) ≤ I(iso spike), and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.
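The information captured by a projection can be estimated by binning the spike-conditional and prior distributions on a common grid and summing P(s | spike) log2[P(s | spike)/P(s)], as in Brenner, Bialek, et al. (2000). A minimal sketch; the gaussian arrays here are hypothetical stand-ins for the projections (s1, s2):

```python
import numpy as np

def projected_information(proj_spike, proj_prior, bins=25, lim=4.0):
    """Bits per spike captured by a 2D projection:
    I = sum over bins of P(s|spike) * log2[ P(s|spike) / P(s|prior) ],
    with both distributions histogrammed on the same grid."""
    edges = [np.linspace(-lim, lim, bins + 1)] * 2
    n_spk, _, _ = np.histogram2d(proj_spike[:, 0], proj_spike[:, 1], bins=edges)
    n_pri, _, _ = np.histogram2d(proj_prior[:, 0], proj_prior[:, 1], bins=edges)
    p_spk = n_spk / n_spk.sum()
    p_pri = n_pri / n_pri.sum()
    ok = (p_spk > 0) & (p_pri > 0)
    return float(np.sum(p_spk[ok] * np.log2(p_spk[ok] / p_pri[ok])))

# Toy gaussian stand-ins for the projections: a well-separated
# spike-conditional cloud carries several bits relative to the prior.
rng = np.random.default_rng(2)
prior = rng.standard_normal((100_000, 2))
spike = 0.3 * rng.standard_normal((20_000, 2)) + 1.5
i2d = projected_information(spike, prior)
```

Comparing this estimate to the total information in spike timing, computed without any projection, gives the captured fraction plotted in Figure 15.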

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec, reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto the ΔC covariance modes 1 and 2 (circles).

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at ≈ 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation, and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ~3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that, along the direction of the STA, we allow the second orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
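The tiling procedure above reduces to sorting, binning, and averaging. A sketch, assuming the spike-triggering histories and the overall STA are already in hand (the synthetic data here are purely illustrative):

```python
import numpy as np

def twist_model(histories, sta, n_tiles=8):
    """Locally linear tiling of a curved 2D feature subspace.  The first
    direction is the (global) unit-normalized STA; within each
    equal-population bin of the STA projection, the second direction is
    the conditional mean of the histories, orthogonalized against the
    STA and normalized."""
    u = sta / np.linalg.norm(sta)
    proj = histories @ u
    order = np.argsort(proj)                    # sort histories by STA projection
    second = []
    for idx in np.array_split(order, n_tiles):  # equal number of spikes per tile
        m = histories[idx].mean(axis=0)
        m = m - (m @ u) * u                     # remove the STA component
        second.append(m / np.linalg.norm(m))
    return u, np.stack(second)                  # shapes (D,) and (n_tiles, D)

# Purely illustrative data: gaussian "spike histories" with an added
# component along a bump-shaped stand-in for the STA.
rng = np.random.default_rng(3)
D, N = 100, 8000
t = np.linspace(0.0, 1.0, D)
sta = np.exp(-((t - 0.5) ** 2) / 0.02)
hist = rng.standard_normal((N, D)) + np.outer(rng.standard_normal(N), sta)
u, second = twist_model(hist, sta)
```

Stacking the per-tile second directions and taking their singular values, e.g. np.linalg.svd(second, compute_uv=False), gives the kind of embedding-dimension estimate described above.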

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ~90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto the ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (Fitzhugh, 1961; Nagumo, Arimoto, & Yoshikawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input


currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work by Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics which is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ~200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations, such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace, using data sets of fewer than 10^4 spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes This re-stricted class of spikes nonetheless has biological relevance for example invertebrate retinal ganglion cells (Berry amp Meister 1999) in rat somatosen-sory cortex (Panzeri Petersen Schultz Lebedev amp Diamond 2001) and inLGN (Reinagel Godwin Sherman amp Koch 1999) the rst spike of a bursthas been shown to convey distinct (and the majority of the) information

1746 B Aguera y Arcas A Fairhall and W Bialek

However a clear next step in this program is to extend our formalism to takeinto account interspike interaction For neurons or models with explicit longtimescales adaptation induces very long-range history dependence whichcomplicates the issue of spike interactions considerably A full understand-ing of the interaction between stimulus and spike history will therefore ingeneral involve understanding the meanings of spike patterns (de Ruytervan Steveninck amp Bialek 1988 Brenner Strong et al 2000) and the inu-ence of the larger statistical context (Fairhall et al 2001) Our results pointto the need for a more parsimonious description of self-excitation even forthe simple case of dependence on only the last spike time

We close by reminding readers of the more ambitious goal of buildingbridges between the burgeoning molecular-level description of neurons andthe functional or computational level Armed with a description of spikegeneration as a nonlinear operation on a low-dimensional curved manifoldin the space of inputs it is natural to ask how the details of this computa-tional picture are related to molecular mechanisms Are neurons with moredifferent types of ion channels sensitive to more stimulus dimensions or dothey implement more complex nonlinearities in a low-dimensional spaceAre adaptation and modulation mechanisms that change the nonlinearityseparable from those that change the dimensions to which the cell is sensi-tive Finally while we have shown how a low-dimensional description canbe constructed numerically from observations of the input-output proper-ties of the neuron one would like to understand analytically why such adescription emerges and whether it emerges universally from the combina-tions of channel dynamics selected by real neurons

Acknowledgments

We thank N Brenner for discussions at the start of this work and M Berryfor comments on the manuscript

References

Abbott L F amp Kepler T (1990) Model neurons From Hodgkin-Huxleyto Hopeld In L Garrido (Ed) Statistical mechanisms of neural networks(pp 5ndash18) Berlin Springer-Verlag

Aguera y Arcas B (1998)Reducing the neuron A computational approach Unpub-lished masterrsquos thesis Princeton University

Aguera y Arcas B Bialek W amp Fairhall A L (2001) What can a single neu-ron compute In T Leen T Dietterich amp V Tresp (Eds) Advances in neuralinformation processing systems 13 (pp 75ndash81) Cambridge MA MIT Press

Aguera y Arcas B amp Fairhall A (2003) What causes a neuron to spike NeuralComputation 15 1789ndash1807

Barlow H B (1953) Summation and inhibition in the frogrsquos retina J Physiol119 69ndash88

Computation in a Single Neuron 1747

Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Fitzhugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysics J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.

1748 B Aguera y Arcas A Fairhall and W Bialek

Hartline, H. K. (1940). The receptive fields of optic nerve fibres. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2070.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



Figure 12: Modes 2 and 3 of the spike-triggered covariance (silence-associated).

Figure 13: Modes 1 and 4 of the spike-triggered covariance, which are the leading spike-associated modes.

5.2 Evaluating the Nonlinearity. At each instant of time, we can find the projections of the stimulus along the leading spike-associated dimensions, s1 and s2. By construction, the distribution of these signals over the whole experiment, P(s1, s2), is gaussian. The appropriate prior for the isolation condition, P(s1, s2 | silence), differs only subtly from the gaussian prior. On the other hand, for each spike we obtain a sample from the distribution P(s1, s2 | iso. spike at t0), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC which we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold-crossing interpretation.

Figure 14: 10^4 spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation of the prior gaussian distribution. The circles, from the inside out, enclose all but 10^-1, 10^-2, ..., 10^-8 of the prior.


Combining equations 2.2 and 2.3, for isolated spikes we have

g(s1, s2) = P(s1, s2 | iso. spike at t0) / P(s1, s2 | silence),     (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure, but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
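Equation 5.1 can be evaluated directly from samples of the two conditional ensembles. The sketch below is ours (the function name, bin count, and limits are arbitrary choices); it assumes the projections onto the two leading modes have already been computed for spike-conditional and silence-conditional stimulus histories.

```python
import numpy as np

def estimate_g(s_spike, s_silence, bins=25, lim=4.0):
    """Estimate the 2D input-output relation g(s1, s2) of equation 5.1 as
    the ratio of histogram density estimates of P(s1, s2 | iso. spike)
    and P(s1, s2 | silence).

    s_spike, s_silence: arrays of shape (N, 2) holding projections of
    spike-conditional and silence-conditional stimulus histories onto
    the two leading covariance modes."""
    edges = np.linspace(-lim, lim, bins + 1)
    p_spk, _, _ = np.histogram2d(s_spike[:, 0], s_spike[:, 1],
                                 bins=(edges, edges), density=True)
    p_sil, _, _ = np.histogram2d(s_silence[:, 0], s_silence[:, 1],
                                 bins=(edges, edges), density=True)
    g = np.full_like(p_spk, np.nan)      # NaN where the prior is unsampled
    ok = p_sil > 0
    g[ok] = p_spk[ok] / p_sil[ok]
    return g, edges
```

Bins in which the silence-conditional density is unsampled are returned as NaN rather than extrapolated; with finite data, the sharpness of the estimated g is limited by exactly the sampling and time-resolution effects discussed in the text.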

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation, according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I_{s1,s2}^{iso spike} = I^{iso spike}; more generally, one finds I_{s1,s2}^{iso spike} ≤ I^{iso spike}, and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec,

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and by projection onto ΔC covariance modes 1 and 2 (circles).


reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.
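The information captured by the projections can be estimated, in the spirit of the methods referred to above, as the divergence between the spike-conditional and prior distributions of (s1, s2). The histogram estimator below is our own minimal sketch; it ignores the finite-sampling corrections a careful analysis requires.

```python
import numpy as np

def info_in_projections(s_spike, s_prior, bins=20, lim=4.0):
    """Information (bits) that the two projections carry about the
    occurrence of an isolated spike, computed as the Kullback-Leibler
    divergence D( P(s1, s2 | spike) || P(s1, s2) ) between binned
    density estimates.

    s_spike: (N, 2) projections at spike times; s_prior: (M, 2)
    projections sampled over the whole experiment."""
    edges = np.linspace(-lim, lim, bins + 1)
    p_spk, _, _ = np.histogram2d(s_spike[:, 0], s_spike[:, 1],
                                 bins=(edges, edges))
    p_pri, _, _ = np.histogram2d(s_prior[:, 0], s_prior[:, 1],
                                 bins=(edges, edges))
    p_spk /= p_spk.sum()
    p_pri /= p_pri.sum()
    ok = (p_spk > 0) & (p_pri > 0)
    return float(np.sum(p_spk[ok] * np.log2(p_spk[ok] / p_pri[ok])))
```

Dividing this quantity by the direct estimate of the spike-timing information at the same time resolution gives the kind of captured fraction plotted in Figure 15.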

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates, for both the 1D and 2D cases, at approximately 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation, and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ~3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model; perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing?

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than with the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two directions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that, along the direction of the STA, we allow the second, orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying the number of bins, we found that eight gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
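The tiling construction amounts to a sort, a binned conditional average, and a Gram-Schmidt step; a singular value decomposition of the resulting family of second directions then yields the embedding-dimension estimate. A sketch in our own notation (the array names, the eight-bin default, and the singular-value threshold are assumptions):

```python
import numpy as np

def twist_model(spike_windows, sta, n_bins=8):
    """Locally linear ("twisting ribbon") approximation of the curved
    feature subspace: sort the spike-conditional stimulus histories by
    their projection on the overall STA, split them into n_bins equally
    populated bins, and in each bin take the normalized component of the
    conditional average orthogonal to the STA as the local second
    direction.

    spike_windows: (N_spikes, D) stimulus histories; sta: (D,)."""
    u = sta / np.linalg.norm(sta)
    order = np.argsort(spike_windows @ u)
    dirs = []
    for chunk in np.array_split(order, n_bins):   # equal spikes per tile
        avg = spike_windows[chunk].mean(axis=0)
        orth = avg - u * (u @ avg)                # remove the STA component
        dirs.append(orth / np.linalg.norm(orth))
    return np.asarray(dirs)                       # (n_bins, D)

def embedding_dimension(dirs, rel_tol=0.1):
    """Count singular values of the family of second directions above a
    (somewhat arbitrary) fraction of the largest one."""
    s = np.linalg.svd(dirs, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))
```

The choice of eight tiles reflects the trade-off noted in the text between estimating each conditional average precisely and following the twist faithfully.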

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ~90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (Fitzhugh, 1961; Nagumo, Arimoto, & Yoshizawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive, graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work by Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional, or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ~200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that, rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations, such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation, or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace using data sets of fewer than 10^4 spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one that is sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore, in general, involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.


Sharpee T Rust N C amp Bialek W (in press) Maximally informative di-mensions analysing neural responses to natural signals Neural InformationProcessing Systems 2002 Available on-line http==xxxlanlgov=abs=physics=

0208057Sharpee T Rust N C amp Bialek W (2003) Maximally informative dimensions

Analysing neural responses to natural signals Unpublished manuscriptStanley G B Lei F F amp Dan Y (1999) Reconstruction of natural scenes

from ensemble responses in the lateral geniculate nucleus J Neurosci 19(18)8036ndash8042

Theunissen F Sen K amp Doupe A (2000)Spectral-temporal receptive elds ofnonlinear auditory neurons obtained using natural sounds J Neurosci 202315ndash2331

Tiesinga P H E Jose J amp Sejnowski T (2000) Comparison of current-drivenand conductance-driven neocortical model neurons with Hodgkin-Huxleyvoltage-gated channels Physical Review E 62 8413ndash8419

Tishby N Pereira F amp Bialek W (1999) The information bottleneck methodIn B Hajek amp R S Sreenivas (Eds) Proceedingsof the37th Annual AllertonCon-ference on Communication Control and Computing (pp 368ndash377) ChampaignIL Available on-line http==xxxlanlgov=abs=physics=0004057

Received January 9 2003 accepted January 28 2003

Page 24: Computation in a Single Neuron: Hodgkin and Huxley Revisitedwbialek/our_papers/aguerayarcas+al_03.pdf · ARTICLE Communicated by Paul Bressloff Computation in a Single Neuron: Hodgkin


Figure 14: 10^4 spike-conditional stimuli (or "spike histories") projected along the first two covariance modes. The axes are in units of standard deviation on the prior Gaussian distribution. The circles, from the inside out, enclose all but 10^-1, 10^-2, ..., 10^-8 of the prior.

tribution P(s1, s2 | iso spike at t0), leading to the picture in Figure 14. The prior and spike-conditional distributions are clearly better separated in two dimensions than in one, which means that the two-dimensional description captures more information than projection onto the spike-triggered average alone. Surprisingly, the spike-conditional distribution is curved, unlike what we would expect for a simple thresholding device. Furthermore, the eigenvalue of ΔC which we associate with the direction of threshold crossing (plotted on the y-axis in Figure 14) is positive, indicating increased rather than decreased variance in this direction. As we see, projections onto this mode are almost equally likely to be positive or negative, ruling out the threshold crossing interpretation.
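The projection underlying Figure 14 is a standard spike-triggered covariance computation. A minimal sketch on synthetic data; the white-noise input, the toy threshold-crossing "neuron," and all parameters here are illustrative stand-ins for the HH simulation, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 200_000, 50             # stimulus length and history window (samples)
stim = rng.standard_normal(T)  # white noise current; the prior is standard Gaussian

# Toy spike generator: "fire" when a smoothed derivative of the input
# crosses a threshold (an assumption standing in for the HH dynamics).
kernel = np.diff(np.exp(-np.arange(10) / 3.0))
drive = np.convolve(stim, kernel, mode="same")
drive /= drive.std()
spikes = np.flatnonzero((drive[1:] > 2.0) & (drive[:-1] <= 2.0)) + 1
spikes = spikes[spikes >= D]

# Spike-conditional stimulus histories: D samples preceding each spike
hist = np.stack([stim[t - D:t] for t in spikes])

# ΔC: spike-conditional covariance minus the prior covariance (identity for white noise)
dC = np.cov(hist, rowvar=False) - np.eye(D)
eigval, eigvec = np.linalg.eigh(dC)
order = np.argsort(np.abs(eigval))[::-1]
modes = eigvec[:, order[:2]]            # two leading covariance modes

# Project each spike-conditional history onto the two modes; the axes are
# already in units of the prior's standard deviation (= 1 for this stimulus).
s12 = hist @ modes
print(s12.shape)
```

Plotting `s12` against the prior (projections of random stimulus segments) gives a picture of the same kind as Figure 14.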


Combining equations 2.2 and 2.3, for isolated spikes we have

g(s1, s2) = P(s1, s2 | iso spike at t0) / P(s1, s2 | silence),     (5.1)

so that these two distributions determine the input-output relation of the neuron in this 2D space (Brenner, Bialek, et al., 2000). Recall that although the subspace is linear, g can have arbitrary nonlinearity. Figure 14 shows that this input-output relation has clear structure but also some fuzziness. As the HH model is deterministic, the input-output relation should be a singular function in the continuous space of inputs: spikes occur only when certain exact conditions are met. Of course, finite time resolution introduces some blurring, and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of finite time resolution or a real limitation of the 2D description.
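Equation 5.1 can be estimated directly by histogramming the two conditional distributions and taking their ratio bin by bin. A sketch with synthetic Gaussian samples standing in for the real spike- and silence-conditional projections (the means, widths, and sample counts are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for the projections (s1, s2) conditional on silence and on an
# isolated spike; in the paper these come from the HH simulation.
silence = rng.standard_normal((100_000, 2))
spike = rng.standard_normal((20_000, 2)) * 0.5 + np.array([1.5, 1.0])

edges = np.linspace(-4, 4, 41)
p_spk, _, _ = np.histogram2d(spike[:, 0], spike[:, 1], bins=(edges, edges), density=True)
p_sil, _, _ = np.histogram2d(silence[:, 0], silence[:, 1], bins=(edges, edges), density=True)

# g(s1, s2) = P(s1, s2 | iso spike) / P(s1, s2 | silence), eq. 5.1;
# bins never visited during silence are left undefined (NaN).
g = np.where(p_sil > 0, p_spk / np.maximum(p_sil, 1e-300), np.nan)

# g should be large near the spike-conditional mean (1.5, 1.0)
i = np.searchsorted(edges, 1.5) - 1
j = np.searchsorted(edges, 1.0) - 1
print(g[i, j])
```

With data from a deterministic model, sharpening the time resolution should sharpen g toward a singular function; residual fuzziness at fine resolution is what signals a limitation of the 2D description.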

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation, according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that I(s1, s2; iso spike) = I(iso spike); more generally, one finds I(s1, s2; iso spike) ≤ I(iso spike), and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.
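The information captured by a projection can be estimated from the binned spike-conditional and prior distributions, I = Σ P(s|spike) log2[P(s|spike)/P(s)]. A sketch on a toy model in which spiking depends on two stimulus projections; the sigmoidal spike probability is an assumption, chosen only so that the 2D estimate visibly exceeds the 1D one:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in: prior projections are standard Gaussian, and the spike
# probability depends on both, so the 2D description captures more
# information than the 1D one (illustrative distributions, not HH).
s = rng.standard_normal((500_000, 2))
p_spike = 1.0 / (1.0 + np.exp(-(2.0 * s[:, 0] + 1.5 * s[:, 1] - 2.0)))
spk = rng.random(len(s)) < p_spike

def info_per_spike(x, spk, bins=40):
    """I = sum P(x|spike) log2[ P(x|spike) / P(x) ] over binned projections."""
    x = x.reshape(len(x), -1)
    edges = [np.linspace(-4, 4, bins + 1)] * x.shape[1]
    p_all, _ = np.histogramdd(x, bins=edges)
    p_cond, _ = np.histogramdd(x[spk], bins=edges)
    p_all = p_all / p_all.sum()
    p_cond = p_cond / p_cond.sum()
    mask = (p_cond > 0) & (p_all > 0)
    return np.sum(p_cond[mask] * np.log2(p_cond[mask] / p_all[mask]))

i1 = info_per_spike(s[:, 0], spk)   # one dimension (like the STA alone)
i2 = info_per_spike(s, spk)         # two covariance modes
print(i1, i2)
```

The ratio of either estimate to the total information per spike plays the role of the captured fraction plotted in Figure 15.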

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec, reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at ≈ 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation, and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ~3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that along the direction of the STA, we allow the second, orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.
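The tiling algorithm uses only conditional means: sort by STA projection, bin into equal-count tiles, average within each tile, and orthogonalize against the STA. A sketch on synthetic histories drawn from a twisting ribbon; the ribbon construction and all its parameters are invented for illustration, and with this toy geometry the SVD finds two, rather than four, significant directions:

```python
import numpy as np

rng = np.random.default_rng(3)
D, N, n_bins = 50, 80_000, 8           # history length, spike count, tiles
k = np.arange(D)
f1 = np.sin(2 * np.pi * k / D); f1 /= np.linalg.norm(f1)
f2 = np.cos(2 * np.pi * k / D); f2 /= np.linalg.norm(f2)
f3 = np.sin(4 * np.pi * k / D); f3 /= np.linalg.norm(f3)

# Synthetic "spike-triggered histories" on a twisting ribbon: the direction
# orthogonal to f1 rotates from f2 toward f3 along the ribbon (stand-in data;
# the paper uses histories from the HH simulation).
a = rng.standard_normal(N)             # position along the leading direction
b = 1.0 + 0.3 * rng.standard_normal(N)
twist = 0.8 * a
hist = (np.outer(2 + a, f1)
        + b[:, None] * (np.cos(twist)[:, None] * f2 + np.sin(twist)[:, None] * f3)
        + 0.1 * rng.standard_normal((N, D)))

# Global direction: the overall spike-triggered average.
sta = hist.mean(axis=0)
sta_hat = sta / np.linalg.norm(sta)

# Sort histories by STA projection; equal-count bins define the tiles.
order = np.argsort(hist @ sta_hat)
second_dirs = []
for idx in np.array_split(order, n_bins):
    cond_avg = hist[idx].mean(axis=0)                  # first-order statistic only
    ortho = cond_avg - (cond_avg @ sta_hat) * sta_hat  # component orthogonal to STA
    second_dirs.append(ortho / np.linalg.norm(ortho))

# SVD of the family of curves estimates the embedding dimension.
sing = np.linalg.svd(np.array(second_dirs), compute_uv=False)
print(np.round(sing / sing[0], 3))
```

The normalized singular values play the role of the analysis behind Figure 17: the number of values well above the noise floor estimates how many independent directions the twisting second vector explores.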

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ~90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling "twist" model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshikawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.
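The four degrees of freedom referred to here are the voltage V and the gating variables m, h, and n. A minimal forward-Euler integration of the standard squid-axon HH equations under constant drive (a sketch with textbook parameters, not the authors' simulation, which drove the model with white noise currents):

```python
import numpy as np

# Standard squid-axon HH parameters
C, gNa, gK, gL = 1.0, 120.0, 36.0, 0.3      # uF/cm^2, mS/cm^2
ENa, EK, EL = 50.0, -77.0, -54.387          # mV

def rates(V):
    """Voltage-dependent opening/closing rates of the m, h, n gates (1/ms)."""
    am = 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))
    bm = 4.0 * np.exp(-(V + 65.0) / 18.0)
    ah = 0.07 * np.exp(-(V + 65.0) / 20.0)
    bh = 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))
    an = 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))
    bn = 0.125 * np.exp(-(V + 65.0) / 80.0)
    return am, bm, ah, bh, an, bn

dt, T = 0.01, 50.0                          # ms
steps = int(T / dt)
V, m, h, n = -65.0, 0.053, 0.596, 0.317     # resting state
I = 10.0                                    # constant drive, uA/cm^2 (suprathreshold)
Vtrace = np.empty(steps)
for t in range(steps):
    am, bm, ah, bh, an, bn = rates(V)
    INa = gNa * m**3 * h * (V - ENa)
    IK = gK * n**4 * (V - EK)
    IL = gL * (V - EL)
    V += dt * (I - INa - IK - IL) / C
    m += dt * (am * (1 - m) - bm * m)
    h += dt * (ah * (1 - h) - bh * h)
    n += dt * (an * (1 - n) - bn * n)
    Vtrace[t] = V

# Count spikes as upward crossings of 0 mV
spikes = np.flatnonzero((Vtrace[1:] > 0) & (Vtrace[:-1] <= 0))
print(len(spikes), "spikes in", T, "ms")
```

With constant drive the model fires repetitively, which is exactly the autonomous-system setting of FitzHugh and Nagumo; the analysis in this paper instead replaces the constant I with a noise current and studies the resulting spike times.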

Closer in spirit to our approach is the work by Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics which is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ~200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace, using data sets of fewer than 10^4 spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but one that is sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus, or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore in general involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are the adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges, and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Agüera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Agüera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Agüera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.

Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical Journal, 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.

Hartline, H. K. (1940). The receptive fields of optic nerve fibers. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshikawa, Z. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.

Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.

Page 25: Computation in a Single Neuron: Hodgkin and Huxley Revisitedwbialek/our_papers/aguerayarcas+al_03.pdf · ARTICLE Communicated by Paul Bressloff Computation in a Single Neuron: Hodgkin

Computation in a Single Neuron 1739

Combining equations 22 and 23 for isolated spikes we have

gs1 s2 D Ps1 s2 j iso spike at t0

Ps1 s2 j silence (51)

so that these two distributions determine the input-output relation of theneuron in this 2D space (Brenner Bialek et al 2000) Recall that althoughthe subspace is linear g can have arbitrary nonlinearity Figure 14 shows thatthis input-output relation has clear structure but also some fuzziness As theHH model is deterministic the input-output relation should be a singularfunction in the continuous space of inputsmdashspikes occur only when certainexact conditions are met Of course nite time resolution introduces someblurring and so we need to understand whether the blurring of the input-output relation in Figure 14 is an effect of nite time resolution or a reallimitation of the 2D description

5.3 Information Captured in Two Dimensions. We will measure the effectiveness of our description by computing the information in the 2D approximation according to the methods described in section 3. If the two-dimensional approximation were exact, we would find that $I_{s_1,s_2 \to \text{iso spike}} = I_{\text{iso spike}}$; more generally, one finds $I_{s_1,s_2 \to \text{iso spike}} \le I_{\text{iso spike}}$, and the fraction of the information captured measures the quality of the approximation. This fraction is plotted in Figure 15 as a function of time resolution. For comparison, we also show the information captured in the one-dimensional case, considering only the stimulus projection along the STA.

We find that our low-dimensional model captures a substantial fraction of the total information available in spike timing in an HH neuron over a range of time resolutions. The approximation is best near Δt = 3 msec, reaching 75%. Thus, the complex nonlinear dynamics of the HH model can be approximated by saying that the neuron is sensitive to a 2D linear subspace in the high-dimensional space of input signals, and this approximate description captures up to 75% of the mutual information between input currents and (isolated) spike arrival times.

Figure 15: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by projection onto the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).
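The information captured by a set of projections is, as in section 3, the divergence between the spike-conditional distribution of projections and their prior distribution. A minimal plug-in estimator (our sketch, with assumed array names, not the authors' code):

```python
import numpy as np

def projection_information(proj_spike, proj_prior, bins=16):
    """Bits per spike carried by K stimulus projections:
    I = sum_s P(s | spike) log2 [ P(s | spike) / P(s) ],
    estimated by histogramming both ensembles on common bins."""
    k = proj_spike.shape[1]
    lo = np.minimum(proj_spike.min(axis=0), proj_prior.min(axis=0))
    hi = np.maximum(proj_spike.max(axis=0), proj_prior.max(axis=0))
    edges = [np.linspace(lo[d], hi[d], bins + 1) for d in range(k)]
    p_spike, _ = np.histogramdd(proj_spike, bins=edges)
    p_prior, _ = np.histogramdd(proj_prior, bins=edges)
    p_spike = p_spike / p_spike.sum()
    p_prior = p_prior / p_prior.sum()
    mask = (p_spike > 0) & (p_prior > 0)
    return float(np.sum(p_spike[mask] * np.log2(p_spike[mask] / p_prior[mask])))
```

Comparing this quantity for the STA projection alone and for two projections against the model-free information in spike times gives fractions of the kind plotted in Figure 15. Finite bins and finite samples bias the estimate, so in practice one checks convergence in bin size and sample size.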

The dependence of information on time resolution (see Figure 15) shows that the absolute information captured saturates for both the 1D and 2D cases, at ≈ 3.2 and 4.1 bits, respectively. Hence, for smaller Δt, the information fraction captured drops. The model provides, at its best, a time resolution of 3 msec, so that information carried by more precise spike timing is lost in our low-dimensional projection. Might this missing information be important for a real neuron? Stochastic HH simulations with realistic channel densities suggest that the timing of spikes in response to white noise stimuli is reproducible to within 1 to 2 msec (Schneidman, Freedman, & Segev, 1998), a figure that is comparable to what is observed for pyramidal cells in vitro (Mainen & Sejnowski, 1995), as well as in vivo in the fly's visual system (de Ruyter van Steveninck, Lewen, Strong, Koberle, & Bialek, 1997; Lewen, Bialek, & de Ruyter van Steveninck, 2001), the vertebrate retina (Berry, Warland, & Meister, 1997), the cat lateral geniculate nucleus (LGN) (Reinagel & Reid, 2000), and the bat auditory cortex (Dear, Simmons, & Fritz, 1993). This suggests that such timing details may indeed be important. We must therefore ask why our approximation seems to carry an inherent time resolution limitation and why, even at its optimal resolution, the full information in the spike is not recovered.

For many purposes, recovering 75% of the information at ≈ 3 msec resolution might be considered a resounding success. On the other hand, with such a simple underlying model, we would hope for a more compelling conclusion. From a methodological point of view, it behooves us to ask what we are missing in our 2D model, and perhaps the methods we use in finding the missing information in the present case will prove applicable more generally.

6 What Is Missing

The obvious first approach to improving the 2D approximation is to add more dimensions. Let us consider the neglected modes. We recall from Figure 11 that in simulations with very large numbers of spikes, we can isolate at least two more modes that have significant eigenvalues and are associated with the isolated spike rather than the preceding silence; these are shown in Figure 16. We see that these modes look like higher-order derivatives, which makes some sense, since we are missing information at high time resolution. On the other hand, if all we are doing is expanding in a basis of higher-order derivatives, it is not clear that we will do qualitatively better by including one or two more terms, particularly given that sampling higher-dimensional distributions becomes very difficult.
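For reference, the modes in question are eigenvectors of the change in covariance between the spike-conditional and prior stimulus ensembles. A sketch of that diagonalization (our illustration, under assumed array names, not the authors' code):

```python
import numpy as np

def stc_modes(stim_at_spikes, stim_prior):
    """Eigenmodes of Delta C = C_spike - C_prior.

    stim_at_spikes: (n_spikes, D) stimulus histories preceding spikes.
    stim_prior:     (n_samples, D) histories drawn irrespective of spiking.
    Returns the STA plus eigenvalues/eigenvectors of Delta C, sorted by
    decreasing |eigenvalue|."""
    sta = stim_at_spikes.mean(axis=0)
    delta_c = np.cov(stim_at_spikes, rowvar=False) - np.cov(stim_prior, rowvar=False)
    w, v = np.linalg.eigh(delta_c)
    order = np.argsort(-np.abs(w))
    return sta, w[order], v[:, order]
```

Sampling noise in each entry of the covariance matrix scales roughly as one over the square root of the spike count, which is why only the leading modes are resolvable at realistic numbers of spikes.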


Figure 16: The two next spike-associated modes. These resemble higher-order derivatives.

Our original model attempted to approximate the set of relevant features as lying within a K-dimensional linear subspace of the original D-dimensional input space. The covariance spectrum indicates that additional dimensions play a small but significant role. We suggest that a reasonable next step is to consider the relevant feature space to be low dimensional but not flat. The first two covariance modes define a plane; we will consider next a 2D geometric construction that curves into additional dimensions.

Several methods have been proposed for the general problem of identifying low-dimensional nonlinear manifolds (Cottrell, Munro, & Zipser, 1988; Boser, Guyon, & Vapnik, 1992; Guyon, Boser, & Vapnik, 1993; Oja & Karhunen, 1995; Roweis & Saul, 2000), but these various approaches share the disadvantage that the manifold, or equivalently the relevant set of features, remains implicit. Our hope is to understand the behavior of the neuron explicitly; we therefore wish to obtain an explicit representation of this curved feature space in terms of the basis vectors (features) that span it.

A first approach to determining the curved subspace is to approximate it as a set of locally linear "tiles." At any place on the surface, we wish to find two orthogonal directions that form the surface's local linear approximation. Curvature of the subspace means that these two dimensions will vary across the surface. We apply a simple algorithm to construct a locally linear tiling, with the advantage that it requires only first-order statistics. First, we take one of the directions to be globally defined; it is natural to take this direction to be the overall spike-triggered average. Allowing for curvature in the feature subspace means that along the direction of the STA, we allow the second, orthogonal direction to vary. In principle, this variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

Geometrically this construction is equivalent to approximating the sur-face as a twisting ribbon with the leading direction that of the STA butwhere the surface is allowed to rotate about the STA axis Further we havediscretized the twisting direction into a small number of tiles The discretiza-tion is xed by the size of the data There is a trade-off between the precisionof estimating the conditional average and the delitywith whichone followsthe twist Here we have restricted ourselves to an experimentally realisticnumber of isolated spikes (80000) using an equal number of spikes perbin Varying over the number of bins we found that eight bins gave thebest result Note that our model is discontinuous it might be possible toimprove it by interpolating smoothly between successive tiles

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering ≈ 90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling ("twist") model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (Fitzhugh, 1961; Nagumo, Arimoto, & Yoshizawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).
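As a reminder of what this classic reduction looks like in practice, here is a minimal forward-Euler integration of the standard FitzHugh-Nagumo equations, using a common textbook parameterization rather than anything taken from this article:

```python
import numpy as np

def fitzhugh_nagumo(i_ext, dt=0.01, t_max=100.0, a=0.7, b=0.8, tau=12.5):
    """Integrate the 2D FitzHugh-Nagumo system with forward Euler:
        dv/dt = v - v^3/3 - w + I_ext      (fast, voltage-like variable)
        dw/dt = (v + a - b*w) / tau        (slow recovery variable)
    """
    n = int(t_max / dt)
    v = np.empty(n)
    w = np.empty(n)
    v[0], w[0] = -1.2, -0.6                # start near the resting state
    for k in range(n - 1):
        v[k + 1] = v[k] + dt * (v[k] - v[k] ** 3 / 3 - w[k] + i_ext)
        w[k + 1] = w[k] + dt * (v[k] + a - b * w[k]) / tau
    return v, w
```

With i_ext = 0 the trajectory stays at rest, while a sustained i_ext = 0.5 puts the system on a limit cycle, the 2D analogue of repetitive firing; the whole phase portrait can be drawn in the (v, w) plane, which is the graphical payoff of the reduction.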

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work of Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ≈ 200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace, using data sets of fewer than 10^4 spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again, the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but it is sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore, in general, involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Aguera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Aguera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Aguera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Fitzhugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysics J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibres. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2070.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.

Page 26: Computation in a Single Neuron: Hodgkin and Huxley Revisitedwbialek/our_papers/aguerayarcas+al_03.pdf · ARTICLE Communicated by Paul Bressloff Computation in a Single Neuron: Hodgkin

1740 B Aguera y Arcas A Fairhall and W Bialek

reaching 75 Thus the complex nonlinear dynamics of the HH model canbe approximated by saying that the neuron is sensitive to a 2D linear sub-space in the high-dimensional space of input signals and this approximatedescription captures up to 75 of the mutual information between inputcurrents and (isolated) spike arrival times

The dependence of information on time resolution (see Figure 15) showsthat the absolute information captured saturates for both the 1D and 2Dcases at frac14 32 and 41 bits respectively Hence for smaller 1t the informa-tion fraction captured drops The model provides at its best a time resolu-tionof 3 msec so that informationcarried by moreprecisespike timingis lostin our low-dimensional projection Might this missing information be im-portant for a real neuron Stochastic HH simulations with realistic channeldensities suggest that the timing of spikes in response to white noise stim-uli is reproducible to within 1 to 2 msec (Schneidman Freedman amp Segev1998) a gure that is comparable to what is observed for pyramidal cells invitro (Mainen amp Sejnowski 1995) as well in vivo in the yrsquos visual system(de Ruyter van Steveninck Lewen Strong Koberle amp Bialek 1997 LewenBialek amp de Ruyter van Steveninck 2001) the vertebrate retina (Berry War-land amp Meister 1997) the cat lateral geniculate nucleus (LGN) (Reinagel ampReid 2000) and the bat auditory cortex (Dear Simmons amp Fritz 1993) Thissuggests that such timing details may indeed be important We must there-fore ask why our approximation seems to carry an inherent time resolutionlimitation and why even at its optimal resolution the full information inthe spike is not recovered

For many purposes recovering 75 of the information at raquo 3 msecresolution might be considered a resounding success On the other handwith such a simpleunderlying model we would hope for a morecompellingconclusion From a methodological point of view it behooves us to askwhat we are missing in our 2D model and perhaps the methods we use innding the missing information in the present case will prove applicablemore generally

6 What Is Missing

The obvious rst approach to improving the 2D approximation is to addmore dimensions Let us consider the neglected modes We recall from Fig-ure 11 that in simulations with very large numbers of spikes we can isolateat least two more modes that have signicant eigenvalues and are asso-ciated with the isolated spike rather than the preceding silence these areshown in Figure 16 We see that these modes look like higher-order deriva-tives which makes some sense since we are missing information at hightime resolution On the other hand if all we are doing is expanding in abasis of higher-order derivatives it is not clear that we will do qualitativelybetter by including one or two more terms particularly given that samplinghigher-dimensional distributions becomes very difcult

Computation in a Single Neuron 1741

Figure 16 The two next spike-associated modes These resemble higher-orderderivatives

Our original model attempted to approximate the set of relevant fea-tures as lying within a K-dimensional linear subspace of the original D-dimensional input space The covariance spectrum indicates that additionaldimensions play a small but signicant role We suggest that a reasonablenext step is to consider the relevant feature space to be low dimensional butnot at The rst two covariance modes dene a plane we will considernext a 2D geometric construction that curves into additional dimensions

Several methods have been proposed for the general problem of iden-tifying low-dimensional nonlinear manifolds (Cottrell Munro amp Zipser1988 Boser Guyon amp Vapnik 1992 Guyon Boser amp Vapnik 1993 Oja ampKarhunen 1995 Roweis amp Saul 2000) but these various approaches sharethe disadvantage that the manifold or equivalently the relevant set of fea-tures remains implicit Our hope is to understand the behavior of the neu-ron explicitly we therefore wish to obtain an explicit representation of thiscurved feature space in terms of the basis vectors (features) that span it

A rst approach to determining the curved subspace is to approximateit as a set of locally linear ldquotilesrdquo At any place on the surface we wish tond two orthogonal directions that form the surfacersquos local linear approx-imation Curvature of the subspace means that these two dimensions willvary across the surface We apply a simple algorithm to construct a locallylinear tiling with the advantage that it requires only rst-order statisticsFirst we take one of the directions to be globally dened it is natural totake this direction to be the overall spike-triggered average Allowing forcurvature in the feature subspace means that along the direction of the STAwe allow the second orthogonal direction to vary In principle this vari-

1742 B Aguera y Arcas A Fairhall and W Bialek

Figure 17 The orthonormal components of spike-triggered averages from80000 spikes conditioned on their projection onto the overall spike-triggeredaverage (eight conditional averages shown)

ation is continuous but we will not be able to sample sufciently to ndthe complete description of the second direction so we sort the stimulushistories according to their projection along this direction and bin the sortedhistories into a small number of bins This denes the number of tiles usedto cover the surface We determine the conditional average of the stimulushistories in each bin and compute the (normalized) component orthogonalto the overall STA This provides a second locally meaningful basis vectorfor the subspace in that bin The resulting family of curves orthogonal to theSTA is shown in Figure 17 Applying singular value decomposition to thefamily of curves shows that there are at least four signicant independentdirections in stimulus space apart from the STA This gives us an estimateof the embedding dimension of the feature subspace

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The new model captures a maximum of 4.8 bits, recovering ≈ 90% of the information at a time resolution of approximately 1 msec.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling ("twist") model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).
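The bits-per-spike figures quoted here are based on the single-spike information measure of Brenner, Strong, et al. (2000), I = ⟨(r/r̄) log₂(r/r̄)⟩. The sketch below is a minimal illustration of that estimator on toy data; it is not the computation used to produce Figure 18, and the function and variable names are ours.

```python
import numpy as np

def bits_per_spike(rate):
    """Single-spike information I = <(r/rbar) * log2(r/rbar)>, in bits per spike."""
    rbar = rate.mean()
    ratio = rate / rbar
    terms = np.zeros_like(ratio)
    nz = ratio > 0          # bins with r(t) = 0 contribute 0 (x log x -> 0)
    terms[nz] = ratio[nz] * np.log2(ratio[nz])
    return terms.mean()

# Toy example: a cell that is silent except in 1% of the time bins, where it
# fires at 100x its mean rate; such perfectly timed spikes carry log2(100) bits.
rate = np.zeros(10_000)
rate[::100] = 100.0
info = bits_per_spike(rate)   # log2(100), about 6.64 bits per spike
```

Sharpening the time resolution Δt raises the attainable rate modulation r/r̄, which is why the information curves in Figure 18 grow as Δt shrinks.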

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.
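For comparison, the covariance matrix analysis referred to here can be sketched as follows: the relevant linear subspace is spanned by the eigenmodes of ΔC = C_spike − C_prior with significantly nonzero eigenvalues. This is a generic reconstruction on synthetic white-noise data with a toy threshold "neuron," not the paper's code; all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the experiment: white-noise current and spike times.
T, D = 100_000, 200          # samples; dimensionality of the stimulus history
stim = rng.normal(size=T)
spikes = np.flatnonzero(stim[:-1] > 2.5) + 1   # toy "threshold crossing" spikes
spikes = spikes[spikes >= D]

# D-dimensional stimulus history preceding each spike.
windows = np.stack([stim[t - D:t] for t in spikes])

# Spike-triggered average, and covariance difference Delta C = C_spike - C_prior.
sta = windows.mean(axis=0)
c_spike = np.cov(windows, rowvar=False)
c_prior = np.eye(D) * stim.var()     # white noise: prior covariance is ~ identity
delta_c = c_spike - c_prior

# Eigenmodes of Delta C; the modes with largest |eigenvalue| span the
# linear approximation to the relevant feature subspace.
eigvals, eigvecs = np.linalg.eigh(delta_c)
order = np.argsort(-np.abs(eigvals))
modes = eigvecs[:, order[:2]]        # leading two covariance modes
```

Resolving more than the leading modes requires estimating a full D × D covariance, which is why the tiling construction, built from averages alone, is so much less data-hungry.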

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshikawa, 1962); this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).
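For concreteness, the FitzHugh–Nagumo reduction replaces the four HH variables with a voltage-like variable v and a single recovery variable w. In its standard form (the parameter values below are the conventional ones from the FitzHugh–Nagumo literature, not quantities computed in this paper):

```latex
\dot{v} = v - \tfrac{v^3}{3} - w + I_{\mathrm{ext}}, \qquad
\dot{w} = \phi\,(v + a - b\,w),
\qquad a = 0.7,\; b = 0.8,\; \phi = 0.08 .
```

The cubic nullcline of v against the slow linear dynamics of w reproduces the excitable, threshold-like behavior of the HH model in a plane one can draw.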

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system; by well-known methods in dynamical systems theory, one could consider periodic input currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.
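The four autonomous degrees of freedom here are the membrane voltage V and the gating variables m, h, and n, in the standard HH form:

```latex
C\,\frac{dV}{dt} = I_{\mathrm{ext}}
  - \bar{g}_{\mathrm{Na}}\, m^3 h\,(V - V_{\mathrm{Na}})
  - \bar{g}_{\mathrm{K}}\, n^4\,(V - V_{\mathrm{K}})
  - \bar{g}_{\mathrm{L}}\,(V - V_{\mathrm{L}}),
\qquad
\frac{dx}{dt} = \alpha_x(V)\,(1 - x) - \beta_x(V)\,x, \quad x \in \{m, h, n\}.
```

A time-varying drive I_ext(t), such as the periodic input just mentioned, is what adds the extra dimension to the autonomous description.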

Closer in spirit to our approach is the work of Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational: we have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ≈ 200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.


The geometrical picture of neurons as sensitive to features defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation is, we believe, novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace using data sets of fewer than 10^4 spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but it is sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus or, conversely, to allow us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore in general involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are the adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges, and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.
Aguera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.
Aguera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.
Aguera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.
Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.
Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.
Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.
Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.
Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.
Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.
Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.
Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.
Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.
Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available online: http://xxx.lanl.gov/abs/physics/9902067.
Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.
Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.
de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.
de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.
de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.
Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.
Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.
FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophys. J., 1, 445–466.
Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.
Hartline, H. K. (1940). The receptive fields of optic nerve fibers. Amer. J. Physiol., 130, 690–699.
Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.
Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.
Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.
Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.
Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.
Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.
Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.
Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.
Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.
Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.
Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.
Nagumo, J., Arimoto, S., & Yoshikawa, Z. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.
Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.
Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.
Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.
Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.
Rieke, F., Warland, D., de Ruyter van Steveninck, R. R., & Bialek, W. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.
Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.
Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.
Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.
Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.
Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. In Neural Information Processing Systems 2002. Available online: http://xxx.lanl.gov/abs/physics/0208057.
Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.
Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.
Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.
Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.
Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available online: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.



Berry M J II amp Meister M (1999) The neural code of the retina Neuron 22435ndash450

Berry M J II Warland D amp Meister M (1997) The structure and precision ofretinal spike trains Proc Natl Acad Sci USA 94 5411ndash5416

Bialek W amp de Ruyter van Steveninck R R (2003) Features and dimensionsMotion estimation in y vision Unpublished manuscript

Boser B E Guyon I M amp Vapnik V N (1992)A training algorithm for optimalmargin classiers In D Haussler (Ed) 5th Annual ACM Workshop on COLT(pp 144ndash152) Pittsburgh PA ACM Press

Bray D (1995) Protein molecules as computational elements in living cellsNature 376 307ndash312

Brenner N Agam O Bialek W amp de Ruyter van Steveninck R R (1998)Universal statistical behavior of neural spike trains Phys Rev Lett 814000ndash4003

Brenner N Bialek W amp de Ruyter van Steveninck R R (2000) Adaptiverescaling maximizes information transmission Neuron 26 695ndash702

Brenner N Strong S Koberle R Bialek W amp de Ruyter van SteveninckR R (2000) Synergy in a neural code Neural Comp 12 1531ndash1552 Availableon-line http==xxxlanlgov=abs=physics=9902067

Cottrell G W Munro P amp Zipser D (1988) Image compression by back prop-agation A demonstration of extensional programming In N Sharkey (Ed)Models of cognition A reviewof cognitive science (Vol 2 pp 208ndash240)NorwoodNJ Ablex

Cover T M amp Thomas J A (1991) Elements of information theory New YorkWiley

de Boer E amp Kuyper P (1968) Triggered correlation IEEE Trans Biomed Eng15 169ndash179

de Ruyter van Steveninck R R amp Bialek W (1988) Real-time performance ofa movement sensitive in the blowy visual system Information transfer inshort spike sequences Proc Roy Soc Lond B 234 379ndash414

de Ruyter van Steveninck R Lewen G D Strong S P Koberle R amp BialekW (1997) Reproducibility and variability in neural spike trains Science 2751805ndash1808

Dear S P Simmons J A amp Fritz J (1993) A possible neuronal basis for repre-sentation of acoustic scenes in auditory cortex of the big brown bat Nature364 620ndash623

Fairhall A Lewen G Bialek W amp de Ruyter van Steveninck R R (2001)Efciency and ambiguity in an adaptive neural code Nature 412 787ndash792

Fitzhugh R (1961) Impulse and physiological states in models of nerve mem-brane Biophysics J 1 445ndash466

Guyon I M Boser B E amp Vapnik V N (1993) Automatic capacity tuning ofvery large VC-dimension classiers In S J Hanson J D Cowan amp C Giles(Eds) Advances in neural information processing systems 5 (pp 147ndash155) SanMateo CA Morgan Kaufmann

1748 B Aguera y Arcas A Fairhall and W Bialek

Hartline H K (1940)The receptive elds of optic nerve bres Amer J Physiol130 690ndash699

Hille B (1992) Ionic channels of excitable membranes Sunderland MA SinauerHodgkin A L amp Huxley A F (1952) A quantitative description of membrane

current and its application to conduction and excitation in nerve J Physiol463 391ndash407

Hubel D H amp Wiesel T N (1962) Receptive elds binocular interactionand functional architecture in the catrsquos visual cortex J Physiol (Lond) 160106ndash154

Iverson L amp Zucker S W (1995) Logical=linear operators for image curvesIEEE Trans Pattern Analysis and Machine Intelligence 17 982ndash996

Keat J Reinagel P Reid R C amp Meister M (2001) Predicting every spike Amodel for the responses of visual neurons Neuron 30(3) 803ndash817

Kepler T Abbott L F amp Marder E (1992) Reduction of conductance-basedneuron models Biological Cybernetics 66 381ndash387

Kistler W Gerstner W amp van Hemmen J L (1997)Reduction of the Hodgkin-Huxley equations to a single-variable threshold model Neural Computation9 1015ndash1045

Koch C (1999)Biophysics of computation Information processing in single neuronsNew York Oxford University Press

Kufer S W (1953) Discharge patterns and functional organization of mam-malian retina J Neurophysiol 16 37ndash68

Lewen G D Bialek W amp de Ruyter van Steveninck R R (2001)Neural codingof naturalistic motion stimuli Network 12 317ndash329

Mainen Z F amp Sejnowski T J (1995) Reliability of spike timing in neocorticalneurons Science 268 1503ndash1506

Nagumo J Arimoto S amp Yoshikawa Z (1962) An active pulse transmissionline simulating nerve axon Proc IRE 50 2061ndash2071

Oja E amp Karhunen J (1995) Signal separation by nonlinear Hebbian learningIn M Palaniswami Y Attikiouzel R J Marks II D Fogel amp T Fukuda (Eds)Computational intelligencemdasha dynamicsystemperspective(pp 83ndash97)New YorkIEEE Press

Panzeri S Petersen R Schultz S Lebedev M amp Diamond M (2001) Therole of spike timing in the coding of stimulus location in rat somatosensorycortex Neuron 29 769ndash777

Reinagel P Godwin D Sherman S M amp Koch C (1999) Encoding of visualinformation by LGN bursts J Neurophys 81 2558ndash2569

Reinagel P amp Reid R C (2000) Temporal coding of visual information in thethalamus J Neuroscience 20(14) 5392ndash5400

Rieke F Warland D Bialek W amp de Ruyter van Steveninck R R (1997)Spikes Exploring the neural code Cambridge MA MIT Press

Rosenblatt F (1958) The perceptron A probabilistic model for informationstorage and organization in the brain Psychological Review 65 386ndash408

Rosenblatt F (1962) Principles of neurodynamics New York Spartan BooksRoweis S amp Saul L (2000) Nonlinear dimensionality reduction by locally

linear embedding Science 290 2323ndash2326

Computation in a Single Neuron 1749

Schneidman E Freedman B amp Segev I (1998) Ion channel stochasticity maybe critical in determining the reliability and precision of spike timing NeuralComp 10 1679ndash1703

Shannon C E (1948) A mathematical theory of communication Bell Sys TechJournal 27 379ndash423 623ndash656

Sharpee T Rust N C amp Bialek W (in press) Maximally informative di-mensions analysing neural responses to natural signals Neural InformationProcessing Systems 2002 Available on-line http==xxxlanlgov=abs=physics=

0208057Sharpee T Rust N C amp Bialek W (2003) Maximally informative dimensions

Analysing neural responses to natural signals Unpublished manuscriptStanley G B Lei F F amp Dan Y (1999) Reconstruction of natural scenes

from ensemble responses in the lateral geniculate nucleus J Neurosci 19(18)8036ndash8042

Theunissen F Sen K amp Doupe A (2000)Spectral-temporal receptive elds ofnonlinear auditory neurons obtained using natural sounds J Neurosci 202315ndash2331

Tiesinga P H E Jose J amp Sejnowski T (2000) Comparison of current-drivenand conductance-driven neocortical model neurons with Hodgkin-Huxleyvoltage-gated channels Physical Review E 62 8413ndash8419

Tishby N Pereira F amp Bialek W (1999) The information bottleneck methodIn B Hajek amp R S Sreenivas (Eds) Proceedingsof the37th Annual AllertonCon-ference on Communication Control and Computing (pp 368ndash377) ChampaignIL Available on-line http==xxxlanlgov=abs=physics=0004057

Received January 9 2003 accepted January 28 2003

Page 28: Computation in a Single Neuron: Hodgkin and Huxley Revisitedwbialek/our_papers/aguerayarcas+al_03.pdf · ARTICLE Communicated by Paul Bressloff Computation in a Single Neuron: Hodgkin

1742 B Aguera y Arcas A Fairhall and W Bialek

Figure 17: The orthonormal components of spike-triggered averages from 80,000 spikes, conditioned on their projection onto the overall spike-triggered average (eight conditional averages shown).

This variation is continuous, but we will not be able to sample sufficiently to find the complete description of the second direction, so we sort the stimulus histories according to their projection along this direction and bin the sorted histories into a small number of bins. This defines the number of tiles used to cover the surface. We determine the conditional average of the stimulus histories in each bin and compute the (normalized) component orthogonal to the overall STA. This provides a second, locally meaningful basis vector for the subspace in that bin. The resulting family of curves orthogonal to the STA is shown in Figure 17. Applying singular value decomposition to the family of curves shows that there are at least four significant independent directions in stimulus space apart from the STA. This gives us an estimate of the embedding dimension of the feature subspace.
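The sort-bin-average construction can be sketched in a few lines of numpy. In this sketch, the features f1 and f2 and the synthetic "twisted" spike-triggered ensemble are illustrative stand-ins for the real simulation data; only the analysis steps (STA, equal-occupancy binning, per-bin orthogonalized conditional averages, SVD) follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, nbins = 50, 40000, 8        # history length, "spike" count, tiles

# Two hypothetical orthonormal features standing in for the real ones
t = np.linspace(-1.0, 0.0, D)
f1 = np.exp(t) * np.sin(8 * t); f1 /= np.linalg.norm(f1)
f2 = np.exp(t) * np.cos(8 * t)
f2 -= (f2 @ f1) * f1; f2 /= np.linalg.norm(f2)

# Synthetic spike-triggered ensemble: the "second" direction rotates as a
# function of the projection a onto the leading direction, so the relevant
# subspace is curved, as in the text
a = rng.normal(1.0, 0.3, N)
phi = 0.8 * (a - 1.0)
second = np.cos(phi)[:, None] * f2 + np.sin(phi)[:, None] * f1
S = a[:, None] * f1 + 0.5 * second + 0.2 * rng.normal(size=(N, D))

# Step 1: overall spike-triggered average (STA)
sta = S.mean(axis=0)
sta /= np.linalg.norm(sta)

# Step 2: sort histories by projection onto the STA; equal-occupancy bins
tiles = np.array_split(np.argsort(S @ sta), nbins)

# Step 3: per-bin conditional average, orthogonalized against the STA
ortho = []
for idx in tiles:
    ca = S[idx].mean(axis=0)
    ca -= (ca @ sta) * sta            # remove the STA component
    ortho.append(ca / np.linalg.norm(ca))
ortho = np.array(ortho)               # one "second direction" per tile

# Step 4: SVD of the family of curves estimates the embedding dimension
sv = np.linalg.svd(ortho, compute_uv=False)
print("normalized singular values:", np.round(sv / sv[0], 3))
```

With real data, the number of significant singular values (relative to a noise floor estimated from shuffled spikes) plays the role of the "at least four significant independent directions" quoted above.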

Geometrically, this construction is equivalent to approximating the surface as a twisting ribbon, with the leading direction that of the STA, but where the surface is allowed to rotate about the STA axis. Further, we have discretized the twisting direction into a small number of tiles. The discretization is fixed by the size of the data: there is a trade-off between the precision of estimating the conditional average and the fidelity with which one follows the twist. Here we have restricted ourselves to an experimentally realistic number of isolated spikes (80,000), using an equal number of spikes per bin. Varying over the number of bins, we found that eight bins gave the best result. Note that our model is discontinuous; it might be possible to improve it by interpolating smoothly between successive tiles.

Figure 18: Bits per spike (left) and fraction of the theoretical limit (right) of timing information in a single spike at a given temporal resolution, captured by the locally linear tiling ("twist") model (diamonds), compared to models using the STA alone (triangles) and projection onto ΔC covariance modes 1 and 2 (circles).

Computing the information as a function of Δt using this locally linear model, we obtain the curve shown in Figure 18, where the results can be compared against the information found from the STA alone and from the covariance modes. The information from the new model captures a maximum of 4.8 bits, recovering approximately 90% of the information at a time resolution of approximately 1 msec.
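The comparison behind Figure 18 rests on measuring how many bits per spike a reduced description retains. For a one-dimensional projection, a histogram-based estimate is straightforward; the sketch below uses synthetic Gaussian stand-ins for the prior and spike-conditional projection distributions, and applies no small-sample bias correction.

```python
import numpy as np

def info_per_spike(x_all, x_spike, nbins=32):
    """Bits per spike carried by a scalar stimulus projection x: the
    KL divergence between P(x | spike) and the prior P(x), estimated
    from histograms. Small-sample bias is ignored in this sketch."""
    lo, hi = x_all.min(), x_all.max()
    p, _ = np.histogram(x_all, bins=nbins, range=(lo, hi))
    q, _ = np.histogram(x_spike, bins=nbins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    ok = (p > 0) & (q > 0)
    return float(np.sum(q[ok] * np.log2(q[ok] / p[ok])))

rng = np.random.default_rng(1)
x_all = rng.normal(0.0, 1.0, 200_000)     # projections of all stimuli
x_spike = rng.normal(1.5, 0.5, 20_000)    # projections at spike times
print(f"{info_per_spike(x_all, x_spike):.2f} bits/spike")
```

Summing such contributions over the model's dimensions (or histogramming jointly) and dividing by the information in exact spike times at resolution Δt gives the "fraction of the theoretical limit" plotted in the figure.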

One of the main strengths of this simple approach is that we have succeeded in extracting additional geometrical information about the feature subspace using very limited data, as we compute only averages. Note that a similar number of spikes cannot resolve more than two spike-associated covariance modes in the covariance matrix analysis.

7 Discussion

The HH equations describe the dynamics of four degrees of freedom, and almost since these equations were first written down, there have been attempts to find simplifications or reductions. FitzHugh and Nagumo proposed a 2D system of equations that approximates the HH model (FitzHugh, 1961; Nagumo, Arimoto, & Yoshizawa, 1962), and this has the advantage that one can visualize the trajectories directly in the plane and thus achieve an intuitive graphical understanding of the dynamics and its dependence on parameters. The need for reduction in the sense pioneered by FitzHugh and by Nagumo et al. has become only more urgent with the growing use of increasingly complex HH-style model neurons with many different channel types. With this problem in mind, Kepler, Abbott, and Marder have introduced reduction methods that are more systematic, making use of the difference in timescales among the gating variables (Kepler, Abbott, & Marder, 1992; Abbott & Kepler, 1990).
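For concreteness, the FitzHugh-Nagumo planar reduction mentioned above can be integrated in a few lines. The forward-Euler scheme and the parameter values (a = 0.7, b = 0.8, eps = 0.08, the common textbook defaults) are our illustrative choices, not fits to the HH model.

```python
import numpy as np

def fitzhugh_nagumo(I, T=200.0, dt=0.01, a=0.7, b=0.8, eps=0.08):
    """Forward-Euler integration of the planar FitzHugh-Nagumo system:
        dv/dt = v - v**3 / 3 - w + I(t)
        dw/dt = eps * (v + a - b * w)
    v caricatures the membrane voltage, w the slow recovery variable."""
    n = int(T / dt)
    v = np.empty(n)
    w = np.empty(n)
    v[0], w[0] = -1.2, -0.6
    for k in range(n - 1):
        v[k + 1] = v[k] + dt * (v[k] - v[k] ** 3 / 3.0 - w[k] + I(k * dt))
        w[k + 1] = w[k] + dt * eps * (v[k] + a - b * w[k])
    return v, w

# A constant suprathreshold current puts the planar system on a limit
# cycle; each upward crossing of v = 1 counts as one "action potential".
v, w = fitzhugh_nagumo(lambda t: 0.5)
spikes = int(np.sum((v[1:] > 1.0) & (v[:-1] <= 1.0)))
print("spikes in 200 time units:", spikes)
```

Plotting w against v shows the cubic and linear nullclines whose intersection and stability determine whether the model rests or fires repetitively, which is exactly the planar intuition the text credits to FitzHugh and Nagumo.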

In the presence of constant current inputs, it makes sense to describe the HH equations as a 4D autonomous dynamical system by well-known methods in dynamical systems theory; one could consider periodic input currents by adding an extra dimension. The question asked by FitzHugh and Nagumo was whether this 4D or 5D description could be reduced to two or three dimensions.

Closer in spirit to our approach is the work of Kistler, Gerstner, and van Hemmen (1997), who focused in particular on the interaction among successive action potentials. They argued that one could approximate the HH model by a nearly linear dynamical system with a threshold, identifying threshold crossing with spiking, provided that each spike generated either a change in threshold or an effective input current that influences the generation of subsequent spikes.

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics that is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.
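The "simple rule (perhaps even a linear filter)" invoked above can be written down directly: a single linear projection of the input history followed by a threshold. The exponential kernel and the 2-sigma threshold here are arbitrary illustrative choices; the point is that the rule's relevant input subspace is one-dimensional no matter how many internal variables a mechanistic model would need.

```python
import numpy as np

rng = np.random.default_rng(2)
dt, n = 0.1, 50_000
I = rng.normal(0.0, 1.0, n)            # white-noise input "current"

# One linear filter defines the rule's single relevant dimension
tau = 2.0
k = np.exp(-np.arange(0.0, 10.0 * tau, dt) / tau)
k /= k.sum()
x = np.convolve(I, k)[:n]              # projection of the input history

# Event times are the upward crossings of a fixed threshold
theta = 2.0 * x.std()
events = np.flatnonzero((x[1:] >= theta) & (x[:-1] < theta))
print("predicted event count:", int(events.size))
```

Whether such a one-dimensional rule (or a low-dimensional generalization) predicts a given system's output events is an empirical question, which is exactly what the dimensional-reduction analysis of the paper tests for the HH model.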

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally roughly 200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.
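The covariance analysis used to find such linear subspaces can be summarized compactly. The sketch below recovers two hypothetical filters f1 and f2 from a synthetic white-noise experiment; the quadratic spiking rule is an illustrative stand-in for the HH dynamics, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(3)
D, N = 40, 30_000
S = rng.normal(size=(N, D))              # white-noise stimulus windows

# Ground-truth filters for the synthetic neuron (illustrative only)
x = np.linspace(0.0, 1.0, D)
f1 = np.sin(np.pi * x); f1 /= np.linalg.norm(f1)
f2 = np.cos(2.0 * np.pi * x)
f2 -= (f2 @ f1) * f1; f2 /= np.linalg.norm(f2)

# Quadratic spiking rule: "spike" on the top decile of the drive
drive = (S @ f1) ** 2 + 0.5 * (S @ f2) ** 2
spiked = drive > np.quantile(drive, 0.9)

# Spike-triggered covariance: eigenvectors of dC = C_spike - C_prior
# with large |eigenvalue| span the relevant linear subspace
dC = np.cov(S[spiked].T) - np.cov(S.T)
eigval, eigvec = np.linalg.eigh(dC)
order = np.argsort(-np.abs(eigval))
modes = eigvec[:, order[:2]]             # estimated 2D relevant subspace

# Frobenius overlap with span{f1, f2}; sqrt(2) would be perfect recovery
overlap = np.linalg.norm(modes.T @ np.column_stack([f1, f2]))
print("subspace overlap (max sqrt(2) = 1.414):", round(float(overlap), 3))
```

A linear analysis of this kind can only return a flat subspace; the paper's contribution is to show that for the HH model the missing information is recovered not by adding a third flat dimension but by letting the two recovered dimensions twist.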


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace, using data sets of fewer than 10^4 spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but is sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus, or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore in general involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Aguera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Aguera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Aguera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

FitzHugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophysical J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibers. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.

Page 29: Computation in a Single Neuron: Hodgkin and Huxley Revisitedwbialek/our_papers/aguerayarcas+al_03.pdf · ARTICLE Communicated by Paul Bressloff Computation in a Single Neuron: Hodgkin

Computation in a Single Neuron 1743

Figure 18 Bits per spike (left) and fraction of the theoretical limit (right) oftiming information in a single spike at a given temporal resolution captured bythe locally linear tiling ldquotwistrdquo model (diamonds) compared to models usingthe STA alone (triangles) and projection onto 1C covariance modes 1 and 2(circles)

mum of 48 bits recovering raquo 90 of the information at a time resolutionof approximately 1 msec

One of the main strengths of this simple approach is that we have suc-ceeded in extracting additional geometrical information about the featuresubspace using very limited data as we compute only averages Note thata similar number of spikes cannot resolve more than two spike-associatedcovariance modes in the covariance matrix analysis

7 Discussion

The HH equations describe the dynamics of four degrees of freedom andalmost since these equations were rst written down there have been at-tempts to nd simplications or reductions FitzHugh and Nagumo pro-posed a 2D system of equations that approximate the HH model (Fitzhugh1961 Nagumo Arimoto amp Yoshikawa 1962) and this has the advantagethat one can visualize the trajectories directly in the plane and thus achievean intuitive graphical understanding of the dynamics and its dependenceon parameters The need for reduction in the sense pioneered by FitzHughand by Nagumo et al has become only more urgent with the growing use ofincreasingly complexHH-style model neurons with many different channeltypes With this problem in mind Kepler Abbott and Marder have intro-duced reduction methods that are more systematic making use of the dif-ference in timescales among the gating variables (Kepler Abbott amp Marder1992 Abbott amp Kepler 1990)

In the presence of constant current inputs it makes sense to describethe HH equations as a 4D autonomous dynamical system by well-knownmethods in dynamical systems theory one could consider periodic input

1744 B Aguera y Arcas A Fairhall and W Bialek

currents by adding an extra dimension The question asked by FitzHughand Nagumo was whether this 4D or 5D description could be reduced totwo or three dimensions

Closer in spirit to our approach is the work by Kistler Gerstner andvan Hemmen (1997) who focused in particular on the interaction amongsuccessive action potentials They argued that one could approximate theHH model by a nearly linear dynamical system with a threshold identi-fying threshold crossing with spiking provided that each spike generatedeither a change in threshold or an effective input current that inuences thegeneration of subsequent spikes

The notion of model dimensionality considered here is distinct from the dynamical systems perspective, in which one simply counts the system's degrees of freedom. Here we are attempting to find a description of the dynamics which is essentially functional or computational. We have identified the output of the system as spike times, and our aim is to construct as complete a description as possible of the mapping between input and output. The dimensionality of our model is that of the space of inputs relevant for this mapping. There is no necessary relationship between these two notions of dimensionality. For example, in a neural network with two attractors, a system described by a potentially large number of variables, there might be a simple rule (perhaps even a linear filter) that allows us to look at the inputs to the network and determine the times at which the switching events will occur. Conversely, once we leave the simplified world of constant or periodic inputs, even the small number of differential equations describing a neuron's channel dynamics could in principle be equivalent to a very complicated set of rules for mapping inputs into spike times.

In our context, simplicity is (roughly) feature selectivity: the mapping is simple if spiking is determined by a small number of features in the complex history of inputs. Following the ideas that emerged in the analysis of motion-sensitive neurons in the fly (de Ruyter van Steveninck & Bialek, 1988; Bialek & de Ruyter van Steveninck, 2003), we have identified "features" with "dimensions" and searched for low-dimensional descriptions of the input history that preserve the mutual information between inputs and outputs (spike times). We have considered only the generation of isolated spikes, leaving aside the question of how spikes interact with one another, as considered by Kistler et al. (1997). For these isolated spikes, we began by searching for projections onto a low-dimensional linear subspace of the originally ~200-dimensional stimulus space, and we found that a substantial fraction of the mutual information could be preserved in a model with just two dimensions. Searching for the information that is missing from this model, we found that rather than adding more (Euclidean) dimensions, we could capture approximately 90% of the information at high time resolution by keeping a 2D description but allowing these dimensions to vary over the surface, so that the neuron is sensitive to stimulus features that lie in a curved 2D subspace.
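The criterion used throughout this kind of analysis, how much of the mutual information between stimulus and spike times survives a given projection, can be estimated by simple histogramming. A hypothetical sketch, following the standard per-spike information formula I = sum_s P(s|spike) log2[P(s|spike)/P(s)] rather than the paper's own code:

```python
import numpy as np

def info_per_spike(proj, spikes, bins=40):
    """Estimate the information (bits per spike) that a 1D projection of
    the stimulus carries about spiking:
        I = sum_s P(s | spike) * log2( P(s | spike) / P(s) ),
    where s runs over histogram bins of the projection values.
    proj: projection of each stimulus history onto a candidate feature.
    spikes: boolean array marking which histories ended in a spike."""
    edges = np.histogram_bin_edges(proj, bins=bins)
    p_prior, _ = np.histogram(proj, bins=edges)
    p_spike, _ = np.histogram(proj[spikes], bins=edges)
    p_prior = p_prior / p_prior.sum()
    p_spike = p_spike / p_spike.sum()
    mask = p_spike > 0                      # bins with spikes also have prior mass
    return float(np.sum(p_spike[mask] * np.log2(p_spike[mask] / p_prior[mask])))

# Toy check: spikes determined entirely by one feature of a white stimulus.
rng = np.random.default_rng(1)
x = rng.standard_normal((100_000, 20))
feature = np.ones(20) / np.sqrt(20)
p = x @ feature
spk = p > 1.5
print(info_per_spike(p, spk))          # the relevant projection carries most of the information
print(info_per_spike(x[:, 0], spk))    # a single raw coordinate carries far less
```

Comparing such estimates for one-, two-, and higher-dimensional projections against the total information in spike timing is what justifies the statement that a (curved) 2D description captures roughly 90% of the information.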


The geometrical picture of neurons as being sensitive to features that are defined by a low-dimensional stimulus subspace is attractive and, as noted in section 1, corresponds to a widely shared intuition about the nature of neuronal feature selectivity. While curved subspaces often appear as the targets for learning in complex neural computations such as invariant object recognition, the idea that such subspaces appear already in the description of single-neuron computation we believe to be novel.

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a 4D subspace using data sets of fewer than 10^4 spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but it is sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus, or conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore in general involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges, and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Agüera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Agüera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Agüera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.


Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

Fitzhugh, R. (1961). Impulses and physiological states in theoretical models of nerve membrane. Biophys. J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.


Hartline, H. K. (1940). The receptive fields of optic nerve fibres. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence—a dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.


Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.

Page 30: Computation in a Single Neuron: Hodgkin and Huxley Revisitedwbialek/our_papers/aguerayarcas+al_03.pdf · ARTICLE Communicated by Paul Bressloff Computation in a Single Neuron: Hodgkin

1744 B Aguera y Arcas A Fairhall and W Bialek

currents by adding an extra dimension The question asked by FitzHughand Nagumo was whether this 4D or 5D description could be reduced totwo or three dimensions

Closer in spirit to our approach is the work by Kistler Gerstner andvan Hemmen (1997) who focused in particular on the interaction amongsuccessive action potentials They argued that one could approximate theHH model by a nearly linear dynamical system with a threshold identi-fying threshold crossing with spiking provided that each spike generatedeither a change in threshold or an effective input current that inuences thegeneration of subsequent spikes

The notion of model dimensionality considered here is distinct from thedynamical systems perspective in which one simply counts the systemrsquosdegrees of freedom Here we are attempting to nd a description of the dy-namics which is essentially functional or computational We have identiedthe output of the system as spike times and our aim is to construct as com-plete a description as possible of the mapping between input and outputThe dimensionality of our model is that of the space of inputs relevant forthis mapping There is no necessary relationship between these two notionsof dimensionality For example in a neural network with two attractors asystem described by a potentially large number of variables there might bea simple rule (perhaps even a linear lter) that allows us to look at the in-puts to the network and determine the times at which the switching eventswill occur Conversely once we leave the simplied world of constant orperiodic inputs even the small number of differential equations describ-ing a neuronrsquos channel dynamics could in principle be equivalent to a verycomplicated set of rules for mapping inputs into spike times

In our context simplicity is (roughly) feature selectivity the mapping issimple if spiking is determined by a small number of features in the com-plex history of inputs Following the ideas that emerged in the analysis ofmotion-sensitive neurons in the y (de Ruyter van Steveninck amp Bialek1988 Bialek amp de Ruyter van Steveninck 2003) we have identied ldquofea-turesrdquo with ldquodimensionsrdquo and searched for low-dimensional descriptionsof the input history that preserve the mutual information between inputsand outputs (spike times) We have considered only the generation of iso-lated spikes leaving aside the question of how spikes interact with oneanother as considered by Kistler et al (1997) For these isolated spikes webegan by searching for projections onto a low-dimensional linear subspaceof the originally raquo 200-dimensional stimulus space and we found that asubstantial fraction of the mutual informationcould be preserved in a modelwith just two dimensions Searching for the information that is missing fromthis model we found that rather than adding more (Euclidean) dimensionswe could capture approximately 90 of the information at high time reso-lution by keeping a 2D description but allowing these dimensions to varyover the surface so that the neuron is sensitive to stimulus features that liein a curved 2D subspace

Computation in a Single Neuron 1745

The geometrical picture of neurons as being sensitive to features that aredened by a low-dimensional stimulus subspace is attractive and as notedin section 1 corresponds to a widely shared intuition about the nature ofneuronal feature selectivity While curved subspaces often appear as thetargets for learning in complexneural computations such as invariant objectrecognition the idea that such subspaces appear already in the descriptionof single-neuron computation we believe to be novel

While we have exploited the fact that long simulations of the HH modelare quite tractable to generate large amounts of ldquodatardquo for our analysis it isimportant that in the end our construction of a curved relevant stimulussubspace involves a series of computations that are just simple general-izations of the conventional reverse correlation or spike-triggered averageThis suggests that our approach can be applied to real neurons withoutrequiring qualitatively larger data sets than might have been needed fora careful reverse correlation analysis In the same spirit recent work hasshown how covariance matrix analysis of the yrsquos motion-sensitive neu-rons can reveal nonlinear computations in a 4D subspace using data setsof fewer than 104 spikes (Bialek amp de Ruyter van Steveninck 2003) Low-dimensional linear subspaces can be found even in the response of modelneurons to naturalistic inputs if one searches directly for dimensions thatcapture the largest fraction of the mutual information between inputs andspikes (Sharpee et al in press) and again the errors involved in identifyingthe relevant dimensions are comparable to the errors in reverse correla-tion (Sharpee Rust amp Bialek 2003) All of these results point to the prac-tical feasibility of describing real neurons in terms of nonlinear computa-tion on low-dimensional relevant subspaces in a high-dimensional stimulusspace

Our reduced model of the HH neuron both illustrates a novel approachto dimensional reduction and gives new insight into the computation per-formed by the neuron The reduced model is essentially that of an edgedetector for current trajectories but is sensitive to a further stimulus pa-rameter producing a curved manifold An interpretation of this curva-ture will be presented in a forthcoming manuscript This curved repre-sentation is able to capture almost all information that isolated spikes con-vey about the stimulus or conversely allow us to predict isolated spiketimes with high temporal precision from the stimulus The emergence ofa low-dimensional curved manifold in a model as simple as the HH neu-ron suggests that such a description may also be appropriate for biologicalneurons

Our approach is limited in that we address only isolated spikes This re-stricted class of spikes nonetheless has biological relevance for example invertebrate retinal ganglion cells (Berry amp Meister 1999) in rat somatosen-sory cortex (Panzeri Petersen Schultz Lebedev amp Diamond 2001) and inLGN (Reinagel Godwin Sherman amp Koch 1999) the rst spike of a bursthas been shown to convey distinct (and the majority of the) information

1746 B Aguera y Arcas A Fairhall and W Bialek

However a clear next step in this program is to extend our formalism to takeinto account interspike interaction For neurons or models with explicit longtimescales adaptation induces very long-range history dependence whichcomplicates the issue of spike interactions considerably A full understand-ing of the interaction between stimulus and spike history will therefore ingeneral involve understanding the meanings of spike patterns (de Ruytervan Steveninck amp Bialek 1988 Brenner Strong et al 2000) and the inu-ence of the larger statistical context (Fairhall et al 2001) Our results pointto the need for a more parsimonious description of self-excitation even forthe simple case of dependence on only the last spike time

We close by reminding readers of the more ambitious goal of buildingbridges between the burgeoning molecular-level description of neurons andthe functional or computational level Armed with a description of spikegeneration as a nonlinear operation on a low-dimensional curved manifoldin the space of inputs it is natural to ask how the details of this computa-tional picture are related to molecular mechanisms Are neurons with moredifferent types of ion channels sensitive to more stimulus dimensions or dothey implement more complex nonlinearities in a low-dimensional spaceAre adaptation and modulation mechanisms that change the nonlinearityseparable from those that change the dimensions to which the cell is sensi-tive Finally while we have shown how a low-dimensional description canbe constructed numerically from observations of the input-output proper-ties of the neuron one would like to understand analytically why such adescription emerges and whether it emerges universally from the combina-tions of channel dynamics selected by real neurons

Acknowledgments

We thank N Brenner for discussions at the start of this work and M Berryfor comments on the manuscript

References

Abbott L F amp Kepler T (1990) Model neurons From Hodgkin-Huxleyto Hopeld In L Garrido (Ed) Statistical mechanisms of neural networks(pp 5ndash18) Berlin Springer-Verlag

Aguera y Arcas B (1998)Reducing the neuron A computational approach Unpub-lished masterrsquos thesis Princeton University

Aguera y Arcas B Bialek W amp Fairhall A L (2001) What can a single neu-ron compute In T Leen T Dietterich amp V Tresp (Eds) Advances in neuralinformation processing systems 13 (pp 75ndash81) Cambridge MA MIT Press

Aguera y Arcas B amp Fairhall A (2003) What causes a neuron to spike NeuralComputation 15 1789ndash1807

Barlow H B (1953) Summation and inhibition in the frogrsquos retina J Physiol119 69ndash88

Computation in a Single Neuron 1747

Barlow H B Hill R M amp Levick W R (1964)Retinal ganglion cells respond-ing selectively to direction and speed of image motion in the rabbit J Physiol173 377ndash407

Berry M J II amp Meister M (1999) The neural code of the retina Neuron 22435ndash450

Berry M J II Warland D amp Meister M (1997) The structure and precision ofretinal spike trains Proc Natl Acad Sci USA 94 5411ndash5416

Bialek W amp de Ruyter van Steveninck R R (2003) Features and dimensionsMotion estimation in y vision Unpublished manuscript

Boser B E Guyon I M amp Vapnik V N (1992)A training algorithm for optimalmargin classiers In D Haussler (Ed) 5th Annual ACM Workshop on COLT(pp 144ndash152) Pittsburgh PA ACM Press

Bray D (1995) Protein molecules as computational elements in living cellsNature 376 307ndash312

Brenner N Agam O Bialek W amp de Ruyter van Steveninck R R (1998)Universal statistical behavior of neural spike trains Phys Rev Lett 814000ndash4003

Brenner N Bialek W amp de Ruyter van Steveninck R R (2000) Adaptiverescaling maximizes information transmission Neuron 26 695ndash702

Brenner N Strong S Koberle R Bialek W amp de Ruyter van SteveninckR R (2000) Synergy in a neural code Neural Comp 12 1531ndash1552 Availableon-line http==xxxlanlgov=abs=physics=9902067

Cottrell G W Munro P amp Zipser D (1988) Image compression by back prop-agation A demonstration of extensional programming In N Sharkey (Ed)Models of cognition A reviewof cognitive science (Vol 2 pp 208ndash240)NorwoodNJ Ablex

Cover T M amp Thomas J A (1991) Elements of information theory New YorkWiley

de Boer E amp Kuyper P (1968) Triggered correlation IEEE Trans Biomed Eng15 169ndash179

de Ruyter van Steveninck R R amp Bialek W (1988) Real-time performance ofa movement sensitive in the blowy visual system Information transfer inshort spike sequences Proc Roy Soc Lond B 234 379ndash414

de Ruyter van Steveninck R Lewen G D Strong S P Koberle R amp BialekW (1997) Reproducibility and variability in neural spike trains Science 2751805ndash1808

Dear S P Simmons J A amp Fritz J (1993) A possible neuronal basis for repre-sentation of acoustic scenes in auditory cortex of the big brown bat Nature364 620ndash623

Fairhall A Lewen G Bialek W amp de Ruyter van Steveninck R R (2001)Efciency and ambiguity in an adaptive neural code Nature 412 787ndash792

Fitzhugh R (1961) Impulse and physiological states in models of nerve mem-brane Biophysics J 1 445ndash466

Guyon I M Boser B E amp Vapnik V N (1993) Automatic capacity tuning ofvery large VC-dimension classiers In S J Hanson J D Cowan amp C Giles(Eds) Advances in neural information processing systems 5 (pp 147ndash155) SanMateo CA Morgan Kaufmann

1748 B Aguera y Arcas A Fairhall and W Bialek

Hartline H K (1940)The receptive elds of optic nerve bres Amer J Physiol130 690ndash699

Hille B (1992) Ionic channels of excitable membranes Sunderland MA SinauerHodgkin A L amp Huxley A F (1952) A quantitative description of membrane

current and its application to conduction and excitation in nerve J Physiol463 391ndash407

Hubel D H amp Wiesel T N (1962) Receptive elds binocular interactionand functional architecture in the catrsquos visual cortex J Physiol (Lond) 160106ndash154

Iverson L amp Zucker S W (1995) Logical=linear operators for image curvesIEEE Trans Pattern Analysis and Machine Intelligence 17 982ndash996

Keat J Reinagel P Reid R C amp Meister M (2001) Predicting every spike Amodel for the responses of visual neurons Neuron 30(3) 803ndash817

Kepler T Abbott L F amp Marder E (1992) Reduction of conductance-basedneuron models Biological Cybernetics 66 381ndash387

Kistler W Gerstner W amp van Hemmen J L (1997)Reduction of the Hodgkin-Huxley equations to a single-variable threshold model Neural Computation9 1015ndash1045

Koch C (1999)Biophysics of computation Information processing in single neuronsNew York Oxford University Press

Kufer S W (1953) Discharge patterns and functional organization of mam-malian retina J Neurophysiol 16 37ndash68

Lewen G D Bialek W amp de Ruyter van Steveninck R R (2001)Neural codingof naturalistic motion stimuli Network 12 317ndash329

Mainen Z F amp Sejnowski T J (1995) Reliability of spike timing in neocorticalneurons Science 268 1503ndash1506

Nagumo J Arimoto S amp Yoshikawa Z (1962) An active pulse transmissionline simulating nerve axon Proc IRE 50 2061ndash2071

Oja E amp Karhunen J (1995) Signal separation by nonlinear Hebbian learningIn M Palaniswami Y Attikiouzel R J Marks II D Fogel amp T Fukuda (Eds)Computational intelligencemdasha dynamicsystemperspective(pp 83ndash97)New YorkIEEE Press

Panzeri S Petersen R Schultz S Lebedev M amp Diamond M (2001) Therole of spike timing in the coding of stimulus location in rat somatosensorycortex Neuron 29 769ndash777

Reinagel P Godwin D Sherman S M amp Koch C (1999) Encoding of visualinformation by LGN bursts J Neurophys 81 2558ndash2569

Reinagel P amp Reid R C (2000) Temporal coding of visual information in thethalamus J Neuroscience 20(14) 5392ndash5400

Rieke F Warland D Bialek W amp de Ruyter van Steveninck R R (1997)Spikes Exploring the neural code Cambridge MA MIT Press

Rosenblatt F (1958) The perceptron A probabilistic model for informationstorage and organization in the brain Psychological Review 65 386ndash408

Rosenblatt F (1962) Principles of neurodynamics New York Spartan BooksRoweis S amp Saul L (2000) Nonlinear dimensionality reduction by locally

linear embedding Science 290 2323ndash2326

Computation in a Single Neuron 1749

Schneidman E Freedman B amp Segev I (1998) Ion channel stochasticity maybe critical in determining the reliability and precision of spike timing NeuralComp 10 1679ndash1703

Shannon C E (1948) A mathematical theory of communication Bell Sys TechJournal 27 379ndash423 623ndash656

Sharpee T Rust N C amp Bialek W (in press) Maximally informative di-mensions analysing neural responses to natural signals Neural InformationProcessing Systems 2002 Available on-line http==xxxlanlgov=abs=physics=

0208057Sharpee T Rust N C amp Bialek W (2003) Maximally informative dimensions

Analysing neural responses to natural signals Unpublished manuscriptStanley G B Lei F F amp Dan Y (1999) Reconstruction of natural scenes

from ensemble responses in the lateral geniculate nucleus J Neurosci 19(18)8036ndash8042

Theunissen F Sen K amp Doupe A (2000)Spectral-temporal receptive elds ofnonlinear auditory neurons obtained using natural sounds J Neurosci 202315ndash2331

Tiesinga P H E Jose J amp Sejnowski T (2000) Comparison of current-drivenand conductance-driven neocortical model neurons with Hodgkin-Huxleyvoltage-gated channels Physical Review E 62 8413ndash8419

Tishby N Pereira F amp Bialek W (1999) The information bottleneck methodIn B Hajek amp R S Sreenivas (Eds) Proceedingsof the37th Annual AllertonCon-ference on Communication Control and Computing (pp 368ndash377) ChampaignIL Available on-line http==xxxlanlgov=abs=physics=0004057

Received January 9 2003 accepted January 28 2003

Page 31: Computation in a Single Neuron: Hodgkin and Huxley Revisitedwbialek/our_papers/aguerayarcas+al_03.pdf · ARTICLE Communicated by Paul Bressloff Computation in a Single Neuron: Hodgkin

Computation in a Single Neuron 1745

The geometrical picture of neurons as being sensitive to features that aredened by a low-dimensional stimulus subspace is attractive and as notedin section 1 corresponds to a widely shared intuition about the nature ofneuronal feature selectivity While curved subspaces often appear as thetargets for learning in complexneural computations such as invariant objectrecognition the idea that such subspaces appear already in the descriptionof single-neuron computation we believe to be novel

While we have exploited the fact that long simulations of the HH model are quite tractable to generate large amounts of "data" for our analysis, it is important that, in the end, our construction of a curved relevant stimulus subspace involves a series of computations that are just simple generalizations of the conventional reverse correlation or spike-triggered average. This suggests that our approach can be applied to real neurons without requiring qualitatively larger data sets than might have been needed for a careful reverse correlation analysis. In the same spirit, recent work has shown how covariance matrix analysis of the fly's motion-sensitive neurons can reveal nonlinear computations in a four-dimensional subspace using data sets of fewer than 10^4 spikes (Bialek & de Ruyter van Steveninck, 2003). Low-dimensional linear subspaces can be found even in the response of model neurons to naturalistic inputs if one searches directly for dimensions that capture the largest fraction of the mutual information between inputs and spikes (Sharpee et al., in press), and again the errors involved in identifying the relevant dimensions are comparable to the errors in reverse correlation (Sharpee, Rust, & Bialek, 2003). All of these results point to the practical feasibility of describing real neurons in terms of nonlinear computation on low-dimensional relevant subspaces in a high-dimensional stimulus space.
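The generalization of reverse correlation referred to here can be sketched in a few lines. The following is an illustrative reconstruction, not the authors' code: the function name, the window length, and the way prior samples are drawn are our own choices. It computes the spike-triggered average and the difference between the spike-triggered and prior stimulus covariances, whose significant eigenvectors span the relevant linear subspace.

```python
import numpy as np

def spike_triggered_analysis(stimulus, spike_times, window=100, rng=None):
    """Spike-triggered average (STA) and covariance (STC) analysis.

    stimulus    : 1-D array of input samples (e.g., injected current)
    spike_times : sample indices at which spikes occurred
    window      : number of stimulus samples kept up to and including each spike
    """
    rng = np.random.default_rng() if rng is None else rng

    # Stimulus history ending at each spike (one row per spike).
    snippets = np.array([stimulus[t - window + 1 : t + 1]
                         for t in spike_times if t >= window - 1])

    # Conventional reverse correlation.
    sta = snippets.mean(axis=0)

    # Covariance of the spike-triggered ensemble minus the prior stimulus
    # covariance; eigenvectors of the difference whose eigenvalues differ
    # significantly from zero span the relevant subspace.
    c_spike = np.cov(snippets, rowvar=False)
    starts = rng.integers(window - 1, len(stimulus), size=len(snippets))
    prior = np.array([stimulus[t - window + 1 : t + 1] for t in starts])
    c_prior = np.cov(prior, rowvar=False)

    eigvals, eigvecs = np.linalg.eigh(c_spike - c_prior)
    return sta, eigvals, eigvecs
```

For the HH model, the paper finds that two such eigenvectors capture most of the spike-triggering behavior; in a real experiment one would compare the eigenvalue spectrum against a shuffled control to decide which dimensions are significant.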

Our reduced model of the HH neuron both illustrates a novel approach to dimensional reduction and gives new insight into the computation performed by the neuron. The reduced model is essentially that of an edge detector for current trajectories, but it is sensitive to a further stimulus parameter, producing a curved manifold. An interpretation of this curvature will be presented in a forthcoming manuscript. This curved representation is able to capture almost all the information that isolated spikes convey about the stimulus, or, conversely, allows us to predict isolated spike times with high temporal precision from the stimulus. The emergence of a low-dimensional curved manifold in a model as simple as the HH neuron suggests that such a description may also be appropriate for biological neurons.

Our approach is limited in that we address only isolated spikes. This restricted class of spikes nonetheless has biological relevance: for example, in vertebrate retinal ganglion cells (Berry & Meister, 1999), in rat somatosensory cortex (Panzeri, Petersen, Schultz, Lebedev, & Diamond, 2001), and in the LGN (Reinagel, Godwin, Sherman, & Koch, 1999), the first spike of a burst has been shown to convey distinct (and the majority of the) information.


However, a clear next step in this program is to extend our formalism to take into account interspike interaction. For neurons or models with explicit long timescales, adaptation induces very long-range history dependence, which complicates the issue of spike interactions considerably. A full understanding of the interaction between stimulus and spike history will therefore in general involve understanding the meanings of spike patterns (de Ruyter van Steveninck & Bialek, 1988; Brenner, Strong, et al., 2000) and the influence of the larger statistical context (Fairhall et al., 2001). Our results point to the need for a more parsimonious description of self-excitation, even for the simple case of dependence on only the last spike time.

We close by reminding readers of the more ambitious goal of building bridges between the burgeoning molecular-level description of neurons and the functional or computational level. Armed with a description of spike generation as a nonlinear operation on a low-dimensional curved manifold in the space of inputs, it is natural to ask how the details of this computational picture are related to molecular mechanisms. Are neurons with more different types of ion channels sensitive to more stimulus dimensions, or do they implement more complex nonlinearities in a low-dimensional space? Are adaptation and modulation mechanisms that change the nonlinearity separable from those that change the dimensions to which the cell is sensitive? Finally, while we have shown how a low-dimensional description can be constructed numerically from observations of the input-output properties of the neuron, one would like to understand analytically why such a description emerges and whether it emerges universally from the combinations of channel dynamics selected by real neurons.

Acknowledgments

We thank N. Brenner for discussions at the start of this work and M. Berry for comments on the manuscript.

References

Abbott, L. F., & Kepler, T. (1990). Model neurons: From Hodgkin-Huxley to Hopfield. In L. Garrido (Ed.), Statistical mechanics of neural networks (pp. 5–18). Berlin: Springer-Verlag.

Agüera y Arcas, B. (1998). Reducing the neuron: A computational approach. Unpublished master's thesis, Princeton University.

Agüera y Arcas, B., Bialek, W., & Fairhall, A. L. (2001). What can a single neuron compute? In T. Leen, T. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems, 13 (pp. 75–81). Cambridge, MA: MIT Press.

Agüera y Arcas, B., & Fairhall, A. (2003). What causes a neuron to spike? Neural Computation, 15, 1789–1807.

Barlow, H. B. (1953). Summation and inhibition in the frog's retina. J. Physiol., 119, 69–88.

Barlow, H. B., Hill, R. M., & Levick, W. R. (1964). Retinal ganglion cells responding selectively to direction and speed of image motion in the rabbit. J. Physiol., 173, 377–407.

Berry, M. J., II, & Meister, M. (1999). The neural code of the retina. Neuron, 22, 435–450.

Berry, M. J., II, Warland, D., & Meister, M. (1997). The structure and precision of retinal spike trains. Proc. Natl. Acad. Sci. USA, 94, 5411–5416.

Bialek, W., & de Ruyter van Steveninck, R. R. (2003). Features and dimensions: Motion estimation in fly vision. Unpublished manuscript.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In D. Haussler (Ed.), 5th Annual ACM Workshop on COLT (pp. 144–152). Pittsburgh, PA: ACM Press.

Bray, D. (1995). Protein molecules as computational elements in living cells. Nature, 376, 307–312.

Brenner, N., Agam, O., Bialek, W., & de Ruyter van Steveninck, R. R. (1998). Universal statistical behavior of neural spike trains. Phys. Rev. Lett., 81, 4000–4003.

Brenner, N., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Adaptive rescaling maximizes information transmission. Neuron, 26, 695–702.

Brenner, N., Strong, S., Koberle, R., Bialek, W., & de Ruyter van Steveninck, R. R. (2000). Synergy in a neural code. Neural Comp., 12, 1531–1552. Available on-line: http://xxx.lanl.gov/abs/physics/9902067.

Cottrell, G. W., Munro, P., & Zipser, D. (1988). Image compression by back propagation: A demonstration of extensional programming. In N. Sharkey (Ed.), Models of cognition: A review of cognitive science (Vol. 2, pp. 208–240). Norwood, NJ: Ablex.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.

de Boer, E., & Kuyper, P. (1968). Triggered correlation. IEEE Trans. Biomed. Eng., 15, 169–179.

de Ruyter van Steveninck, R. R., & Bialek, W. (1988). Real-time performance of a movement-sensitive neuron in the blowfly visual system: Information transfer in short spike sequences. Proc. Roy. Soc. Lond. B, 234, 379–414.

de Ruyter van Steveninck, R., Lewen, G. D., Strong, S. P., Koberle, R., & Bialek, W. (1997). Reproducibility and variability in neural spike trains. Science, 275, 1805–1808.

Dear, S. P., Simmons, J. A., & Fritz, J. (1993). A possible neuronal basis for representation of acoustic scenes in auditory cortex of the big brown bat. Nature, 364, 620–623.

Fairhall, A., Lewen, G., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Efficiency and ambiguity in an adaptive neural code. Nature, 412, 787–792.

FitzHugh, R. (1961). Impulse and physiological states in models of nerve membrane. Biophysics J., 1, 445–466.

Guyon, I. M., Boser, B. E., & Vapnik, V. N. (1993). Automatic capacity tuning of very large VC-dimension classifiers. In S. J. Hanson, J. D. Cowan, & C. Giles (Eds.), Advances in neural information processing systems, 5 (pp. 147–155). San Mateo, CA: Morgan Kaufmann.

Hartline, H. K. (1940). The receptive fields of optic nerve fibres. Amer. J. Physiol., 130, 690–699.

Hille, B. (1992). Ionic channels of excitable membranes. Sunderland, MA: Sinauer.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol., 117, 500–544.

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiol. (Lond.), 160, 106–154.

Iverson, L., & Zucker, S. W. (1995). Logical/linear operators for image curves. IEEE Trans. Pattern Analysis and Machine Intelligence, 17, 982–996.

Keat, J., Reinagel, P., Reid, R. C., & Meister, M. (2001). Predicting every spike: A model for the responses of visual neurons. Neuron, 30(3), 803–817.

Kepler, T., Abbott, L. F., & Marder, E. (1992). Reduction of conductance-based neuron models. Biological Cybernetics, 66, 381–387.

Kistler, W., Gerstner, W., & van Hemmen, J. L. (1997). Reduction of the Hodgkin-Huxley equations to a single-variable threshold model. Neural Computation, 9, 1015–1045.

Koch, C. (1999). Biophysics of computation: Information processing in single neurons. New York: Oxford University Press.

Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. J. Neurophysiol., 16, 37–68.

Lewen, G. D., Bialek, W., & de Ruyter van Steveninck, R. R. (2001). Neural coding of naturalistic motion stimuli. Network, 12, 317–329.

Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.

Nagumo, J., Arimoto, S., & Yoshizawa, S. (1962). An active pulse transmission line simulating nerve axon. Proc. IRE, 50, 2061–2071.

Oja, E., & Karhunen, J. (1995). Signal separation by nonlinear Hebbian learning. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, & T. Fukuda (Eds.), Computational intelligence: A dynamic system perspective (pp. 83–97). New York: IEEE Press.

Panzeri, S., Petersen, R., Schultz, S., Lebedev, M., & Diamond, M. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron, 29, 769–777.

Reinagel, P., Godwin, D., Sherman, S. M., & Koch, C. (1999). Encoding of visual information by LGN bursts. J. Neurophys., 81, 2558–2569.

Reinagel, P., & Reid, R. C. (2000). Temporal coding of visual information in the thalamus. J. Neuroscience, 20(14), 5392–5400.

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.

Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Li, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., Jose, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.

Page 32: Computation in a Single Neuron: Hodgkin and Huxley Revisitedwbialek/our_papers/aguerayarcas+al_03.pdf · ARTICLE Communicated by Paul Bressloff Computation in a Single Neuron: Hodgkin

1746 B Aguera y Arcas A Fairhall and W Bialek

However a clear next step in this program is to extend our formalism to takeinto account interspike interaction For neurons or models with explicit longtimescales adaptation induces very long-range history dependence whichcomplicates the issue of spike interactions considerably A full understand-ing of the interaction between stimulus and spike history will therefore ingeneral involve understanding the meanings of spike patterns (de Ruytervan Steveninck amp Bialek 1988 Brenner Strong et al 2000) and the inu-ence of the larger statistical context (Fairhall et al 2001) Our results pointto the need for a more parsimonious description of self-excitation even forthe simple case of dependence on only the last spike time

We close by reminding readers of the more ambitious goal of buildingbridges between the burgeoning molecular-level description of neurons andthe functional or computational level Armed with a description of spikegeneration as a nonlinear operation on a low-dimensional curved manifoldin the space of inputs it is natural to ask how the details of this computa-tional picture are related to molecular mechanisms Are neurons with moredifferent types of ion channels sensitive to more stimulus dimensions or dothey implement more complex nonlinearities in a low-dimensional spaceAre adaptation and modulation mechanisms that change the nonlinearityseparable from those that change the dimensions to which the cell is sensi-tive Finally while we have shown how a low-dimensional description canbe constructed numerically from observations of the input-output proper-ties of the neuron one would like to understand analytically why such adescription emerges and whether it emerges universally from the combina-tions of channel dynamics selected by real neurons

Acknowledgments

We thank N Brenner for discussions at the start of this work and M Berryfor comments on the manuscript

References

Abbott L F amp Kepler T (1990) Model neurons From Hodgkin-Huxleyto Hopeld In L Garrido (Ed) Statistical mechanisms of neural networks(pp 5ndash18) Berlin Springer-Verlag

Aguera y Arcas B (1998)Reducing the neuron A computational approach Unpub-lished masterrsquos thesis Princeton University

Aguera y Arcas B Bialek W amp Fairhall A L (2001) What can a single neu-ron compute In T Leen T Dietterich amp V Tresp (Eds) Advances in neuralinformation processing systems 13 (pp 75ndash81) Cambridge MA MIT Press

Aguera y Arcas B amp Fairhall A (2003) What causes a neuron to spike NeuralComputation 15 1789ndash1807

Barlow H B (1953) Summation and inhibition in the frogrsquos retina J Physiol119 69ndash88

Computation in a Single Neuron 1747

Barlow H B Hill R M amp Levick W R (1964)Retinal ganglion cells respond-ing selectively to direction and speed of image motion in the rabbit J Physiol173 377ndash407

Berry M J II amp Meister M (1999) The neural code of the retina Neuron 22435ndash450

Berry M J II Warland D amp Meister M (1997) The structure and precision ofretinal spike trains Proc Natl Acad Sci USA 94 5411ndash5416

Bialek W amp de Ruyter van Steveninck R R (2003) Features and dimensionsMotion estimation in y vision Unpublished manuscript

Boser B E Guyon I M amp Vapnik V N (1992)A training algorithm for optimalmargin classiers In D Haussler (Ed) 5th Annual ACM Workshop on COLT(pp 144ndash152) Pittsburgh PA ACM Press

Bray D (1995) Protein molecules as computational elements in living cellsNature 376 307ndash312

Brenner N Agam O Bialek W amp de Ruyter van Steveninck R R (1998)Universal statistical behavior of neural spike trains Phys Rev Lett 814000ndash4003

Brenner N Bialek W amp de Ruyter van Steveninck R R (2000) Adaptiverescaling maximizes information transmission Neuron 26 695ndash702

Brenner N Strong S Koberle R Bialek W amp de Ruyter van SteveninckR R (2000) Synergy in a neural code Neural Comp 12 1531ndash1552 Availableon-line http==xxxlanlgov=abs=physics=9902067

Cottrell G W Munro P amp Zipser D (1988) Image compression by back prop-agation A demonstration of extensional programming In N Sharkey (Ed)Models of cognition A reviewof cognitive science (Vol 2 pp 208ndash240)NorwoodNJ Ablex

Cover T M amp Thomas J A (1991) Elements of information theory New YorkWiley

de Boer E amp Kuyper P (1968) Triggered correlation IEEE Trans Biomed Eng15 169ndash179

de Ruyter van Steveninck R R amp Bialek W (1988) Real-time performance ofa movement sensitive in the blowy visual system Information transfer inshort spike sequences Proc Roy Soc Lond B 234 379ndash414

de Ruyter van Steveninck R Lewen G D Strong S P Koberle R amp BialekW (1997) Reproducibility and variability in neural spike trains Science 2751805ndash1808

Dear S P Simmons J A amp Fritz J (1993) A possible neuronal basis for repre-sentation of acoustic scenes in auditory cortex of the big brown bat Nature364 620ndash623

Fairhall A Lewen G Bialek W amp de Ruyter van Steveninck R R (2001)Efciency and ambiguity in an adaptive neural code Nature 412 787ndash792

Fitzhugh R (1961) Impulse and physiological states in models of nerve mem-brane Biophysics J 1 445ndash466

Guyon I M Boser B E amp Vapnik V N (1993) Automatic capacity tuning ofvery large VC-dimension classiers In S J Hanson J D Cowan amp C Giles(Eds) Advances in neural information processing systems 5 (pp 147ndash155) SanMateo CA Morgan Kaufmann

1748 B Aguera y Arcas A Fairhall and W Bialek

Hartline H K (1940)The receptive elds of optic nerve bres Amer J Physiol130 690ndash699

Hille B (1992) Ionic channels of excitable membranes Sunderland MA SinauerHodgkin A L amp Huxley A F (1952) A quantitative description of membrane

current and its application to conduction and excitation in nerve J Physiol463 391ndash407

Hubel D H amp Wiesel T N (1962) Receptive elds binocular interactionand functional architecture in the catrsquos visual cortex J Physiol (Lond) 160106ndash154

Iverson L amp Zucker S W (1995) Logical=linear operators for image curvesIEEE Trans Pattern Analysis and Machine Intelligence 17 982ndash996

Keat J Reinagel P Reid R C amp Meister M (2001) Predicting every spike Amodel for the responses of visual neurons Neuron 30(3) 803ndash817

Kepler T Abbott L F amp Marder E (1992) Reduction of conductance-basedneuron models Biological Cybernetics 66 381ndash387

Kistler W Gerstner W amp van Hemmen J L (1997)Reduction of the Hodgkin-Huxley equations to a single-variable threshold model Neural Computation9 1015ndash1045

Koch C (1999)Biophysics of computation Information processing in single neuronsNew York Oxford University Press

Kufer S W (1953) Discharge patterns and functional organization of mam-malian retina J Neurophysiol 16 37ndash68

Lewen G D Bialek W amp de Ruyter van Steveninck R R (2001)Neural codingof naturalistic motion stimuli Network 12 317ndash329

Mainen Z F amp Sejnowski T J (1995) Reliability of spike timing in neocorticalneurons Science 268 1503ndash1506

Nagumo J Arimoto S amp Yoshikawa Z (1962) An active pulse transmissionline simulating nerve axon Proc IRE 50 2061ndash2071

Oja E amp Karhunen J (1995) Signal separation by nonlinear Hebbian learningIn M Palaniswami Y Attikiouzel R J Marks II D Fogel amp T Fukuda (Eds)Computational intelligencemdasha dynamicsystemperspective(pp 83ndash97)New YorkIEEE Press

Panzeri S Petersen R Schultz S Lebedev M amp Diamond M (2001) Therole of spike timing in the coding of stimulus location in rat somatosensorycortex Neuron 29 769ndash777

Reinagel P Godwin D Sherman S M amp Koch C (1999) Encoding of visualinformation by LGN bursts J Neurophys 81 2558ndash2569

Reinagel P amp Reid R C (2000) Temporal coding of visual information in thethalamus J Neuroscience 20(14) 5392ndash5400

Rieke F Warland D Bialek W amp de Ruyter van Steveninck R R (1997)Spikes Exploring the neural code Cambridge MA MIT Press

Rosenblatt F (1958) The perceptron A probabilistic model for informationstorage and organization in the brain Psychological Review 65 386ndash408

Rosenblatt F (1962) Principles of neurodynamics New York Spartan BooksRoweis S amp Saul L (2000) Nonlinear dimensionality reduction by locally

linear embedding Science 290 2323ndash2326

Computation in a Single Neuron 1749

Schneidman E Freedman B amp Segev I (1998) Ion channel stochasticity maybe critical in determining the reliability and precision of spike timing NeuralComp 10 1679ndash1703

Shannon C E (1948) A mathematical theory of communication Bell Sys TechJournal 27 379ndash423 623ndash656

Sharpee T Rust N C amp Bialek W (in press) Maximally informative di-mensions analysing neural responses to natural signals Neural InformationProcessing Systems 2002 Available on-line http==xxxlanlgov=abs=physics=

0208057Sharpee T Rust N C amp Bialek W (2003) Maximally informative dimensions

Analysing neural responses to natural signals Unpublished manuscriptStanley G B Lei F F amp Dan Y (1999) Reconstruction of natural scenes

from ensemble responses in the lateral geniculate nucleus J Neurosci 19(18)8036ndash8042

Theunissen F Sen K amp Doupe A (2000)Spectral-temporal receptive elds ofnonlinear auditory neurons obtained using natural sounds J Neurosci 202315ndash2331

Tiesinga P H E Jose J amp Sejnowski T (2000) Comparison of current-drivenand conductance-driven neocortical model neurons with Hodgkin-Huxleyvoltage-gated channels Physical Review E 62 8413ndash8419

Tishby N Pereira F amp Bialek W (1999) The information bottleneck methodIn B Hajek amp R S Sreenivas (Eds) Proceedingsof the37th Annual AllertonCon-ference on Communication Control and Computing (pp 368ndash377) ChampaignIL Available on-line http==xxxlanlgov=abs=physics=0004057

Received January 9 2003 accepted January 28 2003

Page 33: Computation in a Single Neuron: Hodgkin and Huxley Revisitedwbialek/our_papers/aguerayarcas+al_03.pdf · ARTICLE Communicated by Paul Bressloff Computation in a Single Neuron: Hodgkin

Computation in a Single Neuron 1747

Barlow H B Hill R M amp Levick W R (1964)Retinal ganglion cells respond-ing selectively to direction and speed of image motion in the rabbit J Physiol173 377ndash407

Berry M J II amp Meister M (1999) The neural code of the retina Neuron 22435ndash450

Berry M J II Warland D amp Meister M (1997) The structure and precision ofretinal spike trains Proc Natl Acad Sci USA 94 5411ndash5416

Bialek W amp de Ruyter van Steveninck R R (2003) Features and dimensionsMotion estimation in y vision Unpublished manuscript

Boser B E Guyon I M amp Vapnik V N (1992)A training algorithm for optimalmargin classiers In D Haussler (Ed) 5th Annual ACM Workshop on COLT(pp 144ndash152) Pittsburgh PA ACM Press

Bray D (1995) Protein molecules as computational elements in living cellsNature 376 307ndash312

Brenner N Agam O Bialek W amp de Ruyter van Steveninck R R (1998)Universal statistical behavior of neural spike trains Phys Rev Lett 814000ndash4003

Brenner N Bialek W amp de Ruyter van Steveninck R R (2000) Adaptiverescaling maximizes information transmission Neuron 26 695ndash702

Brenner N Strong S Koberle R Bialek W amp de Ruyter van SteveninckR R (2000) Synergy in a neural code Neural Comp 12 1531ndash1552 Availableon-line http==xxxlanlgov=abs=physics=9902067

Cottrell G W Munro P amp Zipser D (1988) Image compression by back prop-agation A demonstration of extensional programming In N Sharkey (Ed)Models of cognition A reviewof cognitive science (Vol 2 pp 208ndash240)NorwoodNJ Ablex

Cover T M amp Thomas J A (1991) Elements of information theory New YorkWiley

de Boer E amp Kuyper P (1968) Triggered correlation IEEE Trans Biomed Eng15 169ndash179

de Ruyter van Steveninck R R amp Bialek W (1988) Real-time performance ofa movement sensitive in the blowy visual system Information transfer inshort spike sequences Proc Roy Soc Lond B 234 379ndash414

de Ruyter van Steveninck R Lewen G D Strong S P Koberle R amp BialekW (1997) Reproducibility and variability in neural spike trains Science 2751805ndash1808

Dear S P Simmons J A amp Fritz J (1993) A possible neuronal basis for repre-sentation of acoustic scenes in auditory cortex of the big brown bat Nature364 620ndash623

Fairhall A Lewen G Bialek W amp de Ruyter van Steveninck R R (2001)Efciency and ambiguity in an adaptive neural code Nature 412 787ndash792

Fitzhugh R (1961) Impulse and physiological states in models of nerve mem-brane Biophysics J 1 445ndash466

Guyon I M Boser B E amp Vapnik V N (1993) Automatic capacity tuning ofvery large VC-dimension classiers In S J Hanson J D Cowan amp C Giles(Eds) Advances in neural information processing systems 5 (pp 147ndash155) SanMateo CA Morgan Kaufmann

1748 B Aguera y Arcas A Fairhall and W Bialek

Hartline H K (1940)The receptive elds of optic nerve bres Amer J Physiol130 690ndash699

Hille B (1992) Ionic channels of excitable membranes Sunderland MA SinauerHodgkin A L amp Huxley A F (1952) A quantitative description of membrane

current and its application to conduction and excitation in nerve J Physiol463 391ndash407

Hubel D H amp Wiesel T N (1962) Receptive elds binocular interactionand functional architecture in the catrsquos visual cortex J Physiol (Lond) 160106ndash154

Iverson L amp Zucker S W (1995) Logical=linear operators for image curvesIEEE Trans Pattern Analysis and Machine Intelligence 17 982ndash996

Keat J Reinagel P Reid R C amp Meister M (2001) Predicting every spike Amodel for the responses of visual neurons Neuron 30(3) 803ndash817

Kepler T Abbott L F amp Marder E (1992) Reduction of conductance-basedneuron models Biological Cybernetics 66 381ndash387

Kistler W Gerstner W amp van Hemmen J L (1997)Reduction of the Hodgkin-Huxley equations to a single-variable threshold model Neural Computation9 1015ndash1045

Koch C (1999)Biophysics of computation Information processing in single neuronsNew York Oxford University Press

Kufer S W (1953) Discharge patterns and functional organization of mam-malian retina J Neurophysiol 16 37ndash68

Lewen G D Bialek W amp de Ruyter van Steveninck R R (2001)Neural codingof naturalistic motion stimuli Network 12 317ndash329

Mainen Z F amp Sejnowski T J (1995) Reliability of spike timing in neocorticalneurons Science 268 1503ndash1506

Nagumo J Arimoto S amp Yoshikawa Z (1962) An active pulse transmissionline simulating nerve axon Proc IRE 50 2061ndash2071

Oja E amp Karhunen J (1995) Signal separation by nonlinear Hebbian learningIn M Palaniswami Y Attikiouzel R J Marks II D Fogel amp T Fukuda (Eds)Computational intelligencemdasha dynamicsystemperspective(pp 83ndash97)New YorkIEEE Press

Panzeri S Petersen R Schultz S Lebedev M amp Diamond M (2001) Therole of spike timing in the coding of stimulus location in rat somatosensorycortex Neuron 29 769ndash777

Reinagel P Godwin D Sherman S M amp Koch C (1999) Encoding of visualinformation by LGN bursts J Neurophys 81 2558ndash2569

Reinagel P amp Reid R C (2000) Temporal coding of visual information in thethalamus J Neuroscience 20(14) 5392ndash5400

Rieke F Warland D Bialek W amp de Ruyter van Steveninck R R (1997)Spikes Exploring the neural code Cambridge MA MIT Press

Rosenblatt F (1958) The perceptron A probabilistic model for informationstorage and organization in the brain Psychological Review 65 386ndash408

Rosenblatt F (1962) Principles of neurodynamics New York Spartan BooksRoweis S amp Saul L (2000) Nonlinear dimensionality reduction by locally

linear embedding Science 290 2323ndash2326

Computation in a Single Neuron 1749

Schneidman E Freedman B amp Segev I (1998) Ion channel stochasticity maybe critical in determining the reliability and precision of spike timing NeuralComp 10 1679ndash1703

Shannon C E (1948) A mathematical theory of communication Bell Sys TechJournal 27 379ndash423 623ndash656

Sharpee T Rust N C amp Bialek W (in press) Maximally informative di-mensions analysing neural responses to natural signals Neural InformationProcessing Systems 2002 Available on-line http==xxxlanlgov=abs=physics=

0208057Sharpee T Rust N C amp Bialek W (2003) Maximally informative dimensions

Analysing neural responses to natural signals Unpublished manuscriptStanley G B Lei F F amp Dan Y (1999) Reconstruction of natural scenes

from ensemble responses in the lateral geniculate nucleus J Neurosci 19(18)8036ndash8042

Theunissen F Sen K amp Doupe A (2000)Spectral-temporal receptive elds ofnonlinear auditory neurons obtained using natural sounds J Neurosci 202315ndash2331

Tiesinga P H E Jose J amp Sejnowski T (2000) Comparison of current-drivenand conductance-driven neocortical model neurons with Hodgkin-Huxleyvoltage-gated channels Physical Review E 62 8413ndash8419

Tishby N Pereira F amp Bialek W (1999) The information bottleneck methodIn B Hajek amp R S Sreenivas (Eds) Proceedingsof the37th Annual AllertonCon-ference on Communication Control and Computing (pp 368ndash377) ChampaignIL Available on-line http==xxxlanlgov=abs=physics=0004057

Received January 9 2003 accepted January 28 2003

Page 34: Computation in a Single Neuron: Hodgkin and Huxley Revisitedwbialek/our_papers/aguerayarcas+al_03.pdf · ARTICLE Communicated by Paul Bressloff Computation in a Single Neuron: Hodgkin

1748 B Aguera y Arcas A Fairhall and W Bialek

Hartline H K (1940)The receptive elds of optic nerve bres Amer J Physiol130 690ndash699

Hille B (1992) Ionic channels of excitable membranes Sunderland MA SinauerHodgkin A L amp Huxley A F (1952) A quantitative description of membrane

current and its application to conduction and excitation in nerve J Physiol463 391ndash407

Hubel D H amp Wiesel T N (1962) Receptive elds binocular interactionand functional architecture in the catrsquos visual cortex J Physiol (Lond) 160106ndash154

Iverson L amp Zucker S W (1995) Logical=linear operators for image curvesIEEE Trans Pattern Analysis and Machine Intelligence 17 982ndash996

Keat J Reinagel P Reid R C amp Meister M (2001) Predicting every spike Amodel for the responses of visual neurons Neuron 30(3) 803ndash817

Kepler T Abbott L F amp Marder E (1992) Reduction of conductance-basedneuron models Biological Cybernetics 66 381ndash387

Kistler W Gerstner W amp van Hemmen J L (1997)Reduction of the Hodgkin-Huxley equations to a single-variable threshold model Neural Computation9 1015ndash1045

Koch C (1999)Biophysics of computation Information processing in single neuronsNew York Oxford University Press

Kufer S W (1953) Discharge patterns and functional organization of mam-malian retina J Neurophysiol 16 37ndash68

Lewen G D Bialek W amp de Ruyter van Steveninck R R (2001)Neural codingof naturalistic motion stimuli Network 12 317ndash329

Mainen Z F amp Sejnowski T J (1995) Reliability of spike timing in neocorticalneurons Science 268 1503ndash1506

Nagumo J Arimoto S amp Yoshikawa Z (1962) An active pulse transmissionline simulating nerve axon Proc IRE 50 2061ndash2071

Oja E amp Karhunen J (1995) Signal separation by nonlinear Hebbian learningIn M Palaniswami Y Attikiouzel R J Marks II D Fogel amp T Fukuda (Eds)Computational intelligencemdasha dynamicsystemperspective(pp 83ndash97)New YorkIEEE Press

Panzeri S Petersen R Schultz S Lebedev M amp Diamond M (2001) Therole of spike timing in the coding of stimulus location in rat somatosensorycortex Neuron 29 769ndash777

Reinagel P Godwin D Sherman S M amp Koch C (1999) Encoding of visualinformation by LGN bursts J Neurophys 81 2558ndash2569

Reinagel P amp Reid R C (2000) Temporal coding of visual information in thethalamus J Neuroscience 20(14) 5392ndash5400

Rieke, F., Warland, D., Bialek, W., & de Ruyter van Steveninck, R. R. (1997). Spikes: Exploring the neural code. Cambridge, MA: MIT Press.

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408.

Rosenblatt, F. (1962). Principles of neurodynamics. New York: Spartan Books.

Roweis, S., & Saul, L. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323–2326.

Computation in a Single Neuron 1749

Schneidman, E., Freedman, B., & Segev, I. (1998). Ion channel stochasticity may be critical in determining the reliability and precision of spike timing. Neural Comp., 10, 1679–1703.

Shannon, C. E. (1948). A mathematical theory of communication. Bell Sys. Tech. Journal, 27, 379–423, 623–656.

Sharpee, T., Rust, N. C., & Bialek, W. (in press). Maximally informative dimensions: Analysing neural responses to natural signals. Neural Information Processing Systems 2002. Available on-line: http://xxx.lanl.gov/abs/physics/0208057.

Sharpee, T., Rust, N. C., & Bialek, W. (2003). Maximally informative dimensions: Analysing neural responses to natural signals. Unpublished manuscript.

Stanley, G. B., Lei, F. F., & Dan, Y. (1999). Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus. J. Neurosci., 19(18), 8036–8042.

Theunissen, F., Sen, K., & Doupe, A. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J. Neurosci., 20, 2315–2331.

Tiesinga, P. H. E., José, J., & Sejnowski, T. (2000). Comparison of current-driven and conductance-driven neocortical model neurons with Hodgkin-Huxley voltage-gated channels. Physical Review E, 62, 8413–8419.

Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. In B. Hajek & R. S. Sreenivas (Eds.), Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368–377). Champaign, IL. Available on-line: http://xxx.lanl.gov/abs/physics/0004057.

Received January 9, 2003; accepted January 28, 2003.
