
Page 1: Statisztikus tanulás az idegrendszerben

ORBÁN GERGŐ www.eng.cam.ac.uk/go223

Statisztikus tanulás az idegrendszerben

Page 2: Statisztikus tanulás az idegrendszerben

Statisztikus tanulás az idegrendszerben, spring 2013, http://eng.cam.ac.uk/~go223

Orbán Gergő

Bányai Mihály

Somogyvári Zoltán

1. Introduction

2. The perceptron, feedforward networks

3. Recurrent networks, the Hopfield network

4. Latent variable models

5. Representational learning

6. Learning distributions, the Boltzmann machine

7. MAP parameter estimation, Bayesian model comparison

8. The EM algorithm, mixture models

9. Special cases of EM

10. PCA, ICA, divisive normalisation

11. Bayes nets, the Helmholtz machine

12. DBN, contrastive divergence

13. Sampling

Page 3: Statisztikus tanulás az idegrendszerben


Unsupervised learning


Input:

Goal:

(Reinforcement learning:

Input:

Goal: )

collectively called: data - visual, auditory, text

Complicated!

Why is that?

Simplification:

• we represent the data in the space of the "z" variables
• categorisation, dimensionality reduction
• more generally the task is: prediction, decision making, communication

Page 4: Statisztikus tanulás az idegrendszerben


Linear models


PCA (a minimal code sketch follows below):
• the column vectors of A are orthogonal
• D(x) = D(z)
• isotropic noise
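A minimal numerical sketch of this linear model, added to these notes as an illustration rather than taken from the lecture: the orthogonal basis A comes from the eigenvectors of the data covariance, and the code is z = A^T x. The function name and the toy data are made up for the example.

import numpy as np

def pca(X, n_components):
    # X: (n_samples, n_dims) data matrix; returns the basis A and the codes Z.
    Xc = X - X.mean(axis=0)                      # centre the data
    C = np.cov(Xc, rowvar=False)                 # empirical covariance
    eigvals, eigvecs = np.linalg.eigh(C)         # eigendecomposition of a symmetric matrix
    order = np.argsort(eigvals)[::-1]            # sort directions by decreasing variance
    A = eigvecs[:, order[:n_components]]         # orthonormal columns (first bullet)
    Z = Xc @ A                                   # latent code z = A^T x
    return A, Z

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])  # correlated toy data
A, Z = pca(X, n_components=2)                    # D(z) = D(x), as on the slide
print(A.T @ A)                                   # approximately the identity matrix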


1.1 Receptor arrays. We first consider the state-space of a receptor array such as the photoreceptors in the retina. Consider an array composed of N receptors, each of which can represent any value within a range of luminance (light level). Each possible image can then be represented as a single point in the N-dimensional state-space, with each dimension corresponding to one receptor's luminance level. The entire state-space represents the set of all possible images which the array can encode. For example, for a small array consisting of receptors arranged in a 10 × 10 grid, each able to measure 8 grey scale levels, the state-space can represent 8^(10×10) ≈ 10^90 different grey scale images. If we consider the retina with 100 million receptors responding to a near continuous range of luminance levels (even excluding the specialisation of the receptors for colours) the possible set of images is enormous. Although the state-space of possible images is vast, it turns out that typical images we see do not span the entire state-space. It is enlightening to consider how natural images are distributed within the state-space. Are they randomly located over the whole space or clumped together in a systematic way? Consider the state-space for a two-pixel image with luminance values L1 and L2 (Figure 1, left).
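As a quick check of the count quoted above (added here, not part of the original notes), the number of distinct images for a 10 × 10 array with 8 grey levels is 8^100:

import math

n_states = 8 ** (10 * 10)       # 8 grey levels at each of the 100 receptors
print(math.log10(n_states))     # ~90.3, i.e. roughly 10^90 distinct images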

Figure 1. The state-space of two-pixel images and some representative images in the state-space (left). With 3 greyscale levels there would be 3^2 = 9 possible states. The distribution of random images in which there is no correlation between the adjacent pixels (middle) and for a structured image in which a correlation exists between the pixels (right).

If natural images tended to occupy restricted regions of state-space (e.g. Figure 1, right, with each dot representing an image) then the visual system could take advantage of this structure to increase efficiency, that is, represent the structure in the image with fewer neurons. If images occupied random locations (Figure 1, middle) then there would be no statistical structure for the visual system to exploit. If we take random adjacent pixels from natural images we get a plot similar to Figure 1, right. What does that tell us about natural images?

[Figure 1 panels (axes L1 and L2): state space of two-pixel images; random images; structured images.]

Page 5: Statisztikus tanulás az idegrendszerben


PCA properties

• It yields a compact code

• The description of a single data point is generally shared by the whole network


Page 6: Statisztikus tanulás az idegrendszerben


Sparse coding, ICA

Computational criteria (a code sketch of the total cost follows after this list):

• Faithful reconstruction: cost for one data point (image):

• Low "energy use" (few simultaneously active neurons): an additional cost for the "sparseness" of the code:

S is a distribution with greater kurtosis than the Gaussian

• total cost (~energy):


• the "z" variables are independent
• the prior over the "z"s is "sparse" (P(z))
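The cost terms above appeared as equation images in the original slides. The sketch below writes out a standard sparse-coding objective in the same spirit (Olshausen & Field style), assuming a squared reconstruction error and the sparseness penalty S(z) = log(1 + z^2); the weighting lam is an illustrative choice rather than a value from the lecture.

import numpy as np

def sparse_cost(x, A, z, lam=0.1):
    # Total cost (~energy) for one image patch x given basis A and activities z.
    reconstruction = np.sum((x - A @ z) ** 2)    # faithful reconstruction term
    sparseness = np.sum(np.log1p(z ** 2))        # S(z): heavier-tailed than a Gaussian prior
    return reconstruction + lam * sparseness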

Page 7: Statisztikus tanulás az idegrendszerben


Learning a sparse code: E-M


Algorithm:
• iterate with EM steps
• random initial conditions
• for a given connectivity matrix, minimise the cost with respect to the activities
• for given activities, minimise the cost by adapting the weights

Finding the best weights for given activities:

Finding the best activities for a given connectivity matrix:

(a minimal code sketch of this alternation follows below)
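A minimal sketch of the alternation described above, added as an illustration under the assumptions of the previous code block rather than the lecture's own implementation: the inner loop finds good activities Z for a fixed basis A, and the outer update adapts A for fixed Z. Step sizes, iteration counts and the column normalisation are arbitrary choices.

import numpy as np

def sparse_coding_em(X, n_latent, n_iter=50, eta_z=0.01, eta_a=0.001, lam=0.1):
    n_samples, n_dim = X.shape
    rng = np.random.default_rng(0)
    A = rng.normal(scale=0.1, size=(n_dim, n_latent))     # random initial basis
    Z = np.zeros((n_samples, n_latent))
    for _ in range(n_iter):
        # "E-step": best activities for the current connectivity matrix A
        for _ in range(20):
            grad_Z = -(X - Z @ A.T) @ A + lam * 2 * Z / (1 + Z ** 2)
            Z = Z - eta_z * grad_Z
        # "M-step": best weights for the current activities Z
        grad_A = -(X - Z @ A.T).T @ Z
        A = A - eta_a * grad_A
        A = A / np.linalg.norm(A, axis=0, keepdims=True)  # keep the basis vectors bounded
    return A, Z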

Page 8: Statisztikus tanulás az idegrendszerben


Sparse coding: results


The emerging basis:
• oriented
• implements spatial band-pass filtering
• localised

Olshausen & Field ‘96

training with natural images

Page 9: Statisztikus tanulás az idegrendszerben


Learning and stimulus statistics


Page 10: Statisztikus tanulás az idegrendszerben


Generative/recognition model


situation / environment

objects

object placement | size, location, pose, lighting

object properties | edges, surface patterns

stimulus

generative model

inference / recognition

Page 11: Statisztikus tanulás az idegrendszerben


Generative/recognition model


situation / environment

objects

object placement | size, location, pose, lighting

object properties | edges, surface patterns

stimulus

Model definition -> recognition:

The need for inference -> recognition:

Page 12: Statisztikus tanulás az idegrendszerben


Independent components

allows a system with limited response range to handle a wider dynamic range of input. Divisive normalization achieves this goal, producing sigmoidal contrast–response functions similar to those seen in neurons. In addition, it seems advantageous for tuning curves in stimulus parameters such as orientation to retain their shape at different contrasts, even in the presence of response saturation20. Previous models have accomplished this by computing a normalization signal that is independent of parameters such as orientation (achieved with a uniformly weighted sum over the entire neural population). A consequence of this design is that the models can account for the response suppression that occurs, for example, when a grating of non-optimal orientation is superimposed on a stimulus.

Model simulations versus physiology
We compared our model with electrophysiological measurements from single neurons. To simulate an experiment, we chose a primary filter and a set of neighboring filters that would interact with this primary filter. We pre-computed the optimal normalization weights for an ensemble of natural signals (see Methods). We then simulated each experiment, holding all parameters of the model fixed, by computing the normalized responses of the primary filter to the experimental stimuli. We compared these responses to the physiologically measured average firing rates of neurons. Our extended normalization model, with all parameters chosen to optimize statistical independence of responses, accounted for those nonlinear behaviors in V1 neurons previously modeled with divisive normalization (see above). Figure 5 shows data and model simulations demonstrating preservation of orientation tuning curves and cross-orientation inhibition.

Our model also accounted for nonlinear behaviors not previously modeled using normalization. Figure 6a shows data from an experiment in which an optimal sinusoidal grating stimulus was placed inside the classical receptive field of a neuron in primary visual cortex of a macaque monkey24. A mask grating was placed in an annular region surrounding the classical receptive field. Each curve in the figure indicates the response as a function of the center contrast for a particular surround contrast.

Fig. 3. Examples of variance dependency in natural signals. (a) Responses of two filters to several different signals. Dependency is strong for natural signals, but is negligible for white noise. Filters as in Fig. 1. (b) Responses of different pairs of filters to a fixed natural signal. The strength of the variance dependency depends on the filter pair. For the image, the red × represents a fixed spatial location on the retina. The ordinate response is always computed with a vertical filter, and the abscissa response is computed with a vertical filter (shifted 4 pixels), vertical filter (shifted 12 pixels) and horizontal filter (shifted 12 pixels). For the sound, the red × represents a fixed time. Temporal frequency of ordinate filter is 2000 Hz. Temporal frequencies of abscissa filter are 2000 Hz (shifted 9 ms in time), 2840 Hz (shifted 9 ms) and 4019 Hz (shifted 9 ms).

The sigmoidal shape of the curves results from the squaring nonlinearity and the normalization. Presentation of the mask grating alone does not elicit a response from the neuron, but its presence suppresses the responses to the center grating. Specifically, the contrast response curves are shifted to the right (on a log axis), indicative of a divisive gain change. When the mask orientation is parallel to the center, this shift is much larger than when the mask orientation is orthogonal to the center (Fig. 6b).

Our model exhibits similar behavior (Fig. 6a and b), which is due to suppressive weighting of neighboring model neurons with the same orientation preference that is stronger than that of neurons with perpendicular orientation preference (see also ref. 25). This weighting is determined by the statistics of our image ensemble, and is due to the increased likelihood that adjacent regions in natural images have similar rather than orthogonal orientations. For example, oriented structures in images (such as edges of objects) tend to extend along smooth contours, yielding strong responses in linear filters that are separated from each other spatially, but lying along the same contour (see also refs. 26, 27). This behavior would not be observed in previous normalization models, because the parallel and orthogonal surround stimuli would produce the same normalization signal.

An analogous effect is seen in the auditory system. Figure 6 shows example data recorded from a cat auditory nerve fiber, in which an optimal sinusoidal tone stimulus is combined with a


[Figure 3 panels: Cat, White noise; Baboon, Flowers, White noise; Speech.]


© 2001 Nature Publishing Group http://neurosci.nature.com

Schwartz & Simoncelli, 2001

Page 13: Statisztikus tanulás az idegrendszerben


Gaussian Scale Mixtures



large number of variables, such as those describing the position, pose, colour, and other attributes of multiple objects constituting a visual scene15,16.

Indeed, a powerful class of models have been developed that relates the activity of visual cortical neurons to probabilistic inference under a statistical model of natural images containing a high number of latent variables17–20. Ironically, though, these models have almost exclusively concentrated on maximum a posteriori inference (but see Refs. 21,22) which by definition does not allow for representing uncertainty in one's inferences. As a result, while these models have successfully accounted for a number of receptive field and tuning curve properties of visual cortical cells, they did not capture any aspects of neural variability.

We propose that neural activities represent samples from the (posterior) distribution that results from Bayesian inference. That is, at any moment in time, the vector of activity patterns in a population of neurons represents a sample from a multivariate distribution over the high-dimensional space spanned by multiple latent variables. The idea that the brain uses samples to represent posterior distributions has been put forward to interpret a diverse set of psychological data23–27, but its ramifications for neural data have only been minimally explored so far16,22.

We spell out the sampling hypothesis in the context of a well-known class of natural image models, Gaussian scale mixtures (GSM)28, that has proven to be efficient in computer vision applications29 and has also been successfully used to account for sensory gain control properties of neurons in the primary visual cortex (V1)19 as well as for a number of perceptual effects in low-level vision30. In section 2 we define the GSM, derive equations for Bayesian inference under it and for learning its parameters through Expectation Maximisation. In section 3 we describe in detail the mapping between the variables of the GSM and neural activities in V1. In section 4 we show that Bayesian inference under the GSM reproduces a number of recent experimental results about the detailed patterns of (co)variability and spontaneous activity of V1 simple cells under our sampling-based interpretation. Finally, in section 5 we discuss our findings, in particular in the light of other recent proposals relating neural variability to probabilistic inference22,31, and make experimental predictions unique to our approach.

2 Bayesian inference and maximum likelihood learning in the GSM model

Generative model. In a Gaussian Scale Mixtures (GSM) model (Fig. 1), N (whitened) image pixels, x ∈ R^N, are assumed to be the linear combination of M latent variables, y ∈ R^M, with additive (spherical white) Gaussian noise:

P(x|y) = N(x; Ay, σ_x^2 I)    (1)

where A is the mixing matrix (column i containing the 'projective field' of y_i), σ_x^2 is the variance of the observation noise, and I is the N × N identity matrix. For simplicity, we considered the undercomplete case, with x being an 8 × 8 grayscale image patch (N = 64) and M = 32.

Latent variables y are modelled as the (deterministic) product of a zero-mean multivariate Gaussian random variable, u ∈ R^M, and a non-negative scalar z for which we chose a Gamma prior (although the exact shape of this prior does not substantially influence our results)

[Figure 1: Graphical model of the GSM used in this paper (nodes z, y, x).]

y = z · u    (2)
P(u) = N(u; 0, C)    (3)
P(z) = Gamma(z; k, θ)    (4)

where C is the M × M covariance matrix of the Gaussian random variables u, and k = 2 and θ = 2 are the shape and scale parameters of the Gamma prior over z, respectively.

Bayesian inference. When the model is presented an image x, its task is to infer the values of the latent variables u and z that may have produced it (note that once these are known, y is also trivially known through Eq. 2). Due to observation noise (Eq. 1) and ambiguity (Eq. 2) these values
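Ancestral sampling from Eqs. (1)–(4) is straightforward; the sketch below is an illustration added to these notes (not the paper's code), and the mixing matrix A, covariance C and noise level sigma_x are placeholder values rather than fitted parameters.

import numpy as np

def sample_gsm(A, C, sigma_x, k=2.0, theta=2.0, rng=None):
    rng = rng or np.random.default_rng()
    z = rng.gamma(shape=k, scale=theta)                      # contrast variable, Eq. (4)
    u = rng.multivariate_normal(np.zeros(C.shape[0]), C)     # local features, Eq. (3)
    y = z * u                                                # Eq. (2)
    x = A @ y + rng.normal(scale=sigma_x, size=A.shape[0])   # noisy linear mixing, Eq. (1)
    return x, y, z, u

rng = np.random.default_rng(1)
A = rng.normal(size=(64, 32))   # toy dimensions: the paper uses N = 64 pixels and M = 32 latents
C = np.eye(32)
x, y, z, u = sample_gsm(A, C, sigma_x=0.5, rng=rng)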


First-order statistics (pixel histograms); linear features 1, 2, …, N

image = a1 · feature1 + a2 · feature2 + … + aN · featureN + noise

image = contrast × ( … )


contrast has been suggested as a means of maximizing marginal entropy, thus providing a functional explanation for gain control in the retina35. Our work differs conceptually in the choice of statistical criteria (independence between filters, as opposed to marginal statistics of one filter). In audition, outer hair cells have been implicated in providing gain control8,36, and some of the behaviors we describe at the level of the auditory nerve have also been documented in recordings from basilar membrane.

Our model is based on a mechanism that is fundamentally suppressive, but a number of authors have reported facilitative influences in both vision and audition14,37–39. Some of these facilitative effects might be explained by the use of masking stimuli that inadvertently excite the receptive field of the neuron13,40, thus causing suppression to overcome facilitation only at high contrasts or sound pressure levels of the mask. Facilitative effects might also be explained by dis-inhibition, in which a third cell inhibits a second cell, thus releasing its inhibition of the recorded cell. As mentioned above, our current model does not use a recurrent implementation and thus cannot predict such effects.

The relationship between the model and perception should also be explored. For example, psychophysical experiments suggest that visual detectability is enhanced along contours41. At first glance, this might seem to be inconsistent with our model, in which neurons that lie along contours will suppress each other. But the apparent contradiction is based on the unsubstantiated intuition that a reduction in the neural responses implies reduced detectability. Presumably, any difference in relative activity of neurons along the contour, as compared with the activity of neurons in other regions, could be used for contour detection. More generally, examination of the implications of our model for perception requires a method of extracting a percept from a population of neural responses. Although this has not been done for contour detection, we find it encouraging that other basic percepts have been explained in the context of a population of neurons performing gain control (for example, detectability of a grating in the presence of a mask42 and perceptual segregation of visual textures43).

There are many directions for further refinement of the connection between natural signal statistics and neuronal processing. We have optimized our model for a generic signal ensemble, and neurons may be specialized for particular subclasses of signals44. Moreover, mechanisms and associated timescales (that is, evolution, development, learning and adaptation) by which the optimization occurs could be modeled. For example, some visual adaptation effects have been explained by adjusting model parameters according to the statistical properties of recent visual input45,46. A more complete theory also requires an understanding of which groups of neurons are optimized for independence. A sensible assumption might be that each stage of processing in the system takes the responses of the previous stage and attempts to eliminate as much statistical redundancy as possible, within the limits of its computational capabilities. It remains to be seen how much of sensory processing can be explained using such a bottom-up criterion.

Future work should also be directed toward testing the efficient coding hypothesis experimentally. Some support for the hypothesis has been obtained through recordings from groups of neurons47,48 under naturalistic stimulation conditions. We believe that improvements in both experimental techniques and statistical models of natural signals will continue to provide new opportunities to test and extend the efficient coding hypothesis proposed by Barlow forty years ago.

METHODS
For the auditory simulations, we used a set of Gammatone filters as the linear front end49. We chose a primary filter with center frequency of 2000 Hz, and a neighborhood of filters for the normalization signal: 16 filters with center frequencies 205 to 4768 Hz, and replicas of all filters temporally shifted by 100, 200 and 300 samples. For the visual simulations, linear receptive fields were derived using a multi-scale oriented decomposition known as the steerable pyramid50. The primary filter was vertically oriented with peak spatial frequency of 1/8 cycles/pixel. The filter neighborhood included all combinations of two spatial frequencies, four orientations, two phases and a spatial extent three times the diameter of the primary filter. Responses were horizontally and vertically subsampled at four-pixel intervals. To reduce the dimensionality of the weight vector that needs to be optimized, we assumed that weights for two filters with differing phase were the same, thus guaranteeing a phase-invariant normalization signal. We also assumed vertical and horizontal symmetry. We verified that these simplifications did not substantially alter the simulation results.

Our ensemble of natural sounds consisted of nine animal and speech sounds, each approximately six seconds long. The sounds were obtained from commercial compact disks and converted to sampling frequency of 22050 Hz. The natural image ensemble consisted of 10 images obtained from a database of standard images used in image compression benchmarks (known as boats, goldhill, Einstein, Feynman, baboon, etc.). We obtained similar results using an intensity calibrated image set6.

For a pair of filters, we modeled the variance of response of the first filter given the response of the second filter to a visual/auditory stimulus as follows.

var(L1 | L2) = w L2^2 + σ^2    (1)

Here, L1 and L2 are the linear responses of the two filters. This conditional variance dependency is eliminated by dividing the following.

R = L1^2 / (w L2^2 + σ^2)    (2)

We assumed a generalization of this dependency to a population of filters. We modeled the variance dependency of the response of filter Li given the responses of a population of filters Lj in a neighborhood Ni.

var(Li | {Lj, j ∈ Ni}) = Σj wji Lj^2 + σ^2    (3)

Again, the conditional variance dependency is eliminated by dividing the following.

Ri = Li^2 / (Σj wji Lj^2 + σ^2)    (4)

We wanted to choose the parameters of the model (the weights wji, and the constant σ) to maximize the independence of the normalized response to an ensemble of natural images and sounds. Such an optimization was computationally prohibitive. To reduce the complexity of the problem, we assume a Gaussian form for the underlying conditional distribution.

P(Li | {Lj, j ∈ Ni}) = 1 / √(2π (Σj wji Lj^2 + σ^2)) · exp[ −Li^2 / (2 (Σj wji Lj^2 + σ^2)) ]    (5)
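Eqs. (3) and (4) are easy to state in code; the sketch below is an added illustration with made-up filter responses and normalisation weights, not the authors' implementation.

import numpy as np

def normalise(L, W, sigma):
    # L: vector of linear filter responses; W[i, j]: weight of filter j in the pool of filter i.
    L = np.asarray(L, dtype=float)
    pool = W @ (L ** 2) + sigma ** 2      # denominator of Eq. (4) for every filter
    return L ** 2 / pool                  # normalised responses R_i

L = np.array([2.0, 3.0, 0.5])             # toy responses of three filters
W = np.array([[0.0, 1.0, 0.1],
              [1.0, 0.0, 0.1],
              [0.1, 0.1, 0.0]])
print(normalise(L, W, sigma=0.5))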


© 2001 Nature Publishing Group http://neurosci.nature.com


Page 14: Statisztikus tanulás az idegrendszerben


Neural data and the GSM



masking tone. As in the visual data, the rate–level curves of the auditory nerve fiber shift to the right (on a log scale) in the presence of the masking tone (Fig. 6c and d). This shift is larger when the mask frequency is closer to the optimal frequency for the cell. Again, the model behavior is due to variations in suppressive weighting across neurons tuned for adjacent frequencies, which in turn arises from the statistical properties illustrated in Fig. 3b.

As mentioned above, a motivating characteristic of normalization models has been the preservation of the shape of the tuning curve under changes in input level. However, the shapes of physiologically measured tuning curves for some parameters exhibit substantial dependence on input level in both audition16 and vision17,18. Figure 7a shows an example of this behavior in a neuron from primary visual cortex of a macaque monkey24. The graph shows the response of the cell as a function of the radius of a circular patch of sinusoidal grating, at two different contrast levels. The high-contrast responses are generally larger than the low-contrast responses, but in addition, the shape of the curve changes. Specifically, for higher contrast, the peak response occurs at a smaller radius. The same behavior is seen in our model neuron.

Analogous results were obtained for a typical cell in the auditory nerve fiber of a squirrel monkey16 (Fig. 7b). Responses are plotted as a function of frequency, for a number of different sound pressure levels. As the sound pressure level increases, the frequency tuning becomes broader, developing a 'shoulder' and a secondary mode (Fig. 7b). Both cell and model show similar behavior, despite the fact that we have not adjusted the parameters to fit these data; all weights in the model are chosen by optimizing the independence of the responses to the ensemble of natural sounds. The model behavior arises because the weighted normalization signal is dependent on frequency. At low input levels, this frequency dependence is inconsequential because the additive constant dominates the signal. But at high input levels, this frequency dependence modulates the shape of the frequency tuning curve that is primarily established by the numerator kernel of the model. In Fig. 7b, the high contrast secondary mode corresponds to frequency bands with minimal normalization weighting.

DISCUSSION
We have described a generic nonlinear model for early sensory processing, in which linear responses were squared and then divided by a gain control signal computed as a weighted sum of the squared linear responses of neighboring neurons and a constant. The form of this model was chosen to eliminate the type of dependencies that we have observed between responses of pairs of linear receptive fields to natural signals (Fig. 2). The parameters of the model (in particular, the weights used to compute the gain control signal) were chosen to maximize the independence of responses to a particular set of signals. We demonstrated that the resulting model accounts for a range of sensory nonlinearities in 'typical' cells. Although there are quantitative differences among individual cells, the qualitative behaviors we modeled have been observed previously. Our model can account for physiologically observed nonlinearities in two different modalities. This suggests a canonical neural mechanism for eliminating the statistical dependencies prevalent in typical natural signals.



[Fig. 4 schematic: a stimulus drives linear filters F1 and F2; each squared filter response is divided by a pool of other squared filter responses plus the constant σ²; a conditional histogram of the normalized responses is shown alongside.]
Fig. 4. Generic normalization model for vision and audition. Each filter response is divided by the weighted sum of squared responses of neighboring filters and an additive constant. Parameters are determined using Maximum Likelihood on a generic ensemble of signals (see Methods). The conditional histogram of normalized filter responses demonstrates that the variance of N2 is roughly constant, independent of N1. The diagram is a representation of the computation and is not meant to specify a particular mechanism or implementation (see Discussion).
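The "roughly constant conditional variance" claim in the caption is easy to check numerically. The sketch below is a toy version built on assumptions: it generates filter responses with a shared random gain (a Gaussian-scale-mixture style dependency), normalizes each response by the pooled squared responses of the others plus a constant, and compares the spread of one response across bins of another before and after normalization. Variable names and parameter values are made up for illustration.

import numpy as np

rng = np.random.default_rng(1)
n, k = 100_000, 10

# Toy ensemble: k independent Gaussian responses multiplied by a shared random
# gain, so their variances (but not their means) fluctuate together.
gain = np.exp(rng.standard_normal(n))[:, None]
L = gain * rng.standard_normal((n, k))

sigma2, w = 0.1, 1.0
pool = (L ** 2).sum(axis=1, keepdims=True) - L ** 2        # squared responses of the other filters
N = L / np.sqrt(sigma2 + (w / (k - 1)) * pool)             # divisively normalized responses

def conditional_std(x, y, n_bins=5):
    """Standard deviation of y within quantile bins of |x|."""
    edges = np.quantile(np.abs(x), np.linspace(0.0, 1.0, n_bins + 1))
    return np.array([y[(np.abs(x) >= lo) & (np.abs(x) <= hi)].std()
                     for lo, hi in zip(edges[:-1], edges[1:])])

print("std(L2 | bin of |L1|):", conditional_std(L[:, 0], L[:, 1]))  # grows strongly with |L1|
print("std(N2 | bin of |N1|):", conditional_std(N[:, 0], N[:, 1]))  # much flatter after normalization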

[Fig. 5 panels a–c (cell vs. model): mean firing rate vs. grating orientation at several contrasts (Skottun et al., 1987), response vs. orientation for a single grating vs. an added mask grating (Bonds, 1989), and mean firing rate vs. signal contrast on a log axis at several mask contrasts (Bonds, 1989).]
Fig. 5. Classical nonlinear behaviors of V1 neurons. (a) Contrast independence of orientation tuning [22]. (b) Orientation masking [22]. Dashed line indicates response to a single grating, as a function of orientation. Solid line indicates response to an optimal grating additively superimposed on a mask grating of variable orientation. All curves are normalized to have a maximum value of one. (c) Cross-orientation suppression [23]. Responses to the optimal stimulus are suppressed by an orthogonal masking stimulus within the receptive field. This results in a rightward shift of the contrast response curve (on a log axis). Curves on the cell data plot are fitted with a Naka–Rushton function, r(c) = c²/(ac² + b²).
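The Naka–Rushton form quoted in the caption is simple to fit. Below is a small, self-contained sketch using scipy.optimize.curve_fit on synthetic contrast-response data; the data values and the parameter names a and b are made up for illustration and are not the paper's fits.

import numpy as np
from scipy.optimize import curve_fit

def naka_rushton(c, a, b):
    """r(c) = c^2 / (a*c^2 + b^2), the two-parameter form quoted in the caption."""
    return c**2 / (a * c**2 + b**2)

# Made-up contrast / firing-rate pairs for illustration only (not the paper's data)
contrast = np.array([0.01, 0.03, 0.1, 0.3, 0.5, 1.0])
rate = np.array([0.5, 3.5, 19.0, 35.0, 38.0, 40.0])

params, _ = curve_fit(naka_rushton, contrast, rate, p0=(0.03, 0.02))
a_fit, b_fit = params
print(f"a = {a_fit:.4f}, b = {b_fit:.4f}, saturation rate ~ 1/a = {1/a_fit:.1f}")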



Some of the gain control behaviors we describe may be attributed to earlier stages of neural processing. Gain control occurs at the level of the retina [9,34], although selectivity for orientation does not arise before cortical area V1. In fact, division by local …

Fig. 7. Nonlinear changes in tuning curves at different input levels. (a) Mean response rate of a V1 neuron as a function of stimulus radius for two different contrasts. The peak response radius for both cell and model is smaller for the higher contrast [24]. (b) Mean response rate of an auditory nerve fiber as a function of stimulus frequency for a range of sound pressure levels [16]. The tuning curve broadens and saturates at high levels. For all plots, the maximum model response has been rescaled to match that of the cell.

The concept of gain control has been used previously to explain nonlinear behaviors of neurons. For example, a number of auditory models have incorporated explicit gain control mechanisms [8,28,29]. Visual models based on divisive normalization have been developed to explain nonlinear effects in cortical area V1 within the classical receptive field [10,20]. The standard model assumes that the response of each neuron is divided by an equally weighted sum of all other neurons and an additive constant. Our model uses a weighted sum for the normalization signal, and is thus able to account for a wider range of nonlinear behaviors. In addition, our model provides an ecological justification, through the efficient coding hypothesis [2], for such gain control models.

Our model accounts for nonlinear changes in tuning curve shape at different levels of input. Such behaviors have generally been interpreted to mean that the fundamental tuning properties of cells depend on the strength of the input signal. But in our model, the fundamental tuning properties are determined by a fixed linear receptive field, and are modulated by a gain control signal with its own tuning properties. Although such behaviors may seem to be artifacts, our model suggests that they occur naturally in a system that is optimized for statistical independence over natural signals.

Our current model provides a functional description, and does not specify the circuitry or biophysics by which these functions are implemented. Our normalization computation is done instantaneously and we have only modeled mean firing rates. Normalization behavior could potentially arise through a number of mechanisms. For example, feedforward synaptic depression mechanisms have been documented and have been shown to exhibit gain control properties [30]. Although such mechanisms may account for suppressive behaviors within the classical receptive field, they seem unlikely to account for behaviors such as those shown in Fig. 6. It has also been proposed that normalization could result from shunting inhibition driven by other neurons [31–33]. This type of implementation necessarily involves recursive lateral or feedback connections and thus introduces temporal dynamics. Some researchers have described recurrent models that can produce steady-state responses consistent with divisive normalization in primary visual cortex [10,20].

Fig. 6. Suppression of responses to optimal stimuli by masking stimuli. (a) Vision experiment [24]. Mean response rate of a V1 neuron as a function of contrast of an optimally oriented grating presented in the classical receptive field, in the presence of a surrounding parallel masking stimulus. Curves on cell data plots are fits of a Naka–Rushton equation with two free parameters [24]. (b) Mean response rate versus center contrast, in the presence of an orthogonal surround mask. (c) Auditory experiment [11]. Mean response rate of an auditory nerve fiber versus sound pressure level, in the presence of a non-optimal mask at 1.25 times the optimal frequency. (d) Mean response rate versus sound pressure level, in the presence of a non-optimal mask at 2.08 times the optimal frequency. For all plots, the maximum model response has been rescaled to match that of the cell.
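The rightward shift described in this caption falls directly out of the divisive form. The snippet below is an illustrative sketch rather than the paper's fitted model: the parameter values and the specific squaring-plus-pooling expression are assumptions that follow the general model description. Adding a mask term to the denominator raises the half-saturation contrast, which on a log-contrast axis appears as a rightward shift of the contrast-response curve with its shape preserved.

import numpy as np

def response(signal_contrast, mask_contrast, w_mask=1.0, sigma2=0.01):
    """Toy normalized response: squared signal drive divided by its own
    energy, a weighted mask energy, and an additive constant."""
    num = signal_contrast ** 2
    return num / (sigma2 + num + w_mask * mask_contrast ** 2)

contrasts = np.logspace(-2, 0, 9)            # signal contrasts from 0.01 to 1
for mask in (0.0, 0.13, 0.5):                # mask contrasts as in the Fig. 6 legends
    print(f"mask={mask:0.2f}:", np.round(response(contrasts, mask), 3))
# The half-saturation contrast is sqrt(sigma2 + w_mask * mask^2), so it grows
# with mask contrast: the curve shifts rightward on a log-contrast axis.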



[Fig. 6 panels a–d (cell vs. model): mean firing rate vs. signal contrast (log axis, 0.03–1) for a V1 cell with parallel and orthogonal surround masks at contrasts 0, 0.13 and 0.5 (Cavanaugh et al., 2000), and mean firing rate vs. signal intensity (20–60 dB) for an auditory nerve fiber with no mask vs. an 80 dB mask (Javel et al., 1978). Fig. 7 panels a–b (cell vs. model): mean firing rate vs. stimulus diameter (0–6 deg) at contrasts 0.06 and 0.25 (Cavanaugh et al., 2000), and mean firing rate vs. relative frequency at sound pressure levels from 40 to 90 dB (Rose et al., 1971).]


Schwartz & Simoncelli, 2001