This is the webversion of the following article: van Hateren, J.H. (1992) Theoretical predictions of spatiotemporal receptive fields of fly LMCs, and experimental validation. J.Comp.Physiol. A 171:157-170. It was converted automatically from an original Latex-file, and slightly deviates from the published article.

Theoretical Predictions of Spatiotemporal Receptive Fields of Fly LMCs, and Experimental Validation

J. H. van Hateren
Department of Biophysics
University of Groningen

Summary

1. A theory is presented that utilizes the structure of natural images, and how they change in time, to produce spatiotemporal filters that maximize information flow through a noisy channel of limited dynamic range. For low signal-to-noise ratios (SNRs) the filter has low-pass, and for high SNRs band-pass characteristics, both in space and time.

2. Theoretical impulse responses are compared to measurements in Large Monopolar Cells (LMCs) in the fly brain. Two different spatial stimuli (point source and wide field) were given at background intensities over a 5.5 log unit wide range. Theory and experiment correspond well, and they share the following properties: impulse responses get much faster and more biphasic with increasing background intensity (SNR); they show larger off-transients for wide field stimuli than for point sources; the half-width of the spatial receptive field changes only slightly with increased intensity, and lateral inhibition increases; contrast efficiency increases with intensity.

Introduction

Visual systems of living organisms are well adapted to the visual environments they usually encounter. Although the mean visual environment varies from animal to animal, the spatial structure of visual environments is remarkably constant. Images are by no means random. Recent work on the spatial characteristics of natural images (Burton and Moorehead 1987; Field 1987; Huang and Turcotte 1990) shows for instance that the spatial power spectra of these images are inversely related to the square of the spatial frequency. Furthermore, temporal aspects of the visual environment are for the greater part caused by the animal itself through its own movements, and these will therefore be stereotyped for a particular animal. Thus, much of the information present in the spatial and temporal structure of images is predictable, and therefore redundant.

These considerations have led to the hypothesis that a major task of early vision - the first stages of visual processing - is to reduce the redundancy of the visual stimulus (Barlow 1961; Srinivasan et al. 1982). This enables the visual system to transmit as much information as possible through channels (neurons) of limited information capacity (Barlow 1961, 1981, 1983; Laughlin 1981, 1983, 1987). Properties that the visual systems of many different species have in common, like local light adaptation and lateral inhibition, can then be considered as means to reduce redundancy (Srinivasan et al. 1982; Norwich and Valter McConville 1991).

In pioneering work, Srinivasan et al. (1982) applied these ideas to the Large Monopolar Cells (LMCs) in the fly lamina. These neurons (see e.g. Laughlin and Hardie 1978; van Hateren and Laughlin 1990) are directly postsynaptic to the photoreceptors, and, when light-adapted, display strong temporal antagonism and weaker spatial antagonism. Srinivasan et al. (1982) applied a technique called `predictive coding' to construct receptive fields of LMCs, and showed that this led to both temporal and spatial antagonism, although spatial antagonism was stronger than actually observed.

In this article I introduce an alternative theory that capitalizes on the spatiotemporal structure of natural images. It enables the prediction of neural filters that maximize information flow through noisy channels of limited dynamic range. Although the theory produces filters that reduce redundancy for high signal-to-noise ratios, the filters for low signal-to-noise ratios actually increase redundancy (and thus reliability). Therefore, this theory is not completely compatible with the hypothesis of `reducing redundancy', and it also leads to results different from those of `predictive coding'. The theory is generally applicable, and produces, virtually from first principles, quantitative predictions of spatiotemporal receptive fields. In this article I will first explain the general theory, and then apply it to the specific case of fly LMCs. Theoretical predictions for LMCs compare favourably to measurements over a wide range of intensities.

Methods

Animals, preparation, and electrophysiology. Female flies (Calliphora vicina), usually from the F1 of flies caught outside, and aged between 1 and 3 weeks, were immobilized with wax and mounted on a goniometer. Antidromic illumination given through a small hole at the back of the head enabled observing the far field radiation pattern and checking the optical integrity of the eye. A microelectrode was inserted in the retina and lamina through a small hole in the cornea, carefully cut with a chip of razor blade and sealed with silicon grease to prevent desiccation. The reference electrode, a 50 µm diameter silver wire, was inserted in a cut in the (non-illuminated) most dorsal part of the same eye. Microelectrodes were filled with a mixture of 3 M KAc and 5 mM KCl, and had resistances either between 150 and 200 MΩ (first series of experiments, 1.5 mm borosilicate pipettes, electrodes drawn on a Sutter P77) or between 250 and 300 MΩ (second series of experiments, 1 mm borosilicate pipettes, electrodes drawn on a Sutter P80). The latter electrodes yielded a higher percentage of very stable intracellular recordings (recording times of more than 2 hours). Ambient temperature during experiments was 19-20 °C.

Stimuli and data acquisition. Data acquisition was performed as described previously (e.g. van Hateren 1986). Stimuli consisted of flashes superimposed upon a background via a half-mirror. The background had an angular extent of 30°×30° and consisted of a grey paper illuminated by either a 15 W light bulb (three lowest intensities in Figs. 6 and 7) or three 1000 W halogen lamps (three highest intensities in Figs. 6 and 7). The latter were all on different phases of the mains in order to abolish the 100 Hz intensity ripple of each halogen lamp (due to the 50 Hz mains frequency; at higher intensities, 100 Hz is well followed by fly photoreceptors). Background light intensity was subsequently regulated by neutral density filters. The intensity of the 15 W light bulb was adjusted to be about 1 log unit less effective for fly photoreceptors than the lowest intensity used with the three halogen lamps, which was checked by comparing photoreceptor depolarizations for the two stimuli. I estimate that the uncertainty in this calibration is approximately 0.3 log units. The resulting intensities, relative to the lowest used, were 0, 1.1, 2.2, 3.3, 4.4, and 5.5 log units (log base 10). These background intensities yielded photoreceptor depolarizations (plateau after at least 15 min light adaptation) of 2.2±0.4, 4.2±1.4, 10.4±2.4, 17.0±2.2, 25.2±2.2, and 29.7±2.6 mV (mean and s.d., i.e., standard deviation, of 6 cells). Note that recording with the reference electrode in the contralateral eye or elsewhere in the body would yield slightly smaller values, as in this case field potentials in the illuminated eye give rise to larger recorded extracellular potentials (DC up to 5 mV).

From bump counts at low intensities I found that a depolarization of 10 mV (intensity 2.2, wide field stimulus) corresponds to 1500±500 effective photon absorptions per second (mean and s.d. of 8 photoreceptors). In these experiments only the ommatidium containing the impaled cell was illuminated by means of a diaphragm, in order to avoid generating and counting many (smaller) bumps transmitted via gap junctions in the lamina (see van Hateren 1986), thus avoiding overestimating the bump count. This calibration is consistent with a direct measurement of the radiance at the cornea needed to elicit a 10 mV depolarization (based on measurements in 12 photoreceptors with a wide field stimulus of 11° diameter, and a wavelength of 494 nm, i.e., close to the spectral peak at 490 nm of photoreceptors R1-6). I measured 3.0 × 10⁻³ W/(sr·m²) (determined with an EG&G model 550 radiometer). Assuming a circular lens with a diameter of 24 µm, and a gaussian angular sensitivity with a half-width (i.e., full width at half maximum) ∆ρ = 1.4°, this yields an estimate of 2300±500 photons (of λ = 494 nm) available per photoreceptor per second. Only approximately half of the available photons are effective in eliciting a response (see Dubs et al. 1981).

The highest background intensity used in the experiments (5.5 log units) then corresponds to 5 × 10⁶ available photons per photoreceptor per second (of which 3 × 10⁶ effective), thus slightly less than the 7 × 10⁶ photons I estimate to be available per photoreceptor per second from a 50% reflecting Lambert surface illuminated by full sunlight (assuming a mean solar irradiance of 1100 W/m² over a 105 nm wide spectral window of the visual pigment). The other intensities roughly correspond to daylight, overcast sky (4.4), interior light (3.3), dim interior light (2.2), street light (1.1), and night (0).
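The photon estimates above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the author's calculation: the function name is hypothetical, and it assumes that the acceptance solid angle of a photoreceptor is that of a two-dimensional gaussian with full width at half maximum ∆ρ, i.e. π∆ρ²/(4 ln 2). The measured radiance, wavelength, lens diameter and ∆ρ are taken from the text.

```python
import math

H = 6.62607015e-34   # Planck constant, J*s
C = 2.99792458e8     # speed of light, m/s

def available_photons_per_s(radiance, wavelength, lens_diameter, half_width_deg):
    """Photons per second available to one photoreceptor (hypothetical helper).

    radiance in W/(sr*m^2), wavelength and lens_diameter in m,
    half_width_deg = full width at half maximum of the angular sensitivity.
    """
    photon_energy = H * C / wavelength                   # J per photon
    lens_area = math.pi * (lens_diameter / 2.0) ** 2     # m^2, circular lens
    half_width = math.radians(half_width_deg)            # rad
    # Assumption: solid angle of a 2-D gaussian with this full width at half maximum
    solid_angle = math.pi * half_width ** 2 / (4.0 * math.log(2.0))  # sr
    return radiance / photon_energy * lens_area * solid_angle

# Intensity 2.2 (10 mV depolarization), values from the text
at_2_2 = available_photons_per_s(3.0e-3, 494e-9, 24e-6, 1.4)
# Intensity 5.5 is 3.3 log units brighter
at_5_5 = at_2_2 * 10 ** (5.5 - 2.2)
print(f"{at_2_2:.0f} photons/s at 2.2, {at_5_5:.1e} photons/s at 5.5")
```

Under these assumptions the result falls within the quoted 2300±500 photons per second, and scaling by 3.3 log units gives a value close to the 5 × 10⁶ quoted for the highest intensity.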

The flashed stimulus superimposed on the background was either a point source or a wide field. As a point source stimulus I used a LED (Siemens LD57C, light flashes of 2 ms duration and repeated every 500 ms) for the three lowest background intensities, and an electronic flash (Metz, mecablitz 60 CT-4, flashes shorter than 1 ms, time between flashes 2.5 or 5 s) via a lightguide for the three highest background intensities. In both cases the angular extent of the stimulus was less than 0.3°. As a wide field stimulus I used a diffuse screen (angular extent 30°×30°) illuminated by the electronic flash.

For the LED stimulus, response amplitudes were between 1 and 2 mV; all other responses were about 5 mV. This is still reasonably well within the linear range of LMCs, although I confirmed the finding of Laughlin et al. (1987) that the on- and off-transients have slightly different dependencies on flash intensity. This nonlinear effect is neglected in the theory and experiments in this article. Another nonlinear effect in LMCs is neglected as well, namely the response compression at higher contrasts, which is matched to the natural distribution of contrast amplitudes (Laughlin 1981). In fact, this nonlinearity has few consequences for the developed theory, apart from effectively increasing the ratio of dynamic range to channel noise (see Theory and Results).

For the theory an estimate of the photoreceptor signal-to-noise ratio (SNR) is needed for stimuli with natural statistics and contrast. In a separate series of experiments on photoreceptors, I estimated these for the various background intensities, by presenting the photoreceptors with such a stimulus (of contrast 0.32), moved at 110 degree/s. From the resulting data, I calculated the signal and noise power spectra, and from these I determined the RNS (average signal-to-noise ratio) as defined in the Appendix. Finally, I corrected this for the stimulus velocity used, for the required standard contrast (0.40), and for the light dependence of the photoreceptor integration time (∆t). The RNS needed for the theory assumes ∆t = 7 ms, with ∆t the half-width of the photoreceptor impulse response (with wide field stimulation). Actual measured values (mean and s.d. of 9 photoreceptors) at the 6 intensities were ∆t = 23.8±2.4, 17.5±1.5, 10.7±0.8, 8.2±0.7, 6.8±0.4, and 7.0±0.6 ms. Resulting RNSs were 0.5±0.1, 1.5±0.2, 4.5±0.3, 11±1, 22±3, and 21±4 (mean and s.d. of 7 cells).

In principle these RNSs may be underestimates of the RNSs as encountered by the LMCs, because 6 photoreceptors with identical fields of view converge on each LMC (Kirschfeld 1967; see also van Hateren 1987). I have reason to believe, however, that this is only a small effect. The RNS measured in a photoreceptor is already larger than expected from an isolated photoreceptor, because photoreceptors are electrically coupled (van Hateren 1986). Furthermore, measurements of RNSs in LMCs yield values close to or smaller than those in photoreceptors (RNSs I measured were 0.6±0.1, 1.7±0.3, 4.6±0.9, 8.4±1.5, 14±3, and 14±3; mean and s.d. of 18 cells). The reason is that the RNS is an average measure of signal-to-noise ratio, and that LMCs have increased SNRs at some temporal frequencies, but decreased SNRs at other frequencies as compared to photoreceptors (the √6 increase reported by Laughlin et al. 1987 appears to be the consequence of the temporal frequency content of the specific stimulus they use). Similar arguments will apply to the superimposed photoreceptor signals at the axon terminals. In view of these considerations I made no attempt to correct the RNSs for convergence, accepting the limited validity of these values. I prefer to use them, however, rather than using the RNS as a free parameter in producing fits as in Figs. 6 and 7.

Image processing and theoretical computations. Images studied for Fig. 1a were obtained by grabbing images through a Philips 56470 CCD-camera module and a Data Translation DT2861 framegrabber running in a PC/AT-computer. Most images were obtained from published photographs of widely varying scenes: trees, gardens, animals, mountain scenery, people, urban scenery, etc. Both the camera and the printing process of the photographs obviously were sources of low-pass filtering of the original scenes. I made no attempt to correct for this, but instead limited the analysis to the lower-frequency part of the spatial frequency spectrum, assuming that the lower spatial frequencies were little affected. Also, no calibration of the intensity was undertaken (both the printing process and the camera may map scene intensities to image intensities in a nonlinear manner). This is hardly a limitation to the analysis, however, as I found that simple, monotonic nonlinear transformations (like squaring, square root, logarithm) on natural images had negligible influence on the slope of the spatial frequency spectrum (the slope is used in the analysis below). The reason is that the shape of the spectrum is mainly determined by the position and orientation of the edges in the image, and hardly by their exact amplitude. Before Fourier transforming, images were tapered with a Kaiser-Bessel window (α = 3, see Harris 1978).

Theory (summarized in the Appendix) was implemented on a Hewlett-Packard workstation as Fortran programs. Numerical routines were adapted from Press et al. (1989). Power spectra and filters were evaluated in three dimensions (two spatial and one temporal), with a resolution of 128 points per dimension, thus yielding calculations on matrices of size 128×128×128. Most figures shown in this article were drawn using a PC/AT-computer running the program Axum (TriMetrix, Seattle).

Theory and Results

In this section I will present the main line of theoretical reasoning, present results of theoretical calculations, and compare these with measurements. The starting point of the theoretical development is the structure of the natural visual environment. Although this can be studied in virtually unlimited detail, I will make the simplest possible assumption that for the first steps in visual processing only the most general structure is taken into account.

Spatial structure of natural images

A useful characteristic of an image is its spatial power spectrum, a measure of the relative abundance of particular spatial frequencies in the image. Recently, Field (1987) and Huang and Turcotte (1990) published power spectra of several natural images, showing that they are very similar even if the images are very different. Differences between pictures are mainly expressed in differences in the phase part of their spectra (see e.g. Oppenheim and Lim 1981). I investigated this further by examining the spatial power spectra of 117 natural images, mostly obtained from published photographs of widely varying scenes (see Methods). In general, the two-dimensional spectra were circularly symmetrical to a reasonable approximation, and they were averaged to obtain radial spectra. When plotted on a log-log scale, virtually all radial spectra could be well fitted by a straight line in the interval between 1 and 80 cycles per image (see below). The average of the slopes of the fitted lines was -2.13±0.36 (mean and s.d.). The average of the power spectra of the 117 images is shown in Fig. 1a as thick dots, where the thin dots show the standard deviation for each spatial frequency. The continuous line has a slope of -2.13, and was obtained by fitting a straight line to the data points in the interval between 1 and 80 cycles per image. Higher spatial frequencies deviate from this line, most likely because low-pass filtering due to printing and camera was not compensated for (see Methods and Field 1987). A slope of approximately -2 for the power spectrum was also reported by Field (1987) and Huang and Turcotte (1990), and I will therefore assume this value in the theoretical calculations. If such a power spectrum is Fourier-transformed, one finds an autocorrelation function which can be roughly approximated by an exponential (as used by e.g. Srinivasan et al. 1982).
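The analysis described above (window, Fourier transform, radial averaging, straight-line fit on a log-log scale) can be sketched as follows. This is an illustration under stated assumptions, not the original Fortran code: the function names are hypothetical, the image is assumed square and greyscale, the mean is subtracted before windowing, and radial averaging uses integer-radius bins. Harris's α is mapped to NumPy's `beta` as β = πα.

```python
import numpy as np

def radial_power_spectrum(image, alpha=3.0):
    """Radially averaged power spectrum of a square greyscale image.

    The returned array is indexed by spatial frequency in cycles per image.
    """
    n = image.shape[0]
    # Kaiser-Bessel taper; Harris's alpha corresponds to NumPy's beta = pi * alpha
    w = np.kaiser(n, np.pi * alpha)
    window = np.outer(w, w)
    # Assumption: remove the mean so the DC term does not leak into low frequencies
    spectrum = np.fft.fftshift(np.fft.fft2((image - image.mean()) * window))
    power = np.abs(spectrum) ** 2
    # Integer radius (cycles per image) of every sample, measured from the DC bin
    fy, fx = np.indices(power.shape) - n // 2
    radius = np.rint(np.hypot(fx, fy)).astype(int)
    sums = np.bincount(radius.ravel(), weights=power.ravel())
    counts = np.bincount(radius.ravel())
    return sums / counts

def fitted_slope(radial, f_lo=1, f_hi=80):
    """Slope of a straight line fitted to log10(power) versus log10(frequency)."""
    f = np.arange(f_lo, f_hi + 1)
    slope, _intercept = np.polyfit(np.log10(f), np.log10(radial[f]), 1)
    return slope
```

For an image whose power spectrum falls as the inverse square of spatial frequency, `fitted_slope(radial_power_spectrum(image))` should come out close to -2; the image must be large enough (at least about 162 pixels wide) for the 1 to 80 cycles per image interval to stay below the Nyquist frequency.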

As the images taken from photographs contain unknown amounts of contrast scaling, I could not use them as a database for determining average contrast. Instead, I adopted the value determined by Laughlin (1983), who found an average contrast of 40% for natural scenes.

Fig. 1 a. Average power spectrum of 117 images. Thick dots denote the mean, thin dots the standard deviation. The straight line was fitted to points belonging to spatial frequencies up to 80 cycles per image, and has a slope of -2.13. Power density in arbitrary units. b. Velocity model of the fly visual system (see text for explanation).

Temporal structure: the velocity model

If observed from a fixed point of view, natural images generally contain few moving components: most objects remain in a fixed position. Nevertheless, an animal's visual system will in practice encounter movements almost continuously, because the animal will generally be moving around, or at least be moving its eyes or head. Again, I will make the simplest possible assumption, namely that the early stages of vision are well adapted to this most common of visual movements, namely that due to the animal's own movements. This does not imply that I underestimate the importance of movements generated by agents outside the animal (like predators or prey), only that I assume that this is taken care of by neural circuits further upstream in the nervous system, or by highly specialized circuits running parallel to the main information channels (see e.g. Jansonius and van Hateren 1991).

What temporal structure is imposed on the visual world by the animal's own movements? To answer this question experimentally, one would ideally have to mount a camera on the animal's eyes, recording whatever scene the animal is viewing while it is moving around naturally. Instead, we will here just try to make a reasonable estimate of temporal structure. In the Appendix I show that if a visual system is moving in a straight line amidst uniformly distributed objects, the velocities v of components in the resulting image will have a probability distribution a_v(v) that behaves as 1/v² for large v. The 1/v²-behaviour is retained if the animal has a whole range of possible speeds. For reasons of simplicity we will assume that angular movements of the visual system are approximately distributed in a similar fashion, and that visual processing during fast saccade-like movements (see e.g. Wagner 1986a) is negligible. As a very simple distribution a_v that displays this behaviour for large v, and also behaves well for small v, I chose a_v proportional to 1/(v+σ_v)², with σ_v a constant determining the width of the distribution (see also the Appendix). This function is shown in Fig. 1b, with a value of σ_v that is discussed in the section below on the characteristic velocity. I will call the velocity distribution of image components, a_v, the velocity model of a particular animal's visual system. Note that this is not the velocity distribution of the animal itself, but of the components of the image as perceived by the animal. Thus it is the combined result of the movements of the animal's visual system and the spatial structure of its surroundings.
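As a small illustration of the velocity model, the sketch below evaluates a distribution proportional to 1/(v+σ_v)². The text only gives the proportionality; the normalization σ_v/(v+σ_v)², which integrates to 1 over v ≥ 0, is an assumption made here, and the function names are hypothetical.

```python
import numpy as np

def velocity_model(v, sigma_v):
    """a_v(v) = sigma_v / (v + sigma_v)^2 for v >= 0.

    Assumed normalization: the integral over v from 0 to infinity equals 1.
    """
    v = np.asarray(v, dtype=float)
    return sigma_v / (v + sigma_v) ** 2

def fraction_below(v, sigma_v):
    """Integral of a_v from 0 to v, which evaluates to v / (v + sigma_v)."""
    return v / (v + sigma_v)
```

With this normalization half of all image components move slower than σ_v, since `fraction_below(sigma_v, sigma_v)` equals 0.5, and for v much larger than σ_v the density approaches σ_v/v², the 1/v² behaviour required above.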

Spatiotemporal structure

The spatial structure of natural images and the velocity model together yield the probable power in a certain spatial frequency f_s moving with a certain velocity v. Although this space-velocity power spectrum might be used as a basis for further theoretical development, it is more convenient to transform this spectrum to one depending on spatial frequency and temporal frequency (f_t, using f_t = v f_s). The reason is that, although space and velocity might be more fundamental than space and time from the point of view of visual processing, space and time are more fundamental from a physical point of view. In particular, the most important constraint on what kind of spatiotemporal filters are physically realisable is that of causality (cause precedes effect), and this is more easily handled in the time domain than in the velocity domain.
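The change of variables from velocity to temporal frequency can be illustrated in one spatial dimension. The sketch below is not the set of equations from the Appendix (which is not reproduced here, and which covers two spatial dimensions); it only assumes a spatial power spectrum proportional to 1/f_s², a velocity model σ_v/(v+σ_v)², and the substitution v = f_t/f_s with Jacobian 1/f_s. The function name is hypothetical.

```python
import numpy as np

def spatiotemporal_power(f_s, f_t, sigma_v):
    """One-dimensional sketch of P(f_s, f_t) for f_s > 0 and f_t >= 0.

    P(f_s, f_t) = f_s^-2 * a_v(f_t / f_s) / f_s, with a_v the assumed
    velocity model sigma_v / (v + sigma_v)^2.
    """
    f_s = np.asarray(f_s, dtype=float)
    f_t = np.asarray(f_t, dtype=float)
    a_v = sigma_v / (f_t / f_s + sigma_v) ** 2   # velocity model at v = f_t / f_s
    return a_v / f_s ** 3                          # 1/f_s^2 spectrum times Jacobian 1/f_s
```

In this sketch the power density decreases monotonically with both f_s and f_t, and integrating over all f_t at fixed f_s recovers the purely spatial 1/f_s² spectrum.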

Fig. 2 shows the resulting spatiotemporal power spectrum encountered by the animal when it is moving around in a natural visual environment (see the Appendix for the appropriate equations). It shows only one of the two spatial frequency axes. The figures in this article always show only one spatial axis, but the calculations are actually performed in three dimensions: two spatial (x and y) and one temporal.

We see in Fig. 2 that the power density is highest for (f_s, f_t) = (0, 0), i.e., the average background intensity, neither changing in time nor space. Progressively higher spatial and temporal frequencies contain less and less power. Obviously, noise in the visual system will then affect high spatial and temporal frequencies first, and we will see below that a visual system will reduce temporal and spatial acuity when noise is increased (e.g., when ambient light intensity is reduced).

Fig. 2 Spatiotemporal power spectrum of natural images seen by a visual system with a velocity model as in Fig. 1b.

Towards a theory of maximizing information flow

In the section above we have studied the spatiotemporal image stream (i.e., the image as it changes over time) as it presents itself to the eye. Below I will discuss how early vision may deal with this input. The basic steps of the theory are depicted in Fig. 3. We assume that the image stream is first low-pass filtered in space and time, in order to reduce the amount of subsequent processing necessary. The resulting image is not perfect, but degraded by noise. Finally, a neural filter tries to mould the signal in such a way that as much as possible of the information it contains is transmitted through a channel (neuron) with a limited dynamic range and some noise of its own. The visual system then consists of an array of these channels, each devoted to a particular direction in space. The issue of how many of these channels are needed, and how tightly they should be packed (see e.g. Howard and Snyder 1983) is beyond the scope of this article, and will be ignored. Rather, the theory addresses the optimization of the performance of single channels. Below I will discuss the various steps of the theory in more detail.

Fig. 3 Scheme of early vision. An image is low-pass filtered in space and time with a prefilter, and to the result noise is added (photon noise and transducer noise). A neural filter subsequently filters the prefiltered image in such a way that as much as possible of the information it contains is transmitted through a channel of limited dynamic range that produces some noise of its own.

Spatial and temporal prefiltering

As an animal cannot afford to allocate unlimited resources to its visual system, the visual system has to limit itself to processing only a certain range of spatial and temporal frequencies. In practice, this means low-pass filtering before or during the sampling of the image stream, both spatially and temporally. How much of the spatiotemporal spectrum will be sampled by a visual system depends on many factors, such as the lifestyle of the animal (e.g., moving around only during the day, or at dusk, in open terrain or beneath a roof of foliage), the amount of metabolic energy it can spend on its eyes, structural factors such as the optical design of the eye, or the mechanical characteristics of its locomotive apparatus, etc. Attempts have been undertaken to develop a general theory of eye design encompassing several of these factors (Snyder et al. 1977; Howard and Snyder 1983). Nevertheless, this theory seems to contain, at its present stage, too many uncertainties to serve as a basis for a solid theory of spatiotemporal filtering. Therefore, I will not try to derive the limits of spatial and temporal resolution from a more general, ecologically oriented theory, but rather consider them as mere given properties of the visual system under study. Once we have accepted these limits as given, the answer to the question of how to proceed and construct a spatiotemporal filter optimizing information flow follows virtually from first principles.

The spatial prefiltering in the fly visual system is entirely due to the optical apparatus, and is well understood (Smakman et al. 1984; van Hateren 1984, 1989). For simplicity, we will approximate the transfer function of the spatial filter with a gaussian (see Appendix, and Götz 1965). Assuming a half-width ∆ρ of the angular sensitivity of 1.4°, we get a standard deviation σ_s of the spatial transfer function of 0.268 cycles/degree. In flies, the spatial filter due to the photoreceptors is to a reasonable approximation independent of the background light intensity (Smakman et al. 1984). In the present theory any remaining dependence is considered to be part of the subsequent neural filter (see below).

Temporal prefiltering by fly photoreceptors is mainly due to the process of phototransduction, yielding an impulse response of finite temporal width. I measured impulse responses in photoreceptors over a 5.5 log units wide range of background intensities, and found that the high frequency asymptote of the resulting temporal transfer functions is well described by an exponential function. This is consistent with the results of Howard et al. (1984) that the impulse response of insect photoreceptors is well described by a log-normal function, of which the high frequency asymptote is also close to an exponential over a substantial range. At higher background intensities the photoreceptor response speeds up appreciably and becomes biphasic, resembling already somewhat the LMC impulse response. Thus the temporal filtering by the photoreceptors is very much dependent on the background light intensity. For the theoretical model we will split this up in two components, however (see the Discussion for the consequences of this choice). The first component is a fast temporal filter independent of background intensity, whereas the second is adapting to light intensity. The former is then considered as the filter performing the temporal prefiltering, while the latter is considered as part of the neural, spatiotemporal filter to be discussed below. For the temporal prefiltering we thus take an exponential transfer function exp(-f_t/σ_t), with f_t the temporal frequency, and σ_t = 45 Hz, corresponding to the high frequency asymptote I measured at the highest available background intensity.

The characteristic velocity, v_c

The widths of the spatial and temporal prefilters discussed above are associated with a velocity that I will call the characteristic velocity v_c, defined here as v_c = σ_t/σ_s. With σ_t = 45 Hz and σ_s = 0.268 cycles/degree, we find v_c = 168 degree/s. The characteristic velocity is a measure of the image velocity required to move across a typical receptive field in a typical integration time of the photoreceptor (see Glantz 1991). In the theory, the parameter σ_v that determines the velocity model (see above) is most conveniently expressed in terms of v_c, and the example of Fig. 1b is actually for the case σ_v = 0.1 v_c.
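The numbers quoted for the prefilters can be reproduced with a short calculation. The sketch below is illustrative and uses hypothetical function names. It assumes that the gaussian angular sensitivity has full width at half maximum ∆ρ, and that the gaussian transfer function has the form exp(-f_s²/(2σ_s²)), so that σ_s = 1/(2πσ) with σ the standard deviation of the angular sensitivity; the exact form used in the Appendix is not shown here.

```python
import math

def spatial_sigma(half_width_deg):
    """Standard deviation (cycles/degree) of the gaussian transfer function of a
    gaussian angular sensitivity with the given full width at half maximum."""
    sigma_angle = half_width_deg / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # degrees
    return 1.0 / (2.0 * math.pi * sigma_angle)

def prefilter(f_s, f_t, sigma_s, sigma_t):
    """Assumed prefilter: gaussian in spatial frequency times exponential in
    temporal frequency (f_s in cycles/degree, f_t >= 0 in Hz)."""
    return math.exp(-f_s ** 2 / (2.0 * sigma_s ** 2)) * math.exp(-f_t / sigma_t)

sigma_s = spatial_sigma(1.4)   # about 0.268 cycles/degree
sigma_t = 45.0                 # Hz, from the text
v_c = sigma_t / sigma_s        # about 168 degree/s
sigma_v = 0.1 * v_c            # width of the velocity model used for Fig. 1b
print(f"sigma_s = {sigma_s:.3f} cycles/degree, v_c = {v_c:.0f} degree/s")
```

That this form of the gaussian reproduces the quoted 0.268 cycles/degree suggests it matches the convention used in the article, but this remains an inference.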

Noise in the photoreceptor image

The image as present in the two-dimensional array of photoreceptors will not be free from noise. Two main sources of noise have been identified (Lillywhite and Laughlin 1979): photon noise and transducer noise. The former is a consequence of the quantum nature of light, and will depend on the ambient light intensity. The latter is due to the phototransduction process, amplifying a single photon absorption to a macroscopic voltage fluctuation in the photoreceptor cell. The resulting signal-to-noise ratio was found to depend monotonically on the average light intensity (Laughlin et al. 1987), up to a certain intensity beyond which it saturates (see also the RNS estimates given in Methods). I found that the noise power density of the photoreceptors is flat to a good approximation for all background intensities I measured (see also de Ruyter 1986), at least in that part of the frequency spectrum where the signal power density is not negligible. Thus, for the sake of simplicity, we will assume below a constant noise power spectrum, although this is not strictly required for the theory. As depicted in Fig. 3, this noise is supposed to be additive to the prefiltered image.

A neural filter optimizing information flow

How should the image at the level of the photoreceptors be transformed, by a neural filter, in order to maximize information flow? We will assume that after the transformation, the image flows into an array of channels (neurons) having a limited dynamic range and generating some noise of their own (see Fig. 3). For each spatiotemporal frequency this results in a certain signal-to-noise ratio (SNR) in the channel. We can then find the amount of information carried by this frequency from log2(1 + (SNR)²) (see Appendix). By integrating this over all spatial and temporal frequencies we obtain the total amount of information flowing through the channel per second and per steradian. Finally, we choose the neural filter such that this total amount of information is maximized.

The assumption that the channel is noisy and has limited dynamic range is not only realistic for neurons, but also necessary if one wants to obtain a unique spatiotemporal filter. Suppose, for instance, that there was no internal noise generated in the channel. Then any spatiotemporal transformation would be as good as any other, because no information would be lost: at every spatial and temporal frequency the SNR already present in the photoreceptors would be maintained. Suppose, on the other hand, that there was internal noise, but no limit on the response amplitude (unlimited dynamic range). Then the effect of internal channel noise could easily be made negligible by boosting the amplification of the neural filter, such that the resulting signals would be much larger than the internal channel noise. Then, again, any spatiotemporal filter would do as well as any other.

Therefore, I will assume in the calculations a fixed dynamic range of 100 mV² (variance of voltage fluctuations due to signal and noise while viewing natural image streams; i.e., the standard deviation is 10 mV) and an internal noise with a flat power spectrum and a magnitude of 0.5 mV² (variance of fluctuations due to internal noise; s.d. = 0.71 mV). These values seem to be of the right order of magnitude for fly LMCs, but their exact values are not too critical for the calculations below. Given an internal channel noise and a dynamic range of the channel, it is possible to compute the spatiotemporal filter that preserves as much as possible of the information present in the array of photoreceptors (see Appendix).
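As a quick check of the numbers above, the following sketch converts the two assumed variances into standard deviations. The variable names are illustrative only.

```python
import math

# Channel parameters assumed in the text (variances, in mV^2).
dynamic_range_var_mv2 = 100.0    # signal plus noise while viewing natural image streams
internal_noise_var_mv2 = 0.5     # internal channel noise

# Standard deviation is the square root of the variance.
dynamic_range_sd_mv = math.sqrt(dynamic_range_var_mv2)
internal_noise_sd_mv = math.sqrt(internal_noise_var_mv2)

print(f"dynamic range s.d.  = {dynamic_range_sd_mv:.2f} mV")
print(f"internal noise s.d. = {internal_noise_sd_mv:.2f} mV")
```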

An example in the time domain

Before presenting results on the full spatiotemporal case, I will first discuss an example calculated in the time domain. By limiting ourselves to a single dimension, basic properties of the theory become more readily apparent, and we may gain some insight into what a neural filter can do for improving information transfer. Fig. 4 shows such a series of theoretical results. In Fig. 4a, st(ft) shows the square root of the power spectrum of temporal variations, on average to be expected by the visual system. In fact, this function was extracted from the full spatiotemporal power spectrum Sxyt(fx,fy,ft) (Fig. 2) by integrating over the two spatial dimensions and taking the square root.

The temporal variations of Fig. 4a are subsequently filtered by an exponential prefilter (Fig. 4b), which results in the solid line of Fig. 4c. The dashed line shows the photoreceptor noise for RNS = 100 (the RNS is an average measure of the signal-to-noise ratio, and is defined in the Appendix. It is used for convenience only, and its exact value and definition are not important for the results). Although, for succinctness' sake, we use a definition for RNS that gives a single number for the signal-to-noise ratio, we see in Fig. 4c that the SNR (note the different use we are making of SNR, a function of frequency, and RNS, a single number) is not a constant, but depends on ft. For low ft the SNR is large, and for high ft it is low because the image contains little power in high temporal frequencies.

Fig. 4 Example of the theory in the time domain for RNS = 100. a. Average temporal structure of images. b. Temporal prefilter. c. Continuous line: prefiltered image; dashed line: additive noise. d. Neural filter optimizing information flow. e. Continuous line: image after pre- and neural filtering; dashed line: noise filtered by neural filter; dotted line: additive noise due to the channel itself. f. Signal-to-noise ratio resulting from e. g. Information following from f. h. Combination of pre- and neural filter. All amplitudes are in arbitrary units.

Fig. 4d shows the neural filter that follows from the theory summarized in the Appendix. This neural filter reduces the amplitude of low temporal frequencies, with the result that these components occupy less of the limited dynamic range of the transmission channel. Little information is lost by reducing low temporal frequencies, because they have a high SNR (see Fig. 4c). Because SNR and amount of information are logarithmically related, reducing a SNR of 1000 by a factor of two sacrifices approximately the same amount of information as reducing a SNR of 10 by a factor of two, whereas the former reduction frees much more of the channel's dynamic range. High temporal frequencies are also reduced, as in that part of the spectrum the SNR is small, and thus these frequencies contribute little to the information transferred. Moreover, if not reduced in amplitude, their noise component, ne, would occupy a significant part of the channel's dynamic range. It can be shown analytically that the neural filter is maximal for a temporal frequency where signal power equals noise power (SNR=1, see Fig. 4c). This is not only true for the temporal case with RNS = 100 (Fig. 4), but also for the general spatiotemporal case and any RNS.
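The logarithmic trade-off described above can be made concrete with a small numerical sketch. It uses the per-frequency information measure log2(1 + SNR²) quoted earlier in the text and assumes, for illustration only, a fixed noise power of one arbitrary unit, so that signal power equals SNR². The function and variable names are mine.

```python
import math

def information_bits(snr):
    """Information carried by one frequency, log2(1 + SNR^2), SNR as amplitude ratio."""
    return math.log2(1.0 + snr ** 2)

# Information sacrificed when the SNR is halved, at high and at moderate SNR.
loss_high = information_bits(1000.0) - information_bits(500.0)
loss_low = information_bits(10.0) - information_bits(5.0)

# Signal power freed by the same halving (assumed unit noise power).
freed_high = 1000.0 ** 2 - 500.0 ** 2
freed_low = 10.0 ** 2 - 5.0 ** 2

print(f"bits lost, SNR 1000 -> 500: {loss_high:.2f}")
print(f"bits lost, SNR 10 -> 5:     {loss_low:.2f}")
print(f"ratio of freed signal power: {freed_high / freed_low:.0f}")
```

Both reductions cost close to two bits, while the reduction at high SNR frees four orders of magnitude more signal power under these assumptions.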

The result of the neural filtering is shown in Fig. 4e (image: solid line; noise: dashed line). The dotted line shows the noise due to the channel itself. Fig. 4f shows the resulting SNR, and from that the information follows directly (Fig. 4g). It is the integral over the curve in Fig. 4g, i.e., the total information transmitted per second, that has been maximized by the neural filter of Fig. 4d, while keeping the total power of signal and noise (in fact the sum of the integrated powers of the signal and noise as shown in Fig. 4e) within the dynamic range of the channel. Finally, Fig. 4h gives the combination of pre- and neural filters (Figs. 4b and d), thus the overall filtering performed by the early visual system.

Comparing Figs. 4c and f, we see that the result of the neural filter has been to reduce the SNR at low and high frequencies in order to preserve as much as possible the SNR in that part of the spectrum where SNR ≈ 1. The result of this is that the SNR (Fig. 4f) and the information (Fig. 4g) are more evenly distributed over the frequency spectrum than expected on the basis of Fig. 4c. This becomes even more true for still larger RNS, i.e., for ne smaller than shown in Fig. 4c. Then the maximum of the neural filter (Fig. 4d) shifts further to the right, and the SNR and information become constants over an appreciable (low-frequency) part of the spectrum.

The opposite effect occurs for low RNS. When ne (Fig. 4c) increases, the peak of the neural filter will gradually shift towards ft = 0, with the result that for RNSs smaller than about 0.1 the neural filter is purely low-pass.

Spatiotemporal examples and the transition to space-time

The results for the full spatiotemporal case (two spatial axes, one temporal) are analogous to the one-dimensional example above. Fig. 5 shows examples for two different RNSs. Fig. 5a shows the combined pre- and neural filter (as in Fig. 4h) for RNS = 0.5, and Fig. 5c for RNS = 22, both in the spatiotemporal frequency domain. Again, the filter for the low RNS (Fig. 5a) is almost entirely low-pass, both for spatial and for temporal frequencies. Temporal frequency curves are only band-pass for low spatial frequencies, and spatial frequency curves are low-pass except for ft = 0. In contrast, the filter for the higher RNS (Fig. 5c) is more strongly band-pass, both for spatial and for temporal frequencies. Note, however, that spatial frequency curves become low-pass for high temporal frequencies, and temporal frequency curves become low-pass for high spatial frequencies. Interestingly, similar spatiotemporal characteristics have been measured in the human visual system (e.g. Kelly 1979), an issue that will be addressed in a forthcoming paper.

It is sometimes more convenient to describe spatiotemporal filters in the space-time domain rather than in the spatial and temporal frequency domain. On the basis of our results so far, this transition cannot be made unambiguously, because information on the phase behaviour of the filters is lacking. If we restrict ourselves for the present purpose to filters that are circularly symmetrical in space, the spatial part of the spatiotemporal filter will only contribute a phase of zero for receptive fields centered at the origin of the spatial coordinate system. The temporal part will not have a zero phase, however, as this would lead to non-causal impulse responses (i.e., the response starts already before the stimulus is given). Although there are other reasonable possibilities, I will present here results where the phase has been determined on the assumption that for each spatial frequency the temporal filter is a minimum phase filter combined with a pure time delay. A minimum phase filter (see e.g. Humpherys 1970) has several interesting properties, one of them being that it gives the fastest possible impulse response that is physically realizable (i.e., causal). See the Appendix for more details on the calculation.

Figs. 5b and d show the minimum-phase impulse responses obtained from Figs. 5a and c. For the sake of clarity, both are shifted (delayed) by an arbitrary amount of 20 ms. Again, only one spatial dimension is shown in the figures. For RNS = 0.5, the response is monophasic for on-axis stimuli (point source at 0°), but biphasic for off-axis stimulation (hard to discern in Fig. 5b, but not negligible for wide field stimuli, not least because there is a second spatial dimension; see also Fig. 7g). This lateral inhibition is much stronger for RNS = 22 (Fig. 5d), and now also the on-axis response is biphasic. Comparing Figs. 5b and d, we see that the temporal half-width of the response becomes much smaller for larger RNS, whereas the spatial half-width changes only little. This asymmetry between spatial and temporal behaviour as a function of light and dark adaptation is a consequence of our choice of σv (σv = 0.1 vc), and can be changed by choosing σv differently. Spatiotemporal kernels similar to Fig. 5d were recently obtained by James and Osorio, using a white noise technique (see Osorio 1992).

Fig. 5 a. Amplitude of the spatiotemporal transfer function, mstn, for RNS = 0.5; mstn is the combination of prefilter and neural filter. For the sake of clarity, the line belonging to the 3rd spatial frequency and the line belonging to the 6th temporal frequency are emphasized. b. Impulse response hstn corresponding to the mstn of a. Temporal phase was determined on assumption of minimum phase, and the response peak was shifted to 20 ms by a pure time delay. The small bump at time zero is a numerical artifact due to the limited number of points (128) available along the time axis when evaluating the minimum-phase filter. c. Same as a, now for RNS = 22. d. As b, for RNS = 22. Amplitudes are normalized to 1.

LMC measurements

I measured impulse responses in LMCs at various background intensities for two different spatial stimuli: a point source and a wide field. Results of the point source stimulus are shown in Fig. 6a-f for a range of background intensities. The dots show the average (after normalization) of measurements in 6 to 8 cells, and the bars show the standard deviation. As measurements were normalized before averaging and the peaks of different measurements do not coincide exactly, the peak height of the averages is generally slightly less than 1. Fig. 6g-l shows theoretical calculations for RNSs corresponding to the various intensities of Fig. 6a-f (see Methods). The sign of the impulse responses is reversed in order to facilitate comparison with the measurements. The only significant free parameter in these calculations was σv, the parameter determining the width of the velocity distribution perceived by the animal. I found that a value σv = 0.1 vc (thus σv = 16.8 degree/s) gave satisfactory correspondence between measurement and theory, thereby also taking into account the results with wide field stimuli (see below). Correspondence is still good if we take half or twice this value for σv, but deteriorates rapidly for smaller and larger σv.

Fig. 6 a-f. Measured impulse responses of LMCs to short flashes of a point source, superimposed on 6 different background intensities. Dots and bars show mean and standard deviation of measurements in between 6 and 8 cells, where in each cell one or more measurements were taken with at least 320 stimulus runs. g-l. Theoretical impulse responses for 6 corresponding RNSs (see Methods). Response peaks were shifted to 20 ms. The RNSs for the two highest intensities are not significantly different (22±3 and 21±4).

Fig. 7 shows experimental and theoretical results for wide field stimuli, analogous to Fig. 6. Note that, as compared to point source stimuli, the off-transients are larger, both in the measurements and in the theoretical calculations. The impulse responses predicted by the theory also allow calculation of the contrast efficiency as a function of RNS. The contrast efficiency is defined here as the peak amplitude of the step response to a wide field stimulus of unit contrast. I found that the contrast efficiency increases by a factor of 3.3 over the range of RNSs shown in Fig. 7, which is consistent with measurements of this increase reported by Laughlin (1980) and by Straka and Ammermüller (1991).

Fig. 7 a-f. Measured impulse responses of LMCs to short flashes of a wide field stimulus, superimposed on 6 different background intensities. Mean and standard deviations are from between 6 and 12 cells, again with one or more measurements from each cell with at least 320 stimulus runs. g-l. Corresponding theoretical impulse responses, peaks shifted to 20 ms.

Discussion

In the approach laid out above, the starting point was the spatiotemporal structure of the visual world. From there we progressed by making only few and fundamental assumptions on further processing, the main assumption being that early vision is tuned to maximize spatiotemporal information flow by making use of the spatial frequency content of natural scenes and the velocity model of the visual system. We finally obtained predictions of spatiotemporal receptive fields very similar to those that were measured in fly LMCs. Although the correspondence between experiment and theory is not perfect in all its details, the main features are very well predicted.

Both experimental and theoretical impulse responses

• get faster in a similar way when the background light intensity (or RNS) is increased;

• get more biphasic with increased intensity (RNS), with similar time courses and similar ratios between on- and off-transients;

• show larger off-transients for wide-field stimulation than for point-source stimulation, both in a quantitatively similar way;

• show just a small change in the half-width of the spatial receptive field (see Dubs 1982), although there is lateral inhibition for higher intensities (RNSs); see Figs. 5b and d.

This remarkable correspondence was obtained by adjusting only a single parameter in the theory, namely σv, which determines the width of the velocity distribution. All other parameters were measured or estimated, such as σs (standard deviation of the spatial transfer function of the prefilter), σt (determining the width of the temporal transfer function of the prefilter), and RNSs (average signal-to-noise ratios at the prefilter). Varying σv by a factor of two changes the detailed shapes of the theoretical impulse responses, but leaves the overall correspondence of theory to measurements essentially intact. Only large changes in σv lead to a rather different picture, where the trade-off between spatial and temporal resolution as a function of RNS is balanced differently.

The σv used (σv = 0.1 vc) equals 16.8 degree/s in the fly. This speed corresponds to the angular speed of an object at a distance of 3.4 m, seen laterally by a fly cruising at 1 m/s. Closer objects will have higher angular speeds, objects seen more frontally lower angular speeds. It is difficult to evaluate how realistic the velocity model av(v) with this σv actually is. This depends very much on the details of the flying (or walking) behaviour in relation to the momentary visual surroundings. Even if flight behaviour has been studied in detail (Land and Collett 1974; Wagner 1986a, b, c), the interpretation is further complicated by the fly's ability to produce fast head movements (Hengstenberg et al. 1986), apparently partly to stabilize gaze during walking or flight. Given these uncertainties, I have no reason to believe that the σv used is not of the right order of magnitude.
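The correspondence between 16.8 degree/s and an object at 3.4 m can be verified with the relation ω = v/d for an object seen perpendicular to the direction of motion. The sketch below is illustrative only; the function name is mine.

```python
import math

def lateral_angular_speed_deg(speed_m_per_s, distance_m):
    """Angular speed (degree/s) of an object seen at right angles to the direction of motion."""
    omega_rad_per_s = speed_m_per_s / distance_m
    return math.degrees(omega_rad_per_s)

# A fly cruising at 1 m/s, object at 3.4 m seen laterally.
omega = lateral_angular_speed_deg(1.0, 3.4)
print(f"angular speed = {omega:.1f} degree/s")

# Halving the distance doubles the angular speed.
omega_close = lateral_angular_speed_deg(1.0, 1.7)
```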

The role of LMCs

The possible role of LMCs has been investigated before, notably by Laughlin and Hardie (1978), Laughlin (1980), Srinivasan et al. (1982) and Srinivasan et al. (1990). Laughlin and Hardie (1978) and Laughlin (1980) argue that a main function of spatial and temporal high-pass mechanisms in LMCs (lateral and self inhibition) is to adjust the operating range of the LMCs. Because the large DC-levels of the photoreceptors are virtually cancelled by these filter properties, LMCs are, according to Laughlin (1980), specialised to retrieve contrast information from the photoreceptors. Interestingly, virtual cancellation of DC-levels, and contrast enhancement, are indeed produced by the present theory.

Srinivasan et al. (1982) have developed a quantitative theory of LMC responses. They assume that the LMCs are performing a form of predictive coding, i.e., that they only signal those components in the stimulus that cannot be predicted by the immediate spatial surround or the immediate past. I implemented their theoretical scheme in a computer program, and confirmed the results of the theoretical calculations they present in their paper. I then compared the predictive coding scheme with the present theory (for a one-dimensional case). It turned out that the results are quite different. One particular difference is that the predictive coding scheme always produces strong lateral inhibition (of varying lateral extent), because the transfer function of the filter is zero for f = 0 (with f the spatial or temporal frequency). This is not so in the theory of this article: the transfer function is not exactly zero for f = 0, and for very low RNSs it is larger at f = 0 than at other frequencies. Related to this is that the predictive coding filter does not produce broadening of the temporal impulse response for low RNSs, as the theory presented here does (consistent with experiment). The main reason for this discrepancy is that predictive coding is in fact a specific implementation of the principle of reducing redundancy (Barlow 1961), whereas the theory of this article produces filters that only reduce redundancy at high RNSs. At low RNSs redundancy is even increased, because the neural filter then low-pass filters the prefiltered image (with both image and prefilter already of a low-pass nature, i.e., with a reduced high-frequency content and transfer, respectively). This low-pass filtering increases the spatial and temporal correlation in the image, and thus increases redundancy (see Srinivasan et al. 1982). The advantage is, however, that it increases the reliability (SNR) of the signal, and thereby the amount of information transferred through the channel, because low frequency components can be amplified more (within the constraints of available dynamic range), and are thus better protected against further contamination by (channel) noise. Thus the principle of 'reducing redundancy' is not completely compatible with the principle of 'maximizing information' as utilized in this article.

In a more recent article, Srinivasan et al. (1990) abandon the idea that LMCs perform a general, information preserving task in favour of the hypothesis that LMCs are already specialized in detecting specific features. These features appear to be moving blobs (while the system is in the dark-adapted state) or moving edges (while light-adapted). Srinivasan et al. show that the spatiotemporal receptive fields of LMCs are well suited for these purposes. This interesting interpretation is not inconsistent with the theory presented in the present article. As the predicted spatiotemporal receptive fields are similar to the ones used by Srinivasan et al. (1990), the observation that LMCs are good at detecting moving blobs when dark-adapted, and moving edges when light-adapted, is a direct consequence of the theory. Whether one should consider feature detection as the main purpose of LMCs, or as a byproduct of more general processing (the LMC as a 'detector' of natural image streams), seems to be a matter of taste as long as little is known about the function of the neural circuits towards which LMCs project.

In this respect it is interesting to note the existence of various types of LMCs giving similar responses. Each of these neurons may have evolved to highlight slightly different aspects of the visual surroundings. Strausfeld and Lee (1991), for instance, suggest that one of the LMCs, L3, is specialized for colour processing (presumably in collaboration with R7 and R8), while L1 and L2 are involved in motion processing. Slight differences in response characteristics, in particular of the plateau and of the off-transient, have been reported between L3 and L1/L2 (Hardie and Weckström 1990) and between L1 and L2 (Laughlin and Osorio 1989). In fact, it is possible to extend the theory in this article to include considerations such as these, and produce filters which are for instance especially suited for colour processing. In this article, however, I wanted to present the simplest possible theory, leading to something that one could call a generic LMC. Also for the measurements I made no attempts to classify the LMCs recorded from, and thus the responses can also be considered to be those of a generic LMC.

The non-correspondence of theory and fly visual system

The spatial and temporal prefilters of the theory together constitute what one could call the theoretical photoreceptor. Note that this is a much simpler device than the real photoreceptor of the fly. Firstly, the theoretical photoreceptor is a purely linear device, whereas the real one is highly adaptive to average light level. Light level and response are more or less logarithmically related, although the response around a specific working point is reasonably linear (see Laughlin and Hardie 1978). Secondly, the theoretical photoreceptor has fixed temporal properties, unlike the real photoreceptor. Both of these adaptive properties are, however, taken care of by the neural filter, and we have seen above that the combination of pre- and neural filter is very similar in theory and in the fly. Thus whereas in reality adaptation is produced jointly by photoreceptors and LMCs, in the theory it is entirely due to the neural filter. The advantage of this choice of a simple prefilter is that it keeps the theory as simple and general as possible. At the same time, however, it implies that one has to be careful not to directly identify the prefiltered image of the theory with the photoreceptor image in the nervous system.

I would like to emphasize at this point that the theory I presented in this article should not be mistaken for an attempt to model part of the fly visual system, neither mechanistically nor phenomenologically. It is not intended as a mechanistic model in a strict sense, because it does not aim at explaining early vision in terms of a series of theoretical components that physically mimic similar, corresponding components in the fly visual system. Rather, I prefer to consider it as a computational theory of early vision, with as main result a spatiotemporal filter (i.e., prefilter in series with neural filter) that maximizes information transfer, given certain properties of natural image streams and certain assumptions. It aims at understanding early vision in general, as much as possible independently of how these first steps are actually physically implemented in a specific visual system.

Limitations and extensions of the present theory

One property that is not predicted by the present theory is the latency of the impulse response, and its variation with RNS. Experimental impulse responses have latencies that depend on light intensity, with latency roughly proportional to response width (see Figs. 6 and 7). The assumption of minimum phase yields theoretical impulse responses that have shorter latencies than experimental ones, although this obviously can be arbitrarily corrected by adding a delay. Short latencies appear to be more advantageous than longer ones, as relevant information is then available earlier to the nervous system. Whether latencies are longer because of inherent limitations of the transduction process (e.g., due to diffusion processes, or many cascaded amplification stages), or because other constraints are involved (e.g., a response with a longer latency may cost less metabolic energy, or produce less noise per photon absorbed) remains unclear at the moment.

The theory presented is isotropic, i.e., it treats all spatial directions as equal and leads to circularly symmetrical receptive fields. In fact, there are indications that LMCs are not completely circularly symmetrical, but display somewhat stronger lateral inhibition in certain directions. Similar asymmetries can be readily produced by the theory by assuming a non-isotropic velocity model (e.g., horizontal velocities more common and larger than vertical ones). Also regional variations over the eye could be produced by taking into account that the velocity model might be different for e.g. frontal vision as compared to lateral vision. A final possibility is that the velocity model itself may be a function of RNS, i.e., that the fly changes its flight and walking behaviour depending on the ambient light level (for instance, on average slowing down when less light is available).

The theory specifies a spatiotemporal filter for each level of light adaptation, but it does not specify how the transition from one adaptational state to another is made (the dynamics of adaptation). This problem is not unimportant in practice, because adapting too slowly or too quickly to changing average light levels - which can only be estimates - would deteriorate vision. A possible solution, along lines similar to the paradigm used in this article, would be to study the natural statistics of light level variations, and use these to predict the dynamics of adaptation (requiring maximum information transfer and minimum response saturation).

Acknowledgements. I wish to thank Nomdo Jansonius and Doekele Stavenga for useful comments. This research was supported by the Netherlands Organization for Scientific Research (NWO).

Appendix

In the following we will assume a discrete spatiotemporal system, with Nx samples along the spatial x-axis spanning a width wx, Ny samples along the spatial y-axis spanning a width wy, and Nt samples along the temporal t-axis spanning a time wt. This formulation leads to discrete mathematics, which can readily be implemented in a computer program. Nx samples spanning a width wx lead to Nx spatial frequencies fx, spaced at ∆fx = 1/wx:

(see e.g. Bracewell 1978). Similarly, there are Ny spatial frequencies fy, spaced at ∆fy = 1/wy, and Nt temporal frequencies ft, spaced at ∆ft = 1/wt.
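The frequency spacing stated above can be illustrated with NumPy's discrete Fourier transform helpers. This is a minimal sketch, not the program used for the article; the sample count and width are hypothetical values chosen only for the example.

```python
import numpy as np

def frequency_axis(n_samples, width):
    """Frequencies of an n_samples-point DFT whose samples span `width`.

    The sample interval is width / n_samples, so the returned frequencies
    are spaced at 1 / width.
    """
    return np.fft.fftfreq(n_samples, d=width / n_samples)

# Hypothetical example: 64 samples spanning 32 degrees of visual angle.
n_x, w_x = 64, 32.0
f_x = frequency_axis(n_x, w_x)       # cycles/degree
delta_f_x = f_x[1] - f_x[0]          # spacing between adjacent frequencies

print(f"number of frequencies: {f_x.size}")
print(f"spacing: {delta_f_x} cycles/degree (1/w_x = {1.0 / w_x})")
```

The same helper applies to the y-axis and to the time axis, with width replaced by wy or wt.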

The spatial power density of natural images is given by

with c1 a calibration constant, given by

and the spatial contrast cs of the image defined as

If the density of objects distributed uniformly in the environment (i.e. in three dimensions) is α, it follows that the probability P(x)dx of observing an object between a distance x and x+dx equals

which is similar to Beer's law for absorption of light in an absorbing medium. Assuming that the animal moves with a velocity vl (perpendicularly to the direction of the object), the resulting velocity of the object in the image is

With a change of variables (x to v) we find from Equations 5 and 6 the probability av(v)dv of observing an object with an effective speed in the image between v and v+dv:

Thus for large v, av(v) behaves as v⁻². This applies to all possible translational velocities vl of the animal. As a reasonable guess for the total velocity distribution we then choose
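The displayed equations for this derivation did not survive the conversion to this web version. The following is a reconstruction sketch of the steps described in the surrounding text, under the assumptions that the distance distribution has the Beer's-law form with rate α and that the image velocity of a laterally seen object is vl/x. It is not a copy of the published Equations 5 to 7, whose exact form and normalization may differ.

```latex
% Reconstruction sketch; assumed forms, not the published Equations 5-7.
\begin{align}
  P(x)\,dx &= \alpha\, e^{-\alpha x}\, dx
      && \text{(distance of the nearest object, Beer's-law form)} \\
  v &= \frac{v_l}{x}
      && \text{(image velocity of an object seen at right angles)} \\
  a_v(v)\,dv &= P\!\left(\frac{v_l}{v}\right)\left|\frac{dx}{dv}\right| dv
      = \frac{\alpha v_l}{v^{2}}\, e^{-\alpha v_l / v}\, dv
\end{align}
% For v much larger than \alpha v_l the exponential tends to 1,
% so a_v(v) \approx \alpha v_l\, v^{-2}.
```

Under these assumptions the v⁻² behaviour for large v stated in the text follows directly from the Jacobian of the change of variables.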

with σv a positive constant regulating the width of the distribution, and cv a constant such that

The second part of Equation 9 follows from a change of variables (fx,fy,v) → (fx,fy,ft), with ft the temporal frequency, and from (assuming no correlation between the direction of the velocity v and the spatial frequency vector (fx,fy))

with

i.e., fr is the amplitude of the spatial frequency vector.

With the change of variables (fx,fy,v) → (fx,fy,ft), and by approximating the contribution of frequencies lower than 1/wr by substituting fr = 1/2wr for the case fr = 0 and ft ≠ 0, we finally get for the spatiotemporal power density

For the spatial prefiltering we use a spatial modulation transfer function ms(fx,fy):

with σs related to the half-width ∆ρ of the angular sensitivities of the photoreceptors as

For the amplitude of the temporal transfer function, |mt(ft)| we use

with σt a constant determining the temporal resolving power of the photoreceptor.

The power density of the signal in the photoreceptor after this filtering is

Next, we assume that to this signal a noise power density Ne is added, assumed to be constant in that part of space and time where the signal is not vanishingly small. We will take this volume to be a cube in fx-fy-ft-space with |fx| ≤ 2σs, |fy| ≤ 2σs, and |ft| ≤ 3σt. Due to the filtering by ms and mt, Srec will be negligible outside this cube. We then define the average signal-to-noise ratio (RNS) at the level of the photoreceptors as

where the summation is over a volume in space and time covering virtually all of the power of the stimulus (after prefiltering).

Both signal and noise are subsequently filtered in space and time by a neural filter with a power transfer function pn(fx,fy,ft). Finally, the result is delivered to a channel with a limited dynamic range, K, and a limited information capacity due to internal noise (power density Ni).

Now we have for the signal power density S and the noise power density N in the channel

The requirement that the total of signal and noise remain within the dynamic range of the channel leads to

where we have used the fact that the mean square value of a signal equals the integral over its power spectrum (see e.g. van der Ziel 1970). Finally, we require that the information transfer rate R is as large as possible, with R defined as (e.g. Goldman 1953):

where we have chosen loge rather than log2 for mathematical convenience. This means that if we want to express the information transfer rate in bits per second per steradian, we have to multiply the R of Equation 21 by log2e. Thus R has to be maximized while keeping the constraint in Equation 20. Following Goldman (1953, p.159), this problem can be solved using the method of Lagrange multipliers, which leads to

with λ a Lagrange multiplier. Now we can find pn by choosing λ such that Equation 20 is satisfied. This has to be done numerically by varying λ, thus varying pn (Equation 22) and thereby S (Equation 18), N (Equation 19), and finally the summation of Equation 20.
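The numerical search for λ can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used for the article: it assumes S = pn·Srec and N = pn·Ne + Ni in each frequency bin, a constraint Σ(S + N) = K, and a rate R = Σ ln(1 + S/N). Setting the derivative of R − λΣ(S + N) with respect to pn to zero under these assumptions gives a quadratic in pn whose non-negative root is used below; it may differ in form from the published Equation 22. All names and numerical values are hypothetical.

```python
import numpy as np

def neural_power_transfer(lam, s_rec, n_e, n_i):
    """Non-negative p_n from the stationarity condition (assumed model).

    With a = Srec, b = Ne, c = Ni the condition reads
    a*c = lam*(a+b)*(p*(a+b)+c)*(p*b+c); bins where the root would be
    negative are switched off (p_n = 0).
    """
    a, b, c = s_rec, n_e, n_i
    root = np.sqrt(a**2 * c**2 + 4.0 * a * b * c / lam)
    p = (root - c * (a + 2.0 * b)) / (2.0 * b * (a + b))
    return np.maximum(p, 0.0)

def channel_total(lam, s_rec, n_e, n_i):
    """Sum of signal plus noise power in the channel for a given lambda."""
    p_n = neural_power_transfer(lam, s_rec, n_e, n_i)
    s = p_n * s_rec            # signal power density in the channel
    n = p_n * n_e + n_i        # noise power density in the channel
    return np.sum(s + n)

def solve_lambda(s_rec, n_e, n_i, k, lo=1e-12, hi=1e12, iters=200):
    """Bisection on log(lambda); the total decreases as lambda grows."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if channel_total(mid, s_rec, n_e, n_i) > k:
            lo = mid           # too much power: a larger lambda is needed
        else:
            hi = mid
    return np.sqrt(lo * hi)

# Hypothetical one-dimensional example (arbitrary units).
f = np.linspace(0.0, 5.0, 50)
s_rec = 1.0 / (1.0 + f**2)     # placeholder signal power density
n_e, n_i, k = 0.01, 0.01, 10.0

lam = solve_lambda(s_rec, n_e, n_i, k)
p_n = neural_power_transfer(lam, s_rec, n_e, n_i)
```

A solution exists only if K exceeds the summed internal noise, because the total cannot drop below ΣNi once pn has been clipped to zero in every bin.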

The power transfer function |mstn|^2 of the total spatiotemporal filter F transforming the original image is now given by

Through Equation 23 we know the power transfer function of F, and thus the amplitude of its transfer function, |mstn|. In this article we are dealing with filters that are spatially symmetrical, thus the phase due to the spatial part of mstn is zero for all spatial frequencies (fx,fy). The phase due to the temporal part of mstn can be obtained by assuming a minimum phase filter (e.g. Humpherys 1970). It then follows from the Hilbert transform of the logarithm of the amplitude spectrum. Applying this to the transfer function at each spatial frequency leads to a temporal impulse response at each spatial frequency. These impulse responses are subsequently shifted by a pure time delay in such a way that the first peak of each response is at 20 ms from the time origin. Finally, spatial frequencies are transformed by a 2-D Fourier transform to space coordinates, assuming zero phase (circularly symmetrical filters).
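The minimum-phase step can be illustrated with a short sketch. It uses the discrete cepstral (folding) method, which is a standard discrete equivalent of taking the Hilbert transform of the log amplitude. Whether the article's computation was implemented this way is an assumption, and the function name and test signal are hypothetical. The amplitude must be strictly positive and sampled on a full FFT grid of even length.

```python
import numpy as np

def minimum_phase_response(amplitude):
    """Impulse response of the minimum-phase filter with the given |H|.

    amplitude: strictly positive |H| on a full FFT grid (k = 0..N-1, N even),
    symmetric as for a real-valued impulse response.
    """
    n = amplitude.size
    # Real cepstrum: inverse FFT of the log amplitude.
    cepstrum = np.fft.ifft(np.log(amplitude)).real
    # Fold the cepstrum onto positive times; this is equivalent to
    # deriving the phase from the Hilbert transform of log|H|.
    fold = np.zeros(n)
    fold[0] = 1.0
    fold[1:n // 2] = 2.0
    fold[n // 2] = 1.0
    spectrum = np.exp(np.fft.fft(fold * cepstrum))
    return np.fft.ifft(spectrum).real

# Check on a filter known to be minimum phase: h[t] = 0.5**t (one-pole low-pass).
n = 256
h_true = 0.5 ** np.arange(n)
amplitude = np.abs(np.fft.fft(h_true))
h_rec = minimum_phase_response(amplitude)
```

Applied to the temporal amplitude spectrum at each spatial frequency, a routine of this kind would yield one causal impulse response per spatial frequency; the subsequent time shift and the 2-D spatial transform are not shown here.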

References

Barlow HB (1961) Possible principles underlying the transformations of sensory messages. In: Sensory communication (WA Rosenblith, ed). MIT Press, Cambridge Mass., pp 217-234

Barlow HB (1981) The Ferrier Lecture, 1980. Critical limiting factors in the design of the eye and visual cortex. Proc R Soc Lond B 212:1-34

Barlow HB (1983) Understanding natural vision. In: Physical and biological processing of images (OJ Braddick and AC Sleigh, eds). Springer, Berlin Heidelberg New York, pp 2-14

Bracewell RN (1978) The Fourier transform and its applications. McGraw-Hill

Burton GJ, Moorehead IR (1987) Color and spatial structure in natural scenes. Appl Opt 26:157-170

Dubs A (1982) The spatial integration of signals in the retina and lamina of the fly compound eye under different conditions of luminance. J Comp Physiol 146:321-343

Dubs A, Laughlin SB, Srinivasan MV (1981) Single photon signals in fly photoreceptors and first order interneurons at behavioural threshold. J Physiol 317:317-334

Field DJ (1987) Relations between the statistics of natural images and the response properties of cortical cells. J Opt Soc Am A 4:2379-2394

Glantz RM (1991) Motion detection and adaptation in crayfish photoreceptors. J Gen Physiol 97:777-797

Goldman S (1953) Information theory. Dover Publications, New York

Götz KG (1965) Die optischen Übertragungseigenschaften der Komplexaugen von Drosophila. Kybernetik 2:215-221

Hardie RC, Weckström M (1990) Three classes of potassium channels in large monopolar cells of the blowfly Calliphora vicina. J Comp Physiol A 167:723-736

Harris FJ (1978) On the use of windows for harmonic analysis with the discrete Fourier transform. Proc IEEE 66:51-83

Hateren JH van (1984) Waveguide theory applied to optically measured angular sensitivities of fly photoreceptors. J Comp Physiol A 154:761-771

Hateren JH van (1986) Electrical coupling of neuro-ommatidial photoreceptor cells in the blowfly. J Comp Physiol A 158:795-811

Hateren JH van (1987) Neural superposition and oscillations in the eye of the blowfly. J Comp Physiol A 161:849-855

Hateren JH van (1989) Photoreceptor optics, theory and practice. In: Facets of vision (DG Stavenga and RC Hardie, eds). Springer, Berlin Heidelberg New York, pp 74-89

Hateren JH van, Laughlin SB (1990) Membrane parameters, signal transmission, and the design of a graded potential neuron. J Comp Physiol A 166:437-448

Hengstenberg R, Sandeman DC, Hengstenberg B (1986) Compensatory head roll in the blowfly Calliphora during flight. Proc R Soc Lond B 227:455-482

Howard J, Dubs A, Payne R (1984) The dynamics of phototransduction in insects. J Comp Physiol A 154:707-718

Howard J, Snyder AW (1983) Transduction as a limitation on compound eye function and design. Proc R Soc Lond B 217:287-307

Huang J, Turcotte DL (1990) Fractal image analysis: application to the topography of Oregon and synthetic images. J Opt Soc Am A 7:1124-1130

Humpherys DS (1970) The analysis, design, and synthesis of electrical filters. Prentice-Hall, Englewood Cliffs, N.J.

Jansonius NM, Hateren JH van (1991) Fast temporal adaptation of on-off units in the first optic chiasm of the blowfly. J Comp Physiol A 168:631-637

Kelly DH (1979) Motion and vision. II. Stabilized spatio-temporal threshold surface. J Opt Soc Am 69:1340-1349

Kirschfeld K (1967) Die Projektion der optischen Umwelt auf das Raster der Rhabdomere im Komplexauge von Musca. Exp Brain Res 3:248-270

Land MF, Collett TS (1974) Chasing behaviour of houseflies (Fannia canicularis). J Comp Physiol 89:331-357

Laughlin SB (1980) Neural principles in the visual system. In: Handbook of Sensory Physiology, vol VII/6B (H Autrum, ed). Springer, Berlin Heidelberg New York, pp 133-280

Laughlin SB (1981) A simple coding procedure enhances a neuron's information capacity. Z Naturforsch 36c:910-912

Laughlin SB (1983) Matching coding to scenes to enhance efficiency. In: Physical and biological processing of images (OJ Braddick and AC Sleigh, eds). Springer, Berlin Heidelberg New York, pp 42-52

Laughlin SB (1987) Form and function in retinal processing. Trends in Neurosciences 10:478-483

Laughlin SB, Hardie RC (1978) Common strategies for light adaptation in the peripheral visual systems of fly and dragonfly. J Comp Physiol 128:319-340

Laughlin SB, Howard J, Blakeslee B (1987) Synaptic limitations to contrast coding in the retina of the blowfly Calliphora. Proc R Soc Lond B 231:437-467

Laughlin SB, Osorio D (1989) Mechanisms for neural signal enhancement in the blowfly compound eye. J Exp Biol 144:113-146

Lillywhite PG, Laughlin SB (1979) Transducer noise in a photoreceptor. Nature 277:569-572

Norwich KH, Valter McConville KM (1991) An informational approach to sensory adaptation. J Comp Physiol A 168:151-157

Oppenheim AV, Lim JS (1981) The importance of phase in signals. Proc IEEE 69:529-541

Osorio D (1992) Patterns of function and evolution in the arthropod optic lobe. In: Vision and visual dysfunction, Vol II. Evolution of the eye and visual system (Cronly-Dillon JR, Gregory R, eds). Macmillan, London

Press WH, Flannery BP, Teukolsky SA, Vetterling WT (1989) Numerical recipes. Cambridge University Press, Cambridge

Ruyter van Steveninck RR de (1986) Real-time performance of a movement-sensitive neuron in the blowfly visual system. Thesis, University of Groningen

Smakman JGJ, Hateren JH van, Stavenga DG (1984) Angular sensitivity of blowfly photoreceptors: intracellular measurements and wave-optical predictions. J Comp Physiol A 155:239-247

Snyder AW, Stavenga DG, Laughlin SB (1977) Spatial information capacity of compound eyes. J Comp Physiol 116:183-207

Srinivasan MV, Laughlin SB, Dubs A (1982) Predictive coding: a fresh view of inhibition in the retina. Proc R Soc Lond B 216:427-459

Srinivasan MV, Pinter RB, Osorio D (1990) Matched filtering in the visual system of the fly: large monopolar cells of the lamina are optimized to detect moving edges and blobs. Proc R Soc Lond B 240:279-293

Straka H, Ammermüller J (1991) Temporal resolving power of blowfly visual system: effects of decamethonium and hyperpolarization on responses of laminar monopolar neurons. J Comp Physiol A 168:129-139

Strausfeld NJ, Lee J-K (1991) Neuronal basis for parallel visual processing in the fly. Visual Neuroscience 7:13-33

Wagner H (1986a) Flight performance and visual control of flight of the free-flying housefly (Musca domestica L.). I. Organization of the flight motor. Phil Trans R Soc Lond B 312:527-551

Wagner H (1986b) Flight performance and visual control of flight of the free-flying housefly (Musca domestica L.). II. Pursuit of targets. Phil Trans R Soc Lond B 312:553-579

Wagner H (1986c) Flight performance and visual control of flight of the free-flying housefly (Musca domestica L.). III. Interactions between angular movement induced by wide- and smallfield stimuli. Phil Trans R Soc Lond B 312:581-595

Ziel A van der (1970) Noise. Sources, characterization, measurement. Prentice-Hall, Englewood Cliffs, N.J.