

Vision Research 60 (2012) 101–113

Contents lists available at SciVerse ScienceDirect

Vision Research

journal homepage: www.elsevier.com/locate/visres

On the effectiveness of noise masks: Naturalistic vs. un-naturalistic image statistics

Bruce C. Hansen a,⇑, Robert F. Hess b

a Department of Psychology & Neuroscience Program, Colgate University, Hamilton, NY 13346, USA
b McGill Vision Research Unit, McGill University, Montreal, QC H3A 1A1, Canada

Article info

Article history: Received 22 September 2011; received in revised form 10 January 2012; available online 31 March 2012

Keywords: Natural scenes; Spatial masking; Power spectrum; Fractal noise; Contrast gain control

0042-6989/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.visres.2012.03.017

⇑ Corresponding author. E-mail address: [email protected] (B.C. Hansen).

Abstract

It has been argued that the human visual system is optimized for identification of broadband objects embedded in stimuli possessing orientation-averaged power spectra fall-offs that obey the 1/f^β relationship typically observed in natural scene imagery (i.e., β = 2.0 on logarithmic axes). Here, we were interested in whether individual spatial channels leading to recognition are functionally optimized for narrowband targets when masked by noise possessing naturalistic image statistics (β = 2.0). The current study therefore explores the impact of variable β noise masks on the identification of narrowband target stimuli ranging in spatial complexity, while simultaneously controlling for physical or perceived differences between the masks. The results show that β = 2.0 noise masks produce the largest identification thresholds regardless of target complexity, and thus do not seem to yield functionally optimized channel processing. The differential masking effects are discussed in the context of contrast gain control.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Decades of research have led to a popular view of the initial processes in human striate cortex that, in part, involves multiple sub-populations of striate neurons acting like non-linear "filters" (or "channels" in terms of psychophysical terminology). These filter-channels are argued to each extract a specific narrow band of spatial frequency and orientation content from our visual environment (e.g., Campbell & Robson, 1968; Carandini et al., 2005; De Valois, Albrecht, & Thorell, 1982; De Valois, Yund, & Hepler, 1982; Field & Tolhurst, 1986; Graham & Nachmias, 1971; Maffei & Fiorentini, 1973; Merigan & Maunsell, 1993; Pantle & Sekuler, 1968; Phillips & Wilson, 1984; Ringach, 2002; Shapley & Lennie, 1985; Wilson & Bergen, 1979). Further, numerous studies have reported large-scale interactions between channels tuned to different spatial frequencies and orientations (e.g., Bauman & Bonds, 1991; Bonds, 1989; Bosking et al., 1997; DeAngelis et al., 1992; Fitzpatrick, 2000; Kersten, 1984; Legge & Foley, 1980; Meese & Holmes, 2010; Meier & Carandini, 2002; Morrone, Burr, & Maffei, 1982; Nelson et al., 1994; Olzak, 1985; Olzak & Thomas, 1991; Petrov, Carandini, & McKee, 2005; Ross & Speed, 1991). Based on such studies, we now know a great deal regarding how any given spatial channel interacts with others under conditions utilizing various spatial configurations of narrowband stimuli. However, we still know very little in terms of how such channels operate when interacting with a very broad range of spatial channels (broad in both spatial frequency and orientation). That is, previous simultaneous masking experiments (using narrowband overlay, lateral, or surround masking configurations) possess limited predictive power regarding how specific channels operate when processing the real-world environment (Olshausen & Field, 2005). Specifically, the natural environment is known to be broadband in both spatial frequency and orientation (reviewed in Hansen, Haun, and Essock (2008)), which means that at any given location within a scene, many visual channels are likely to be simultaneously active. Thus, the functional operation of a given channel will be weighted by the inter-dependent responses from a broad array of differently tuned channels (and not just a small sub-set of channels).

Given the above, if one wishes to understand how spatial channels may operate on a day-to-day basis, it is necessary to utilize simultaneous masking paradigms that employ masks that are broad in terms of both spatial frequency and orientation. Granted, numerous studies have utilized white noise masks in simultaneous masking paradigms designed to elucidate the response characteristics of spatial channels underlying the detection, discrimination, or identification of stimuli ranging from sinusoidal gratings to broadband stimuli such as letters and human faces (e.g., Alexander, Xie, & Derlacki, 1994; Burgess et al., 1981; Gold, Bennett, & Sekuler, 1999; Henning, Hertz, & Hinton, 1981; Legge et al., 1985; Majaj et al., 2002; Oruç & Barton, 2010; Oruç & Landy, 2009; Parish & Sperling, 1991; Pelli et al., 2006; Solomon & Pelli, 1994; Tjan et al., 1995). However, those studies typically employed white noise masks with the express aim of parsing an observer's performance from their 'intrinsic noise' as a 'pure' measure of observer ability (Pelli & Farell, 1999). Further, white noise masks possess constant contrast energy across all spatial frequencies and orientations, a property that is far from the typical distribution of contrast across spatial frequency in the natural environment. Particularly, the 2nd-order luminance statistics of natural scene imagery have been extensively studied and shown to possess a property where the contrast (or power in the Fourier domain) at different spatial frequencies (averaged across orientation), f, falls with increasing f, following a 1/f^β relationship (e.g., Billock, 2000; Field, 1987; Field & Brady, 1997; Hansen & Essock, 2005; Kretzmer, 1952; Oliva & Torralba, 2001; Ruderman & Bialek, 1994; Simoncelli & Olshausen, 2001; Tolhurst, Tadmor, & Tang, 1992; Torralba & Oliva, 2003; van der Schaaf & van Hateren, 1996), with β typically observed to be near 2.0 on logarithmic axes, or equivalently, an α exponent of 1.0 if assessing the amplitude spectrum, i.e., the square root of the power spectrum (Billock, 2000; Burton & Moorhead, 1987; Field, 1987, 1993; Field & Brady, 1997; Hansen & Essock, 2005; Ruderman & Bialek, 1994; Thomson & Foster, 1997; Tolhurst, Tadmor, & Tang, 1992; van der Schaaf & van Hateren, 1996). What this means, relative to white noise (i.e., β = 0.0), is that stimuli with β exponents near 2.0 possess more contrast at lower spatial frequencies and less at higher spatial frequencies. To better understand how spatial channels operate when processing our broadband environments, it therefore seems logical to not only incorporate broadband masks into simultaneous masking paradigms, but also to use broadband masks with power spectrum βs near 2.0.
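The correspondence between the power-spectrum exponent and the amplitude-spectrum exponent can be written out in one line. The following is only a worked restatement of the relationship described above; the symbols P, A, and the constant c are introduced here for illustration and do not appear in the original text.

```latex
P(f) = \frac{c}{f^{\beta}}
\quad\Longrightarrow\quad
A(f) = \sqrt{P(f)} = \frac{\sqrt{c}}{f^{\beta/2}} = \frac{\sqrt{c}}{f^{\alpha}},
\qquad \alpha = \frac{\beta}{2}

\log P(f) = \log c - \beta \log f
```

On logarithmic axes the power spectrum is therefore a straight line with slope −β, so β = 2.0 corresponds to α = 1.0, and white noise (β = 0.0) corresponds to a flat line.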

Interestingly, it has been suggested that there exists a correspondence between the prevalence of content at particular spatial scales in natural scenes (i.e., the 1/f^β relationship) and the shape and scale of human spatial filters, with those filters being well matched to optimally code the natural world (Brady & Field, 1995; Field, 1987; Simoncelli & Olshausen, 2001). Several lines of psychophysical research have explored the extent to which spatial channels are optimized to process natural and naturalistic stimuli across a broad array of tasks. Relevant to the current study are the tasks that involved the identification of image content within scenes where the β exponents were varied. Such studies have consistently shown that humans are best at identification when the images possess β exponents near 2.0, and worst at smaller or larger βs (Párraga, Troscianko, & Tolhurst, 2000, 2005; Tolhurst & Tadmor, 2000). That is, our visual systems seem to best process structural changes between objects when the luminance statistics of the scenes within which the objects are embedded closely match those typically observed on a day-to-day basis. However, since the target content in those studies was broadband, it is difficult to identify how any one spatial channel was influenced by the simultaneous activation of other differently tuned channels under naturalistic (β = 2.0) or non-naturalistic (βs much smaller or larger than 2.0) stimulation. Additionally, since the target content was supra-threshold, it remains unclear whether such β = 2.0 tuning for identification would be present when identification is limited by detection.

Motivated by the above, the current study sought to explore the effectiveness of different β noise masks to interfere with the identification of variable contrast narrowband targets in a simultaneous noise masking paradigm. We chose noise masks for two primary reasons. First, the form of the power spectrum can be precisely controlled. Second, the detection thresholds for bandpass targets embedded in natural scenes have been shown (Bex, Solomon, & Dakin, 2009; Hansen & Essock, 2005) to largely depend on edge density (noise imagery lacks the presence of broadband edges). Given the vast array of possible targets and tasks, we chose those that have been typically employed in simultaneous noise masking paradigms. Specifically, we measured the identification of narrowband targets that varied in terms of spatial complexity, ranging from simple (i.e., Gabor patterns) to complex (i.e., bandpass filtered letters), as a function of target contrast embedded in fixed high contrast noise set to one of three different βs (namely, 0.0, 2.0, or 3.0). The current study therefore asks: are human spatial channels optimized to process various target stimuli to identification when presented against naturalistic backgrounds (i.e., when noise β = 2.0)?

It is important to note that in order to effectively test whether spatial channels are optimized in the presence of one set of 2nd-order luminance statistics compared to others, it is essential to control for any physical or perceived differences between the different noise masks. Accordingly, each experiment in the current study was designed to systematically control for possible low-level accounts related to differences in noise spectral density, limitations due to available stimulus information, and, though unlikely, differences in perceived contrast. Given the need to eliminate multiple confounds, the current study employed narrowband stimuli fixed to one central spatial frequency. It is also important to note that none of the noise patterns employed in the current study possess the typical orientation biases known to occur in natural scenes, nor do they possess any of the higher-order statistical relationships carried by the phase spectra of natural scenes. Therefore, all implications for real-world perception drawn from the current study should be considered with those caveats in mind.

The design of the current study is as follows: Experiment 1 used Gabor targets and was designed to test whether the corresponding spatial channel showed evidence of optimization for those targets when embedded in variable β noise masks. Experiment 1 also utilized notch filtering in order to control for physical differences in contrast at and near the central spatial frequency of the target stimuli. Additionally, we employed an ideal observer analysis to factor out task constraints due to the different noise masks. Experiment 2 was designed to demonstrate whether perceived contrast varies with the type of noise, and then to control for any differences in perceived contrast to factor it out as a possible explanation for the threshold elevation differences observed in Experiment 1. Experiment 3 sought to repeat Experiments 1 and 2, but with filtered letter stimuli, to extend the findings of Experiments 1 and 2 to more spatially complex targets (i.e., letters).

The results from Experiments 1–3 do not support the notion of optimized channel processing for masks possessing βs set at 2.0. In fact, they show the complete opposite: noise masks with a β value of 2.0 interfere with target identification (Gabors and letters) much more than β = 0.0 and β = 3.0 noise. Lastly, the higher thresholds for β = 2.0 masks could not be explained by physical or perceived differences between the noise masks.

2. General method

2.1. Apparatus

All stimuli were presented on a 21″ ViewSonic (G225fB) monitor driven by a dual-core Intel® Xeon® processor (1.60 GHz × 2) equipped with 4 GB RAM and a 256 MB PCIe ×16 ATI FireGL V7200 dual DVI/VGA graphics card with 8-bit grayscale resolution. The color management settings for the graphics card (i.e., 3D display settings) were adjusted such that the luminance "gain" of the green gun was twice that of the red gun, which was set to twice that of the blue gun. A bit-stealing algorithm (Bex, Mareschal, & Dakin, 2007; Tyler, 1997) was employed to yield 10.8 bits of luminance (i.e., grayscale) resolution (i.e., 1785 unique levels) distributed evenly across a 0–255 scale. Stimuli were displayed using a linearized look-up table, generated by calibrating with a ColorVision Spyder3 Pro sensor. Maximum luminance output of the display monitor was 100 cd/m², the frame rate was set to 85 Hz, and the resolution was set to 1600 × 1200 pixels. Single pixels subtended 0.0134° of visual angle (i.e., 0.80 arc min) as viewed from 1.0 m. Head position was maintained with an Applied Science Laboratories (ASL) chin and forehead rest.
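The display figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check; the function names are hypothetical, and the 512-pixel stimulus width is taken from the stimulus construction described later in the paper rather than from this paragraph.

```python
import math

PIXEL_DEG = 0.0134    # degrees of visual angle per pixel at 1.0 m (from the text)
GRAY_LEVELS = 1785    # unique luminance levels after bit-stealing (from the text)
STIMULUS_PX = 512     # stimulus width in pixels (from the stimulus construction section)


def pixel_arcmin(pixel_deg: float) -> float:
    """Convert the per-pixel visual angle from degrees to arc minutes."""
    return pixel_deg * 60.0


def luminance_bits(levels: int) -> float:
    """Effective grayscale resolution in bits for a given number of levels."""
    return math.log2(levels)


def image_width_deg(size_px: int, pixel_deg: float) -> float:
    """Visual angle subtended by an image that is size_px pixels wide."""
    return size_px * pixel_deg


def nyquist_cpd(pixel_deg: float) -> float:
    """Highest spatial frequency (cycles per degree) the pixel grid can carry."""
    return 1.0 / (2.0 * pixel_deg)
```

Rounded to the precision used in the text, `pixel_arcmin(PIXEL_DEG)` and `luminance_bits(GRAY_LEVELS)` agree with the stated 0.80 arc min and 10.8 bits.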


2.2. Participants

Unless stated otherwise, six human observers (five naïve to the purpose of the study) participated in all experiments reported here. All had normal (or corrected to normal) vision and were compensated for their participation. Their ages ranged between 20 and 35 years (median = 21). Institutional Review Board-approved informed consent was obtained.

2.3. Stimulus construction

2.3.1. Noise masks

All noise masks were constructed in the Fourier domain using MATLAB (version R2008a). Each mask was created by constructing a single 512 × 512 polar matrix for the amplitude spectrum (i.e., a template spectrum), ISO_AMP(f, θ), and assigning all coordinates the same arbitrary amplitude coefficient (except at the location of the DC component, which was assigned a zero); f and θ represent spatial frequency and orientation, respectively. The result is a flat isotropic broadband spectrum (i.e., β = 0.0). In this form, the exponent of the template spectrum could be adjusted by multiplying each spatial frequency's amplitude coefficient by f^(−α), with α representing the exponent for the amplitude spectrum. For the current study, α was set to either 0.0, 1.0, or 1.5. These values correspond to 0.0, 2.0, and 3.0, respectively, for the β exponent of the power spectrum. From this point forward, we will only refer to the exponent of the power spectrum. The phase spectra (i.e., a different phase spectrum was used for each mask) were constructed by assigning random values from −π to π to the different coordinates of a 512 × 512 polar matrix, U_RAND(f, θ), such that the phase spectra were odd-symmetric, i.e., for θ angles in the [π, 2π] half of polar space, U_RAND(f, θ) = U_RAND(f, θ) × (−1). The noise patterns were rendered in the spatial domain by taking the inverse Fourier transform of ISO_AMP(f, θ) and a given U_RAND(f, θ), with both shifted to Cartesian coordinates prior to the inverse DFT. The rms contrast of all noise masks was fixed at 0.15, with rms contrast defined as the standard deviation of all pixel luminance values within a given noise pattern, divided by the mean pixel luminance (the mean luminance of all noise patterns was fixed at 127). The spectral density of the noise masks was 0.15² × 0.0134² = 4.04 × 10⁻⁶ deg², i.e., squared rms contrast multiplied by the area of a single pixel (0.0134° per side; see Section 2.1). Fixed rms contrast in the spatial domain was achieved by scaling the power spectrum in the Fourier domain. Note that the entire power spectrum of the noise patterns was well within the CSF of the observers, as can be verified by examining the parameters given in Section 2.1. Example noise masks are shown in Fig. 1a.
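The construction described above can be sketched in Python with NumPy (the original work used MATLAB, and this is not the authors' code). Two simplifications are assumptions of the sketch: the spectrum is built directly on the Cartesian FFT grid rather than in a polar matrix, and the odd-symmetric random phase is borrowed from the FFT of a white-noise image, which guarantees a real-valued result. The function names `radial_frequency` and `make_noise_mask` are hypothetical.

```python
import numpy as np


def radial_frequency(size: int) -> np.ndarray:
    """Radial spatial frequency (cycles per picture) of every FFT coefficient."""
    k = np.fft.fftfreq(size) * size
    return np.hypot(k[None, :], k[:, None])


def make_noise_mask(size=512, beta=2.0, rms=0.15, mean_lum=127.0, seed=None):
    """Isotropic 1/f^beta noise with fixed rms contrast and mean luminance."""
    rng = np.random.default_rng(seed)
    f = radial_frequency(size)

    # Amplitude spectrum f^(-alpha) with alpha = beta / 2; the DC term stays zero.
    amp = np.zeros_like(f)
    amp[f > 0] = f[f > 0] ** (-beta / 2.0)

    # Assumption: take the phase of white noise's FFT. It is uniformly
    # distributed and odd-symmetric, so the inverse transform is real.
    phase = np.angle(np.fft.fft2(rng.standard_normal((size, size))))

    img = np.fft.ifft2(amp * np.exp(1j * phase)).real
    img /= img.std()                       # unit standard deviation, zero mean

    # rms contrast = std / mean, so scale the pattern around the mean luminance.
    return mean_lum * (1.0 + rms * img)
```

For example, `make_noise_mask(beta=0.0)`, `make_noise_mask(beta=2.0)`, and `make_noise_mask(beta=3.0)` would give one mask of each type used in the study, all with the same rms contrast.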

2.3.2. Gabor target stimuli

Conventional Gabor patterns were generated in the spatial domain according to the following operation:

$$G(x_i, y_j) = e^{-\frac{(x_i - x_{peak})^2}{\sigma_x^2}} \; e^{-\frac{(y_j - y_{peak})^2}{\sigma_y^2}} \times \cos\left(\frac{2\pi (x_i - x_{peak})}{T}\right) \qquad (1)$$

where x_i and y_j represent spatial coordinates in Cartesian space (an image with dimensions 512 × 512), x_peak and y_peak took on the center value of the image space (i.e., 256), with σ_x set to 512/27 and σ_y set to 512/18, giving the Gabor an elongated aspect ratio of 1:1.5 (minimum radius = 44 pixels, maximum radius = 66 pixels). T was set to 24 (period in cycles per picture). Four Gabor patterns were created (vertical, 45° oblique, horizontal, and 135° oblique), all set to have a bandwidth of 1 octave (full-width at half-height), with a peak spatial frequency of 3.05 cycles per degree (cpd). All Gabors were then scaled to have a zero mean, and then normalized to 1.0. This allowed for the use of a scalar to control the contrast of G(x, y).
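Eq. (1) translates almost directly into array code. The sketch below is an illustration in Python with NumPy, not the authors' MATLAB code. The rotation used to obtain the oblique and horizontal patterns is an assumption (the text gives Eq. (1) only for a carrier along x), as is the reading of "normalized to 1.0" as a peak absolute value of 1.0. The name `make_gabor` is hypothetical.

```python
import numpy as np


def make_gabor(size=512, period=24.0, sigma_x=512 / 27, sigma_y=512 / 18,
               theta_deg=0.0):
    """Gabor pattern following Eq. (1), optionally rotated by theta_deg."""
    peak = size / 2.0                        # x_peak = y_peak = 256 for size 512
    y, x = np.mgrid[0:size, 0:size].astype(float)
    dx, dy = x - peak, y - peak

    # Assumption: other orientations are made by rotating the coordinate frame.
    t = np.deg2rad(theta_deg)
    xr = dx * np.cos(t) + dy * np.sin(t)
    yr = -dx * np.sin(t) + dy * np.cos(t)

    envelope = np.exp(-xr ** 2 / sigma_x ** 2) * np.exp(-yr ** 2 / sigma_y ** 2)
    g = envelope * np.cos(2.0 * np.pi * xr / period)

    g -= g.mean()                            # zero mean
    return g / np.abs(g).max()               # peak absolute value of 1.0
```

With `theta_deg=0.0` the carrier varies along x, giving vertical bars; how the other three angles map onto the named orientations depends on the image coordinate convention, so `theta_deg` values of 45, 90, and 135 are only one plausible choice. A target at a given contrast is then a scalar multiple of the returned array.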

Gabor stimuli were embedded in the noise masks by adding G(x, y) to the pixel values of the masks (e.g., Majaj et al., 2002; Solomon & Pelli, 1994). The contrast of the Gabor targets was varied across 13 levels, controlled by multiplying G(x, y) by different scalars. The contrast energy of the Gabor targets was defined in the spatial domain as the Michelson contrast of scaled G(x, y) multiplied by the area (in degrees of visual angle) of the Gabor pattern itself (an ellipse with 1:1.5 aspect ratio), i.e., scaled Michelson contrast multiplied by [π(66 × 0.0134)(44 × 0.0134)].
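The contrast energy definition above amounts to a single multiplication once the elliptical area is known. A minimal sketch, with hypothetical function names, using only the radii and pixel size given in the text:

```python
import math

PIXEL_DEG = 0.0134   # degrees of visual angle per pixel (Section 2.1)


def gabor_area_deg2(r_max_px=66, r_min_px=44, pixel_deg=PIXEL_DEG):
    """Area of the elliptical Gabor footprint in squared degrees."""
    return math.pi * (r_max_px * pixel_deg) * (r_min_px * pixel_deg)


def michelson_contrast(l_max, l_min):
    """Michelson contrast from the maximum and minimum luminance."""
    return (l_max - l_min) / (l_max + l_min)


def gabor_contrast_energy(michelson, area_deg2=None):
    """Contrast energy as defined in the text: Michelson contrast times area."""
    if area_deg2 is None:
        area_deg2 = gabor_area_deg2()
    return michelson * area_deg2
```

With the stated radii the footprint works out to roughly 1.64 deg², so the contrast energy of a Gabor is about 1.64 times its Michelson contrast.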

2.3.3. Letter target stimuli

Letter stimuli consisted of all letters from the English alphabet in Sloan "font", publicly available on Denis Pelli's website (http://www.psych.nyu.edu/pelli/software.html). All letters were converted to Tagged Image Format in 72 point, and thus measured 72 × 72 pixels, and subtended 0.96° of visual angle viewed from 1 m. All letter targets were filtered in the Fourier domain with a bandpass Gaussian function centered on a fixed spatial frequency and extended across all orientations. Formally, the filter was created according to the following formulation (in polar coordinates):

$$GF(f_i, \theta) = e^{-\frac{(f_c - f_i)^2}{\sigma_f^2}} \qquad (2)$$

where f_i and θ represent polar coordinates of spatial frequency and orientation, respectively (here, θ is without a subscript because the entire operation is applied to all orientations within the spatial frequency bandpass of the filter), GF represents a given bandpass filter, and f_c is the central spatial frequency of the filter, with σ_f controlling the bandwidth of the filter (which was set to yield 1 octave, full-width at half-height). f_c was set to yield letters with a peak spatial frequency of 3.1 cpd (i.e., 3.0 cycles per letter). Each letter was embedded in a given noise mask by adding it to the pixel values of the central portion (i.e., the central 220–292 by 220–292 pixel region) of a given noise pattern. Following from Majaj et al. (2002), letter contrast was defined by the Weber contrast ratio (i.e., ΔL/L_background). Letter contrast energy was defined as the product of squared letter contrast and average ink area (Majaj et al., 2002; Talgar, Pelli, & Carrasco, 2004). Letter contrast was varied across 13 levels.
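The filter in Eq. (2) can be sketched as follows in Python with NumPy. The text does not say how σ_f was derived from the 1-octave bandwidth; the sketch assumes the half-height points sit symmetrically at f_c ± h on a linear frequency axis with (f_c + h)/(f_c − h) = 2, which gives h = f_c/3. That choice, the Cartesian FFT layout, the square-image requirement, and all function names are assumptions of this illustration.

```python
import numpy as np


def sigma_for_octave_bandwidth(fc: float, octaves: float = 1.0) -> float:
    """sigma_f so that exp(-(fc - f)^2 / sigma_f^2) has the given full width
    at half height (assumed symmetric about fc on a linear frequency axis)."""
    ratio = 2.0 ** octaves
    half_width = fc * (ratio - 1.0) / (ratio + 1.0)
    return half_width / np.sqrt(np.log(2.0))


def gaussian_bandpass(size: int, fc: float, octaves: float = 1.0) -> np.ndarray:
    """Eq. (2) evaluated on the unshifted FFT grid (frequencies in cycles
    per image), applied equally at all orientations."""
    k = np.fft.fftfreq(size) * size
    f = np.hypot(k[None, :], k[:, None])
    sigma_f = sigma_for_octave_bandwidth(fc, octaves)
    return np.exp(-((fc - f) ** 2) / sigma_f ** 2)


def bandpass_filter_image(img: np.ndarray, fc: float, octaves: float = 1.0):
    """Filter a square image in the Fourier domain and return the real result."""
    gf = gaussian_bandpass(img.shape[0], fc, octaves)
    return np.fft.ifft2(np.fft.fft2(img) * gf).real
```

For a 72 × 72 letter image, `bandpass_filter_image(letter, fc=3.0)` would centre the passband on 3.0 cycles per letter, the value given in the text.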

3. Experiment 1

The current experiment measured the ability of human participants to accurately identify the orientation of variable contrast Gabor targets (with peak spatial frequency set to 3.05 cpd) simultaneously masked by different types of broadband noise masks. While the global rms contrast of all noise masks is equal and fixed within the visible spatial frequency range, the magnitude of rms contrast at and near 3.05 cpd will not be equivalent across the three different noise masks. To measure and illustrate this difference, we generated a noise mask for each β level used here, and subjected each one to a discrete Fourier transform (DFT). We then averaged the power across orientation for each spatial frequency (see Hansen and Hess (2006) for further details) and plotted them together in Fig. 1b. For those who prefer to think in terms of spectral density, we also measured band-limited spectral density for the same noise patterns at eight different central frequencies. This was done by (1) filtering each noise pattern in the Fourier domain with Gaussian bandpass filters (using Eq. (2) and associated parameters), (2) inverse transforming back to the spatial domain, and (3) calculating spectral density for each bandpass filtered image (results plotted in Fig. 1c). Considering the power spectra, note that when compared to β = 0.0, there is more power (i.e., rms contrast) within an octave band centered on the central spatial frequency of the Gabor targets for β = 2.0 (M = 1.25 more log units), with β = 3.0 noise being intermediate (M = 0.61 more log units), with similar differences obtained for spectral density. Thus, any observed differences in identification threshold across the three mask types could be explained by differences in physical contrast of the masks at and near the peak spatial frequency of the targets. In order to bypass this potential confound, the current experiment employed a standard notch filtering procedure to systematically remove increasing amounts of noise contrast centered on the peak spatial frequency of Gabor targets.

Fig. 1. (a) Example noise masks with β increasing from left to right. (b) Orientation-averaged power spectrum plots for noise masks set to β = 0.0, 2.0, or 3.0. On the ordinate is averaged log power, and on the abscissa is log spatial frequency in cycles per degree. (c) Band-limited spectral density. On the ordinate is log spectral density of the same noise patterns analyzed in (b), plotted as a function of the peak spatial frequency of the bandpass filtered noise (see text for further details). (d) Global spectral density plots for each of the noise mask β values used in Experiment 1 as a function of notch filter bandwidth (full-width). Note that, excluding the 0 notch case, noise with β = 2.0 has the lowest spectral density. Also note that the global spectral density of all notched β noise decreases with increasing notch bandwidth; the decrease in β = 0.0 noise is harder to see here due to the figure's scale.
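The point that masks with equal global rms contrast still differ in contrast near the target frequency can be illustrated numerically. The sketch below compares idealized 1/f^β power spectra normalized to the same total power, and reports the log10 ratio of mean power inside a 1-octave band relative to β = 0.0. It is not the authors' measurement procedure: the band centre of 21 cycles per picture is an assumption (roughly 3.05 cpd across a 6.89° image), as are the geometric band edges and the function name, so the exact numbers need not match the values reported above.

```python
import numpy as np


def band_power_log_ratio(beta, fc=21.0, size=512, octaves=1.0):
    """log10 of mean in-band power for 1/f^beta noise relative to white noise,
    with both spectra normalized to the same total power."""
    k = np.fft.fftfreq(size) * size
    f = np.hypot(k[None, :], k[:, None])
    nonzero = f > 0

    def normalized_power(b):
        p = np.zeros_like(f)
        p[nonzero] = f[nonzero] ** (-b)
        return p / p.sum()              # equal total power = equal global rms

    # Assumption: the octave band is centred geometrically on fc.
    lo, hi = fc / 2 ** (octaves / 2), fc * 2 ** (octaves / 2)
    band = (f >= lo) & (f <= hi)

    in_band = normalized_power(beta)[band].mean()
    reference = normalized_power(0.0)[band].mean()
    return np.log10(in_band / reference)
```

Under these assumptions, `band_power_log_ratio(2.0)` exceeds `band_power_log_ratio(3.0)`, and both are positive, which reproduces the ordering described in the text: β = 2.0 noise carries the most power near the target frequency, β = 3.0 is intermediate, and β = 0.0 carries the least.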

Another possible confound that will be addressed in the current experiment has to do with limitations imposed by the task itself. Specifically, it may be possible that the different types of noise differ in their ability to obscure effective target information in the spatial domain. That is, when a given target is combined with a given mask in the spatial domain, some masks may sum with the target in such a way that the target is rendered less informative than when embedded in other types of noise. Such a scenario would suggest that Gabor orientation identification may be limited by the physics of the stimuli as opposed to differential responses of spatial frequency channels in the visual system. We therefore subjected the task to an ideal observer analysis and evaluated the behavioral data in that context.

3.1. Method

3.1.1. Stimulus construction

3.1.1.1. Noise masks. All mask stimuli were subjected to notch filtering prior to the experiment. The notch filter was ideal, and was applied while the power spectra were still in polar coordinates in the Fourier domain. The ideal notch filtering process was applied using the following operation:

$$F_{POW}(f_i, \theta_j) = \begin{cases} 0, & f_L \le f_i \le f_H \\ ISO_{POW}(f_i, \theta_j), & \text{elsewhere} \end{cases} \qquad (3)$$

where f_i and θ_j represent polar coordinates of spatial frequency and orientation, respectively, F_POW is the notch filtered isotropic power spectrum, and ISO_POW is the isotropic power spectrum generated according to the methodology described in Section 2, with f_L and f_H representing the cut-off frequencies of the notch filter in cycles per picture (cpp). Within each set of noise masks, nine different sub-sets of noise masks were created, with each sub-set possessing notches of a different bandwidth. Note that all notches were centered on the peak spatial frequency of the Gabor targets, and that global rms was allowed to fall with increasing notch bandwidth. The nine different full-width notch bandwidths were as follows (expressed in octaves): 0.0 (i.e., no notch), 0.47, 0.70, 1.0, 1.32, 1.68, 2.08, 2.32, and 2.88.
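Eq. (3) is an ideal (brick-wall) notch, which can be sketched in Python with NumPy as below. The text states that the notches were centered on the target's peak frequency but not how the cut-offs were placed; the sketch assumes geometric centering, so that f_L = f_c / 2^(bw/2) and f_H = f_c × 2^(bw/2). It also works on the unshifted Cartesian FFT grid rather than a polar matrix. Function names are hypothetical.

```python
import numpy as np


def notch_cutoffs(fc: float, bandwidth_octaves: float):
    """Lower and upper cut-offs (f_L, f_H) for a notch centred on fc.
    Assumption: the notch is centred geometrically (in octaves) on fc."""
    half = bandwidth_octaves / 2.0
    return fc / 2.0 ** half, fc * 2.0 ** half


def apply_ideal_notch(power: np.ndarray, fc: float, bandwidth_octaves: float):
    """Eq. (3): zero the power spectrum for f_L <= f <= f_H at all orientations.
    `power` is a square spectrum in unshifted FFT layout, frequencies in cpp."""
    out = power.copy()
    if bandwidth_octaves <= 0.0:
        return out                       # the 0.0-octave case means no notch
    size = power.shape[0]
    k = np.fft.fftfreq(size) * size
    f = np.hypot(k[None, :], k[:, None])
    f_lo, f_hi = notch_cutoffs(fc, bandwidth_octaves)
    out[(f >= f_lo) & (f <= f_hi)] = 0.0
    return out
```

Looping over the nine bandwidths listed above (0.0, 0.47, 0.70, 1.0, 1.32, 1.68, 2.08, 2.32, 2.88) would then give one notched spectrum per condition; because no renormalization is applied, total power falls as the notch widens, consistent with the statement that global rms was allowed to fall.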

3.1.2. Psychophysical procedure

The design of the current experiment utilized simultaneous masking and consisted of a single-interval, four-alternative forced-choice, method of constant stimuli paradigm. The task was Gabor orientation identification. There were nine different notch conditions for each of the three β noise masks, resulting in a total of 27 conditions, plus one baseline condition that consisted of presenting Gabor stimuli of varying contrast energy against a blank (i.e., mean luminance) background. For any given trial, participants were first presented with a fixation cross, "+", at the center of the display screen for 500 ms. Participants were told that the cross served to (1) signal that a Gabor would appear shortly, and (2) that the location of the cross would correspond to the location of the Gabor (center of the display screen) they would be asked to identify in terms of its orientation (vertical, 45° oblique, horizontal, or 135° oblique). Following the fixation cross, a Gabor (either alone, or embedded in noise) appeared for 250 ms. In order to further reduce uncertainty, all stimulus intervals were signaled to the participant by a short sinusoidal tone. The stimulus interval was then followed by an empty display screen (set to mean luminance) for 500 ms, followed by a response screen that contained four separate, laterally displaced lines (one at each of the possible target orientations). Using the mouse, observers were asked to click on the line with the same orientation as the previously observed target. The duration of the response interval was unlimited. The orientation of a given Gabor target (1 of 4 possible) was determined randomly (from a uniform distribution), and all participants were informed that each target orientation would be equally likely. The noise masks subtended 6.89° of visual angle, with the embedded Gabor targets subtending 1.18° × 1.77° of visual angle. All viewing was binocular. The experiment was run using MATLAB (version R2008a) together with the Psychophysics Toolbox (Brainard, 1997). Within each one of the conditions, each contrast level of the Gabor target was repeated 12 times, giving 156 trials per condition (i.e., trials were blocked by notch bandwidth and noise β value), with 4368 trials total for each participant (including baseline). All conditions were randomly interleaved. Participants tended to complete ~4 conditions (~1 h) per day, and completed the entire experiment in between 5 and 6 h. All participants were allowed practice trials for each condition until they felt comfortable with the task.
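The trial counts in this procedure follow from the design, and a block of trials for one condition can be laid out in a few lines. The sketch below is a hypothetical illustration of the method of constant stimuli as described (13 contrast levels × 12 repeats, orientation drawn uniformly on every trial); it is not the authors' Psychophysics Toolbox code, and the names and data layout are assumptions.

```python
import random

ORIENTATIONS = ("vertical", "45 oblique", "horizontal", "135 oblique")
N_CONTRAST_LEVELS = 13
REPEATS_PER_LEVEL = 12
N_BETAS = 3
N_NOTCH_BANDWIDTHS = 9


def trials_per_condition() -> int:
    """Trials in one blocked condition (one beta, one notch bandwidth)."""
    return N_CONTRAST_LEVELS * REPEATS_PER_LEVEL


def total_trials() -> int:
    """All noise conditions plus the single blank-background baseline."""
    n_conditions = N_BETAS * N_NOTCH_BANDWIDTHS + 1
    return n_conditions * trials_per_condition()


def make_block(seed=None):
    """Shuffled (contrast_level_index, orientation) pairs for one condition."""
    rng = random.Random(seed)
    levels = [lvl for lvl in range(N_CONTRAST_LEVELS)
              for _ in range(REPEATS_PER_LEVEL)]
    rng.shuffle(levels)                  # constant stimuli in random order
    return [(lvl, rng.choice(ORIENTATIONS)) for lvl in levels]
```

`trials_per_condition()` gives 156 and `total_trials()` gives 4368, matching the numbers reported above.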

3.1.3. Results and discussion

Threshold estimates were derived from all psychometric functions with "psignifit" (Wichmann & Hill, 2001a, 2001b), using the Weibull fit option. Since performance did not differ significantly across the six participants, only the averaged threshold data are shown (see Fig. 2). The averaged threshold for the blank background condition is shown to the far left of Fig. 2 for reference. The threshold data were analyzed with a 3 (noise mask b) × 9 (notch bandwidth) two-way repeated measures analysis of variance (ANOVA) using a reasonably conservative correction (i.e., Huynh–Feldt epsilon) to adjust the degrees of freedom. The main effect of noise mask b was significant (F(2,10) = 179.21, p < .001), as was the main effect of octave bandwidth (F(8,40) = 63.1, p < .001). The interaction was also found to be statistically significant (F(12,62) = 13.36, p < .001). There was also a significant linear trend for the main effect of octave bandwidth (F(1,5) = 280.26, p < .001). Post hoc two-way ANOVAs revealed that each b condition (collapsed across notch bandwidth) differs significantly from every other (p < .01). While all threshold curves in Fig. 2 show a statistically significant linear decline as notch bandwidth increases, post hoc paired t-tests show that only the b = 2.0 noise mask thresholds are all significantly higher (p < .01) than the blank background condition. For the b = 0.0 noise mask notch conditions, the notch thresholds cease to differ significantly from the blank background condition at notch bandwidths larger than 1.68 octaves (p > 0.5), while the b = 3.0 notch thresholds stop differing significantly from the blank background condition at notch bandwidths larger than 2.08 octaves (p > 0.5).

Fig. 2. Data from Experiment 1. On the ordinate is log averaged threshold contrast energy (error bars are ±1 SE, between participants), and on the abscissa is notch filter bandwidth (full-width at half-height). The light gray empty square shows averaged threshold contrast energy for Gabor orientation identification on a blank background (error bars are ±1 SE, between participants).
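As a rough illustration of how a threshold is read off a fitted psychometric function in a four-alternative task, the sketch below evaluates and inverts a Weibull function. It is not the psignifit API; the parameterization, the parameter names (`alpha`, `slope`, `lapse`), and the example criterion level are assumptions, since the text does not state which performance level defined threshold. Only the guess rate of 0.25 follows directly from the four response alternatives.

```python
import math


def weibull(x, alpha, slope, guess=0.25, lapse=0.0):
    """Proportion correct at stimulus level x (x >= 0).

    alpha scales the function along the stimulus axis and slope sets its
    steepness. guess is chance performance (1/4 for four alternatives).
    """
    core = 1.0 - math.exp(-((x / alpha) ** slope))
    return guess + (1.0 - guess - lapse) * core


def threshold(p_target, alpha, slope, guess=0.25, lapse=0.0):
    """Stimulus level at which the Weibull reaches p_target."""
    frac = (p_target - guess) / (1.0 - guess - lapse)
    if not 0.0 < frac < 1.0:
        raise ValueError("p_target must lie strictly between chance and ceiling")
    return alpha * (-math.log(1.0 - frac)) ** (1.0 / slope)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise the functions.
    t = threshold(0.625, alpha=0.05, slope=3.0)
    print(t, weibull(t, alpha=0.05, slope=3.0))
```

For the letter task in Experiment 3 the same form would apply with `guess=1/26`.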

Fig. 2 shows that while all noise masks interfered with target identification for the majority of notch bandwidths, b = 2.0 noise masks interfered the most with Gabor orientation identification at all notch bandwidths examined here. On average, b = 2.0 noise masks required 2.47 times more contrast energy at threshold than b = 0.0 noise masks (ranging from 3.11 with no notch to 1.73 at the 2.88 octave notch), and 1.93 times more contrast energy than b = 3.0 noise masks (ranging from 1.95 with no notch to 1.43 at the 2.88 octave notch). It therefore appears as though human spatial channels near the peak of the human CSF are not optimized to process simple narrowband stimuli to identification when embedded in naturalistic backgrounds. Interestingly, the threshold trends reported here are most consistent with the perceived contrast work of McDonald and Tadmor (2006), which shows more suppression of perceived contrast of broadband targets (natural scenes or noise) surrounded by broadband images made to possess b exponents near 2.0. We will return to the issue of perceived contrast in Experiment 2.
Further, whilst the aim of the current experiment was not to estimate channel bandwidth as a function of noise b, the results of the current analysis also seem to suggest that different noise masks result in different estimates of channel bandwidth (for a channel centered on 3.05 cpd), with b = 2.0 noise yielding a bandwidth >2.88 octaves. Taken to the extreme, this would suggest that the typical 1–1.5 octave estimate of channel bandwidth (e.g., Campbell & Robson, 1968; De Valois et al., 1982; Legge, 1978; Mostafavi & Sakrison, 1976; Sachs, Nachmias, & Robson, 1971; Stromeyer & Julesz, 1972; Wilson, McFarlane, & Phillips, 1983) obtained using isolated Gabor/grating stimuli (or estimates derived from white noise masking) is largely dependent on the choice of masking stimulus. That is, since b = 2.0 is the typical power spectrum slope encountered in the real world, one might expect to find much broader estimates of channel bandwidth when examined in that context. We return to this issue in Section 6.

We now address the issue of the disproportionate amount of physical contrast in the masks at and near the target central spatial frequency. Introducing notches centered on the peak spatial frequency of the Gabor targets of course resulted in significant reductions in contrast energy threshold compared to the no-notch condition. And, while the significant interaction reported above suggests that the contrast energy thresholds did not decrease proportionally, the general trend of b = 2.0 noise masking most, followed by b = 3.0 noise, was verified statistically via the main effect of noise mask b in the post hoc ANOVAs reported above. However, additional post hoc paired t-tests show that notch bandwidth did eliminate some of the differential masking effects, but only for thresholds between the b = 0.0 and b = 3.0 noise mask conditions at notch bandwidths larger than 1.68 octaves (p > .05). That is, b = 2.0 noise mask thresholds were always significantly higher than those of the other two b noise masks, regardless of notch bandwidth. Since the largest notch bandwidth was almost three octaves, this rules out the possibility that the highest thresholds observed with b = 2.0 noise masks here can be explained by differences in rms contrast at and near the central spatial frequency of the Gabor targets. Further, when the global spectral density of the differently notched noise masks is measured, the b = 2.0 masks actually decrease the most (with the same being true for the power spectrum, of course) as a function of notch bandwidth (refer to Fig. 1d), yet those are the masks which produce the largest thresholds. Additionally, a more conservative approach would be to consider only the noise contrast in the 1-octave bands flanking the notch in the broadest notch condition, as those still contain slightly more contrast for b = 2.0 masks. Specifically, when the power in those bands is summed, b = 2.0 masks yield 12.82 log units of power, b = 3.0 masks yield 12.62 log units of power, and b = 0.0 masks yield 12.05 log units of power. However, when considering that the difference in threshold elevation in the broadest notch condition between b = 2.0 and b = 0.0 is still quite large (almost a factor of two difference in contrast energy needed to identify Gabor orientation), it seems very unlikely that such a difference was produced by a 0.77 log unit difference in contrast. Furthermore, and crucially, we did not observe a significant difference in threshold contrast energy between the b = 3.0 and b = 0.0 masks in the broadest notch conditions, yet the difference in mask contrast (i.e., power) in the 1-octave bands flanking the broadest notch was almost as large as the difference between b = 2.0 and b = 0.0 masks (i.e., 0.581 log units). Taken together, it seems very unlikely that differences in either global power as a function of notch bandwidth, or differences in local power flanking the broadest notch, can explain the larger masking effects produced by b = 2.0 noise masks.

Fig. 3. (a) Ideal observer analysis results reported in Experiment 1. On the ordinate is log averaged threshold contrast energy for the ideal observer, and on the abscissa is notch filter bandwidth (full-width at half-height). (b) Log high noise efficiency calculated from the data shown in Fig. 2. See text for further details.

Lastly, we turn to the possibility that the results reported in Fig. 2 may be explained by noise masks that differ in their ability to obscure effective information of the targets in the spatial domain. In order to explore this possibility, we subjected the current experiment to an ideal observer analysis. The design of the ideal observer and subsequent analysis are described in the following section.

3.1.4. Ideal observer analysis

We constructed a Bayesian ideal observer that made task decisions according to the standard formulation [e.g., p(s|I) = p(I|s)p(s)/p(I)]. Here, the ‘evidence’, p(I), is assumed constant (which is standard practice). Since the orientation of the Gabor elements was presented randomly according to a uniform distribution (participants were always made aware that each orientation was equally likely), the ‘prior’, p(s), within the confines of the current experiment, is also assumed to be constant. This leaves the posterior probability being determined by the ‘likelihood’, p(I|s). Our ideal observer’s decision rule is therefore based on Maximum Likelihood Estimates (MLEs). The MLEs were based on the outputs of a multi-channel filter model, with its responses analyzed in a manner similar to that described in Solomon and Pelli (1994). Each filter was created using the operation defined in Eq. (2) and parameters listed in Section 2. Here, fc was varied to create six filters, with a peak-to-peak separation of ~0.5 octave. The peak spatial frequency of each of the filters (in cpd) was 0.96, 1.41, 2.11, 2.95, 4.36, 5.91, and 8.58; the entire range of filters therefore covers over 3 octaves' worth of spatial content centered on the peak spatial frequency of the Gabor targets (i.e., covers the vast majority of the Gaussian profile of the targets in the Fourier domain). The reason for using a multi-channel system to filter the stimuli (as opposed to a single filter centered on the peak frequency of the Gabor targets) is that the spectral density of the different noise masks varies from lower to higher spatial frequencies on either side of the central spatial frequency of the targets; thus, the most informative channel may not always be the one centered on the target’s peak spatial frequency (cf. Solomon, 2000). MLEs were determined by the following process.
A Gabor target (at one of the four orientations and one of the 13 contrast levels) was embedded in a given noise image. That image was then Fourier transformed and shifted into polar coordinates, and the power spectrum was filtered with one of the 6 GF filters (this was repeated for each fc and an original copy of the power spectrum). Each differently filtered power spectrum was then shifted back to Cartesian coordinates and inverse transformed (along with the original phase spectrum) into the spatial domain, and represented the image processed by one of 6 spatial frequency channels. Next, the least squared difference (LSD) between each of the four Gabor targets (alone, i.e., not embedded in noise) and each different channel output was taken (e.g., Solomon & Pelli, 1994). The ideal observer first selected the channel(s) with the smallest LSD, and then selected the smallest LSD across the four possible targets as the target with the highest posterior probability (again, the prior was constant and therefore contributed nothing to the decision). The ideal observer was then run through all notch filter conditions. The results of the ideal observer analysis are shown in Fig. 3a.
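The final decision step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: images are represented as flat lists of pixel values, the channel filtering is assumed to have been done already, and the two-stage selection (best channel, then best target) is collapsed into a single search for the smallest squared difference over all channel and template pairs. The function and argument names are hypothetical.

```python
def squared_difference(image_a, image_b):
    """Sum of squared pixel differences between two equal-length images."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(image_a, image_b))


def ideal_observer_choice(channel_outputs, templates):
    """Pick the template with the smallest squared difference.

    channel_outputs: list of flattened images, one per spatial frequency
        channel (already band-pass filtered; filtering is not shown here).
    templates: dict mapping a target label to its flattened noise-free image.
    Returns (label, channel_index) for the best-matching pair.
    """
    best = None
    for channel_index, output in enumerate(channel_outputs):
        for label, template in templates.items():
            d = squared_difference(output, template)
            if best is None or d < best[0]:
                best = (d, label, channel_index)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy 4-pixel "images", purely for illustration.
    templates = {"vertical": [1, 0, 1, 0], "horizontal": [1, 1, 0, 0]}
    outputs = [[0.5, 0.5, 0.5, 0.5], [0.9, 0.1, 1.1, 0.0]]
    print(ideal_observer_choice(outputs, templates))
```

With a constant prior, choosing the smallest squared difference corresponds to a maximum likelihood choice when the pixel noise is independent and Gaussian; how closely that holds for filtered 1/f noise is not addressed by this sketch.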

The results produced by the ideal observer show a similar trend compared to the human data for the same task in that b = 2.0 noise masks produce the highest thresholds, followed by b = 3.0 noise masks. So, it appears as though some of the differential masking effects in the human data can be explained by task constraints themselves. However, the critical test is to transform the human data into efficiencies. One of the benefits of calculating efficiency from an ideal observer analysis is that it allows the experimenter to "...strip away the intrinsic difficulty of the task to reveal a pure measure of human ability." (Pelli & Farell, 1999, JOSA A, p. 647). We therefore calculated high-noise efficiency as defined by Pelli and Farell (1999), i.e., $E_{\mathrm{ideal}}/(E - E_0)$, where $E_{\mathrm{ideal}}$ represents the contrast energy thresholds observed with the ideal observer for Gabors in noise, $E$ represents the contrast energy thresholds obtained from humans for Gabors in noise, and $E_0$ represents the contrast energy thresholds obtained from humans for Gabors alone. The results are plotted in Fig. 3b. With the exception of notch bandwidth 2.08, b = 2.0 noise masks produce the lowest


efficiencies, followed mostly by b = 3.0 noise masks, which mirrors the trends in the threshold data shown in Fig. 2. Thus, the lowest efficiencies for b = 2.0 noise in the context of the current experiment suggest that humans ‘struggled’ most with b = 2.0 noise compared to the other noise types, and since efficiency factors out the physical constraints of the stimuli, the results reported in Fig. 2 cannot be explained by the physical characteristics of the stimuli.
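The efficiency measure used above is the ratio of the ideal observer's threshold energy to the human threshold energy in noise after subtracting the human no-noise threshold. A minimal sketch is given below; the function names and the example values are hypothetical, and base-10 logarithms are assumed for the log efficiency, since the base is not stated in the text.

```python
import math


def high_noise_efficiency(e_ideal, e_human, e0_human):
    """High-noise efficiency: E_ideal / (E - E0).

    e_ideal:  ideal observer threshold contrast energy in noise
    e_human:  human threshold contrast energy in noise
    e0_human: human threshold contrast energy without noise
    """
    if e_human <= e0_human:
        raise ValueError("threshold in noise must exceed the no-noise threshold")
    return e_ideal / (e_human - e0_human)


def log_high_noise_efficiency(e_ideal, e_human, e0_human):
    """Log efficiency (base 10 is an assumption)."""
    return math.log10(high_noise_efficiency(e_ideal, e_human, e0_human))


if __name__ == "__main__":
    # Made-up threshold energies, for illustration only.
    print(high_noise_efficiency(0.01, 0.15, 0.05))
```

Because the human thresholds appear in the denominator, a mask that raises human thresholds more than it raises ideal thresholds yields a lower efficiency, which is the pattern reported for b = 2.0 noise.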

4. Experiment 2

Previous studies (e.g., Cass et al., 2009; Field & Brady, 1997; Tolhurst & Tadmor, 2000) have noted a subjective difference in contrast between images varying in b. Indeed, Baker and Graf (2009) have shown that humans tend to perceive b = 2.0 noise as having the highest perceived contrast (compared to noise with smaller or larger b values) when adjusted to match a 3 cpd grating. Further, McDonald and Tadmor (2006) have shown that, compared to other bs, b = 2.0 surrounds suppress the perceived contrast of broadband center stimuli most. It may therefore be possible that the results of Experiment 1 can be explained by the perceived contrast of the masks. Granted, while there is a strong consensus that perceived contrast plays only a meager role in threshold elevation in simultaneous masking paradigms (e.g., Georgeson & Sullivan, 1975; Hess, 1990; Pelli, 1979), we thought it may still be worth exploring as a possible confound since 2/3 of the mask stimuli employed in the current study are quite different from traditional masks. In order to test for effects of perceived contrast, the current experiment was designed to have two parts. The first part is devoted to measuring the perceived contrast of spatial noise as b is varied using a suprathreshold matching paradigm. The second part consists of repeating portions of Experiment 1, but with noise masks equated in terms of perceived contrast based on the results of the first part of the current experiment. It is worth noting that while part I of the current experiment was used to inform part II, it also provides a novel data set regarding the perceived contrast of different types of noise patterns relative to each other, which is something that is lacking in the literature.

4.1. Experiment 2: Part I (suprathreshold noise contrast matching)

4.1.1. Participants

Twenty-one observers (all naïve to the purpose of the experiment) participated here. All had normal (or corrected-to-normal) vision and were compensated for their participation. Their ages ranged between 18 and 21 years. Institutional Review Board-approved informed consent was obtained.

4.1.2. Psychophysical procedure

The noise patterns used here had bs ranging from 0.0 to 4.0 (in steps of 1.0) and were fit with an edge-ramped circular window in the spatial domain (e.g., Hansen & Hess, 2006). The procedure was method of adjustment and involved presenting participants with a ‘standard’ noise pattern on either the left or right side of a CRT monitor, and a ‘test’ noise pattern on the opposite side (the outer edges of the two patterns were separated by 3.87°). The rms contrast of the standard noise patterns was fixed at 0.18, and the initial rms contrast of the test noise patterns was set to a random value selected from a uniform distribution ranging from 0.0 to 0.36. Observers were given a prompt to indicate which side of the monitor the standard pattern would be located on. Observers were instructed to adjust (using the up, down, left, and right arrow keys on the number pad of a standard PC keyboard) the global contrast of the test pattern to match that of the standard pattern. All observers were specifically instructed not to match local regions of brightness between the two patterns. Observers were also instructed to ignore the differences between the patterns’ texture and to attend specifically to the global contrast (i.e., compare the differences between the brighter and darker regions within the test noise patterns to the differences in the standard noise). The step sizes for adjusting the rms contrast of the test noise patterns were fixed, with the up-down arrow keys making larger rms steps (i.e., ±0.008), and the left–right arrow keys making smaller steps (i.e., ±0.001). Participants were instructed to use the up-down arrow keys to bring the test pattern to a contrast that was close to the standard, and then use the left–right arrow keys to fine-tune their match. The use of up-down and left–right keys was tracked during the experiment, and all participants were found to have made extensive use of both. The standard noise patterns took on b values of 0.0, 2.0, or 3.0, while the test noise patterns took on b values of 0.0, 1.0, 2.0, 3.0, or 4.0. The rationale here was to examine differences in perceived contrast across a broad range of b values (i.e., 0.0–4.0 in steps of 1.0) relative to the three different standard b values (i.e., 0.0, 2.0, or 3.0).

During the matching experiment, each test noise pattern b was paired with each standard noise pattern b 10 times, resulting in a total of 150 trials per participant (with 15 different standard-to-test b pair conditions total). All participants tended to complete the matching experiment within 1 h. For each of the 10 standard-to-test noise b pair repetitions, half of the standard patterns were presented on the left-hand side of the monitor and the other half presented on the right-hand side. The ordering of the different standard-to-test noise b pairs was determined randomly, and the phase spectrum of the standard and test noise patterns on each trial was different, with each phase spectrum changing on a trial-by-trial basis.
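The matching design described above can be laid out as a trial list. The sketch below is an illustration under assumptions rather than the original procedure code: the b values, repetition count, left/right balancing, and the 0.0 to 0.36 range for the initial test contrast come from the text, whereas the dictionary keys, the function name, and the exact randomization scheme are hypothetical.

```python
import random

STANDARD_BS = (0.0, 2.0, 3.0)          # standard noise b values
TEST_BS = (0.0, 1.0, 2.0, 3.0, 4.0)    # test noise b values
REPEATS = 10                           # repetitions per standard-to-test pair
INITIAL_RMS_RANGE = (0.0, 0.36)        # uniform range for the test start value


def build_matching_trials(rng):
    """Return a randomly ordered list of matching trials.

    Within each standard-to-test pair, the standard appears on the left
    for half of the repetitions and on the right for the other half.
    """
    trials = []
    for standard_b in STANDARD_BS:
        for test_b in TEST_BS:
            sides = ["left"] * (REPEATS // 2) + ["right"] * (REPEATS // 2)
            for side in sides:
                trials.append({
                    "standard_b": standard_b,
                    "test_b": test_b,
                    "standard_side": side,
                    "initial_test_rms": rng.uniform(*INITIAL_RMS_RANGE),
                })
    rng.shuffle(trials)
    return trials


if __name__ == "__main__":
    trials = build_matching_trials(random.Random(1))
    print(len(trials))
```

Shuffling the full list after it is built keeps the left/right balance within each pair while randomizing the order in which pairs are encountered.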

4.1.3. Results and discussion

Perceived matches were generated by averaging across the 10 matches made for each of the 15 different standard-to-test noise b pairs for each participant, and then averaging across all participants. Before averaging across participants, each participant’s data were normalized to the standard noise rms contrast by subtracting 0.18 from all of the data points. Thus, the data are expressed in terms of rms differences from the standard noise patterns’ rms contrast. That is, negative values indicate that participants perceived a given test pattern as having more perceived contrast than the standard (i.e., participants had to select rms values that were smaller than the standard in order to make a perceived match), with positive values indicating the opposite. The results of this analysis are plotted in Fig. 4a. For clarity, the data associated with a given standard b have been further normalized to zero when the test pattern b and standard pattern b matched (e.g., Baker & Graf, 2009). Fig. 4a shows that noise patterns with b = 2.0 are always perceived to have higher rms contrast when compared to the other noise pattern bs, regardless of the standard noise pattern’s b. For example, when the standard noise was made to possess b = 0.0, all other noise patterns were set to a lower rms contrast (i.e., all other noise patterns were perceived to have a higher rms contrast), with b = 2.0 noise decreased the most. The differences vary somewhat depending on the standard, but generally, b = 0.0 noise is perceived to be ~0.10 to 0.15 rms units lower than b = 2.0 noise, and b = 3.0 noise is perceived to be ~0.05 to 0.10 rms units lower than b = 2.0 noise. Such a trend seems to mirror the differences in the masking effects reported in Experiment 1; that is, b = 0.0 yielded the weakest relative masking effects, followed by b = 3.0 noise, with b = 2.0 noise producing the strongest masking effects.
This brings us to the second portion of the current experiment, which sought to equate the noise masks with the biggest differences in perceived contrast (i.e., b = 0.0 and b = 2.0 noise) and repeat that portion of Experiment 1 with the same participants (but without notch filtering for the sake of simplicity).
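The two normalization steps applied to the matching data (subtracting the standard rms contrast of 0.18, then re-zeroing each standard's trace where test b equals standard b) can be sketched for a single participant as follows. The data layout, the function name, and the example values are hypothetical; only the two steps themselves are taken from the text.

```python
def normalize_matches(matched_rms, standard_rms=0.18):
    """Normalize mean matched rms contrasts for one participant.

    matched_rms: dict mapping (standard_b, test_b) to the mean rms
        contrast the participant set on the test pattern.
    Returns a dict with the same keys holding rms differences from the
    standard, shifted so that each standard's trace is zero where
    test_b == standard_b.
    """
    differences = {key: value - standard_rms
                   for key, value in matched_rms.items()}
    normalized = {}
    for (standard_b, test_b), diff in differences.items():
        baseline = differences[(standard_b, standard_b)]
        normalized[(standard_b, test_b)] = diff - baseline
    return normalized


if __name__ == "__main__":
    # Made-up matches for a b = 2.0 standard, for illustration only.
    example = {(2.0, 0.0): 0.30, (2.0, 2.0): 0.19, (2.0, 3.0): 0.25}
    print(normalize_matches(example))
```

Under this sign convention a positive value means the test pattern had to be set to a higher rms contrast than the matched-b case, i.e., it was perceived as lower in contrast.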

Fig. 4. (a) Data from Experiment 2, part I (suprathreshold noise contrast matching). On the ordinate is averaged perceived difference in rms contrast of ‘test’ noise patterns relative to ‘standard’ noise patterns (error bars are ±1 SE, between participants). Note that each trace has been normalized to 0 when standard and test noise patterns matched in terms of b. On the abscissa is b of the test noise patterns. (b) Data from Experiment 2, part II (equated perceived contrast masks). On the ordinate is averaged proportion correct (error bars are ±1 SE, between participants), and on the abscissa is log contrast energy of the targets.


4.2. Experiment 2: Part II (equating mask perceived contrast)

4.2.1. Psychophysical procedure

Four of the six observers in Experiment 1 agreed to participate in the current experiment. The experimental task itself was identical to Experiment 1, except that here we only repeated the b = 0.0 condition (no notch) with those masks set to 0.30 rms contrast. That is, we selected the largest perceived difference between standard and test (~0.15 rms units) and added that to the rms of the b = 0.0 masks.

4.2.2. Results and discussion

The data are plotted in the form of psychometric functions in Fig. 4b. Data from the blank background (no-mask), b = 0.0 (0.15 rms) noise mask, and b = 2.0 (0.15 rms) noise mask conditions (both without a notch) from Experiment 1 are shown in Fig. 4b for reference. Setting the rms contrast of the b = 0.0 noise masks to 0.30, not surprisingly, shifted the psychometric function to the right compared to the 0.15 rms b = 0.0 noise masks, but did not approach the psychometric function produced by b = 2.0 noise masks. The averaged contrast energy threshold for Gabor orientation identification with the 0.30 rms b = 0.0 mask condition was M = 0.084, SE = .006 (compared to b = 0.0 (0.15 rms) mask condition: M = 0.047, SE = .007; b = 2.0 (0.15 rms) mask condition: M = 0.146, SE = 0.006). Paired t-tests show that the thresholds produced by the 0.30 rms b = 0.0 noise masks differed significantly from the thresholds obtained with 0.15 rms b = 0.0 noise masks and 0.15 rms b = 2.0 noise masks (p < .01). Critically, b = 2.0 (0.15 rms) noise masks still produce thresholds that are 1.74 times higher than the 0.30 rms b = 0.0 mask condition. Since the 0.30 rms b = 0.0 noise masks and 0.15 rms b = 2.0 noise masks were closely equated in terms of perceived contrast, and b = 2.0 noise masks yield thresholds that are almost double those produced by the perceived contrast equated b = 0.0 noise masks, it seems highly unlikely that perceived contrast by itself can explain the differential masking reported in Experiment 1, which largely agrees with previous literature (e.g., Georgeson & Sullivan, 1975; Hess, 1990; Pelli, 1979). Equating perceived contrast did reduce the b = 2.0 noise mask thresholds from being 3.11 times higher than the b = 0.0 noise mask thresholds to being 1.74 times higher. However, the threshold difference accounted for by perceived contrast explains only ~44% of the difference in the masking effect between b = 0.0 and b = 2.0 noise masks, leaving ~56% of the masking difference unexplained.
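The ratios and percentages quoted above can be reproduced from the reported mean thresholds. The text does not spell out how the ~44% figure was computed; the sketch below shows one reading that is consistent with the reported numbers, namely the proportional drop in the threshold ratio after equating perceived contrast. The function and constant names are illustrative.

```python
# Mean contrast energy thresholds reported in the text (Experiment 2, part II).
T_B2_RMS015 = 0.146   # b = 2.0 mask at 0.15 rms
T_B0_RMS015 = 0.047   # b = 0.0 mask at 0.15 rms
T_B0_RMS030 = 0.084   # b = 0.0 mask at 0.30 rms (perceived-contrast equated)


def masking_ratio(threshold_a, threshold_b):
    """How many times higher threshold_a is than threshold_b."""
    return threshold_a / threshold_b


def share_explained(ratio_before, ratio_after):
    """Proportional reduction in the threshold ratio (an assumed definition)."""
    return (ratio_before - ratio_after) / ratio_before


if __name__ == "__main__":
    before = masking_ratio(T_B2_RMS015, T_B0_RMS015)
    after = masking_ratio(T_B2_RMS015, T_B0_RMS030)
    explained = share_explained(before, after)
    print(round(before, 2), round(after, 2), round(100 * explained))
```

Under this reading, the remaining share is simply one minus the explained share, which corresponds to the ~56% described as unexplained.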

5. Experiment 3

Here, we move on to stimuli that are spatially more complex than Gabors, focusing on letter recognition. We chose letters as stimuli simply because (1) they are spatially more complex than Gabors, (2) there is a large body of literature devoted to understanding the visual mechanisms involved in processing such stimuli, and (3) many have argued that our ability to recognize letters rests on a single narrowband spatial frequency channel, with the spatial frequency bandwidth of that channel ranging from 0.9 to 2.3 (Alexander, Xie, & Derlacki, 1994; Gold et al., 1999; Legge et al., 1985; Majaj et al., 2002; Parish & Sperling, 1991; Solomon & Pelli, 1994). While others have argued against the single channel hypothesis (e.g., Chung, Legge, & Tjan, 2002; Hess, Williams, & Chaudhry, 2001; Oruç & Landy, 2009), our aim here is not to provide any sort of resolution regarding that debate, but simply to examine whether the results reported in Experiments 1 and 2 generalize to spatially complex stimuli. We therefore repeated Experiments 1 and 2 (along with the ideal observer analysis) with bandpass filtered letters. We chose to use bandpass filtered letters because the broadband nature of letter stimuli makes ruling out the confounds explored in the previous experiments quite difficult (i.e., observers would be free to choose any one (or many) channel that proves informative).

5.1. Method

5.1.1. Psychophysical procedure

The stimuli were virtually identical to those used in Experiment 1, except with narrowband letters as targets. The psychophysical procedure was identical to that reported in Experiment 1 with the following exceptions. The design consisted of a single-interval, 26-alternative forced-choice, method of constant stimuli paradigm. The response interval consisted of a blank (mean luminance) screen, at which time participants were instructed to use the letter keys on a standard PC keyboard to make their response. The letter presented on any given trial was determined randomly (from a uniform distribution), and all participants were informed that the probability of any given letter being displayed was equal.

5.1.2. Results and discussion

Averaged thresholds are plotted in Fig. 5a. The averaged threshold for the blank background condition is shown to the far left for reference. The threshold data were analyzed with a 3 (noise mask b) × 9 (notch bandwidth) two-way repeated measures ANOVA (with Huynh–Feldt epsilon). The main effect of noise mask b was significant (F(2,8) = 253.39, p < .001), as was the main effect of octave bandwidth (F(3,16) = 76.8, p < .001). The interaction was also found to be statistically significant (F(7,34) = 23.2, p < .001). There was also a significant linear trend for the main effect of octave bandwidth (F(1,5) = 194.4, p < .001). Post hoc two-way ANOVAs show that each b condition threshold curve differs significantly from every other curve (p < .01).

As in Experiment 1 with Gabors, Fig. 5a shows that b = 2.0 noise masks interfered most with letter identification performance at all notch bandwidths examined here. On average, b = 2.0 noise masks required 3.0 times more contrast energy at threshold than b = 0.0 noise masks (ranging from 3.86 with no notch to 1.64 at the 2.88 octave notch), and 2.03 times more contrast energy than b = 3.0 noise masks (ranging from 2.13 with no notch to 1.92 at the 2.88 octave notch). Also, since the largest notch bandwidth was almost three octaves, it again seems very unlikely that the highest thresholds observed with b = 2.0 noise masks can be explained by the difference in rms contrast at and near the central spatial frequency of the targets. Finally, it is worth noting that in a separate pilot experiment (n = 4) using the three types of noise employed here (without notch filtering), we find the same threshold elevation magnitudes with broadband letters (data not shown).

Fig. 5. (a) Data from Experiment 3. On the ordinate is log averaged threshold contrast energy (error bars are ±1 SE, between participants), and on the abscissa is notch filter bandwidth (full-width at half-height). The light gray empty square shows averaged threshold contrast energy for letter identification on a blank background (error bars are ±1 SE, between participants). (b) Data from Experiment 3 (perceptually equated noise contrast). On the ordinate is averaged proportion correct (error bars are ±1 SE, between participants), and on the abscissa is log contrast energy of the targets.

We next look at data from the repeat of Experiment 2, part II, but with bandpass letters as targets. Fig. 5b shows psychometric functions from the b = 0.0 (0.30 rms) noise mask condition along with psychometric functions from the blank background (no-mask) condition, b = 0.0 (0.15 rms) noise mask, and b = 2.0 (0.15 rms) noise mask conditions (both without a notch) for reference. The averaged contrast energy threshold for letter identification with the 0.30 rms b = 0.0 mask condition was M = 0.068, SE = .006 (compared to b = 0.0 (0.15 rms) mask condition: M = 0.039, SE = .006; b = 2.0 (0.15 rms) mask condition: M = 0.154, SE = 0.006). Paired t-tests show that the thresholds produced by the 0.30 rms b = 0.0 noise masks differed significantly from the thresholds obtained with 0.15 rms b = 0.0 noise masks and 0.15 rms b = 2.0 noise masks (p < .01). As in Experiment 2, it seems as though the majority of the masking effect observed with narrowband letter targets cannot be explained by differences in perceived contrast of the masks.

5.1.3. Ideal observer analysis

Here, we employed the same ideal observer that was described in Experiment 1, but with narrowband letter stimuli in the notch filtered noise. The only difference in the analysis is that the LSD was taken between the narrowband letter stimuli alone and the differently bandpass filtered stimuli (i.e., the bandpass filtered notched noise masks with embedded narrowband letters). The results of the ideal observer analysis are shown in Fig. 6a. The results produced by the ideal observer show a similar trend compared to the human data for the same task in that b = 2.0 noise masks produce the highest thresholds for most of the notch bandwidths. So, it appears as though some of the differential masking effects in the human data can be explained by task constraints themselves. We next calculated high-noise efficiency as described in Experiment 1, and show the results in Fig. 6b. As in Experiment 1, b = 2.0 noise masks produce the lowest efficiencies, which mostly mirrors the trends in the threshold data shown in Fig. 5. It therefore appears that the masking trends observed with simple Gabor stimuli can be largely extended to include more spatially complex target stimuli.

6. General discussion

The results of the current study all show that, regardless of stimulus spatial complexity (e.g., narrowband Gabor and letter targets), b = 2.0 noise produces the highest identification thresholds as a function of target contrast energy. And, this finding could not be explained away by low-level accounts based on differences in noise spectral density (locally or globally), limitations due to available stimulus information, or differences in perceived contrast of the noise masks. Such a finding is interesting because it actually runs counter to a number of studies that show our visual systems are optimally suited to process stimuli to identification when presented within scenes possessing typical 2nd order statistical regularities (e.g., Párraga, Troscianko, & Tolhurst, 2000, 2005; Tolhurst & Tadmor, 2000). However, it is important to note that the stimuli to be identified in those studies were all suprathreshold, whereas here, we investigated the contrast threshold for identification. That is, detection was likely the limiting factor here, not identification per se. In order to verify such a claim, we ran an additional control experiment where one observer engaged in a detection task with Gabor targets embedded in notch filtered noise varying in b and produced virtually identical results (data not shown) as those observed in Experiment 1. Given that the detectability of various targets against the different noise masks used here was likely the limiting factor in identification, it is curious that a similar trend is not clear in previous studies that measured contrast sensitivity in either broadband adaptation or simultaneous masking paradigms. Specifically, Webster and Miyahara (1997) report on an experiment where CSFs were measured for two participants following adaptation to broadband noise with different power spectrum slopes. When b was set to 0.0, sensitivity to 3 cpd gratings was marginally decreased, but was largely and approximately equally reduced when b was either 2.0 or 3.0. The lack of a larger decrease in contrast sensitivity at 3 cpd for b = 2.0 noise in their study (compared to the current study) may reflect differential interference resulting from residually activated channels as opposed to simultaneously activated channels. Conversely, Bex, Solomon, and Dakin (2009) show no difference in detection threshold for 3 cpd bandpass filtered noise targets embedded in natural broadband movie sequences where b was either left in its natural state (presumably with a frame-to-frame average near 2.0, though possibly higher, e.g., Hansen & Essock, 2005) or "whitened" (b = 0.0). However, given that their unaltered movie sequences likely varied in b, the minimal-to-no difference in detection threshold for 3 cpd targets may reflect more dynamic channel interplay than what was measured in the current study. Thus, within the confines of the current experiment, limitations due to detection remain a plausible account for the discrepancy between the increased identification thresholds reported here and previous work showing the lowest identification thresholds for b = 2.0 stimuli. Further, the current results are in line with previous studies utilizing differently oriented contrast increments in noise or natural scene imagery where b was varied (e.g., Essock et al., 2003; Hansen & Essock, 2005), and are also apparent in the backward masking data from studies exploring rapid scene categorization (Loschky et al., 2010). Thus it may be the case that in the real-world environment, performance for identifying variable contrast stimuli (spatially simple or complex) is at a disadvantage (but may well be optimized for some other process).

Fig. 6. (a) Ideal observer analysis results reported in Experiment 3. On the ordinate is log averaged threshold contrast energy for the ideal observer, and on the abscissa is notch filter bandwidth (full-width at half-height). (b) Log high noise efficiency calculated from the data shown in Fig. 5. See text for further details.

Another interesting implication from the current study comes from Experiment 1. Specifically, the results of that experiment suggest that different noise masks result in different estimates of channel bandwidth, with b = 2.0 noise yielding a bandwidth >2.88 octaves. Thus, it seems that the more “naturalistic” the mask in terms of its 2nd order luminance statistics, the broader the bandwidth of the detecting mechanism. Previous studies using similar naturalistic stimuli have also reported behavioral performance that significantly differs from studies utilizing more traditional laboratory stimuli (e.g., Ellemberg, Johnson, & Hansen, submitted for publication; Essock et al., 2003; Hansen & Essock, 2004; Johnson et al., 2011; McDonald & Tadmor, 2006). Thus, the results of the current study call for a degree of caution when interpreting estimates of unitary spatial ‘channel’ bandwidth derived from white noise masking paradigms. Specifically, it has been common practice in vision (e.g., Stromeyer & Julesz, 1972) as it has been in audition (e.g., Glasberg & Moore, 1990; Patterson, 1976; Weber, 1977) to interpret noise masking functions in terms of the tuning properties of unitary neural mechanisms. The present results suggest that such interpretations will depend on the type of noise used. However, given the range of notch bandwidths used in Experiment 1, it seems almost ridiculous to suggest a detecting channel bandwidth close to (and possibly beyond) three octaves when identifying the orientation of Gabor stimuli. Alternatively, tuning curves derived from simultaneous noise paradigms likely reflect not only within-channel masking from neurons tuned to the stimulus spatial frequency and orientation, but also cross-channel masking from neurons tuned to nearby frequencies and orientations. Indeed, overlay masking paradigms suggest broadband inhibitory or suppressive interactions across spatial frequency in cat and human (e.g., Bauman & Bonds, 1991; Petrov, Carandini, & McKee, 2005). Thus, a more plausible account of the current results would be one seated in the realm of contrast gain control (e.g., Bex, Mareschal, & Dakin, 2007; Carandini & Heeger, 1994; Heeger, 1992a, 1992b; Schwartz & Simoncelli, 2001; Watson & Solomon, 1997; Wilson & Humanski, 1993), at the level of striate cortex (e.g., Carandini & Heeger, 1994; Carandini, Heeger, & Movshon, 1997). In such a context, it would be the overall amount of ‘activity’ within a somewhat tuned gain pool that would determine the relative output of individual channels. Such an account has been proposed to explain differences in perceived contrast for complex stimulus patches embedded in naturalistic surrounds (McDonald & Tadmor, 2006), as well as in a recent equivalent noise masking study (Baker, Meese, Georgeson, & Hess, 2011). Nevertheless, in order for such an account to be credible, b = 2.0 stimuli must load the inhibitory gain pool much more than stimuli possessing other b exponents.
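The logic of the gain control account can be made concrete with a minimal sketch of divisive normalization, written in the general form associated with Heeger (1992b). It is an illustration under stated assumptions and not a model fitted to the present data: the function name `normalized_response`, the exponents `p` and `q`, the semi-saturation constant `sigma`, the uniform pool weights, and all numerical values are hypothetical.

```python
def normalized_response(drive, pool_drives, weights=None, sigma=0.1, p=2.0, q=2.0):
    """Output of one channel after division by a weighted gain pool.

    drive       -- excitatory drive of the target channel (hypothetical units)
    pool_drives -- drives of the channels that make up the gain pool
    weights     -- pool weights (a "tuned" pool); uniform if omitted
    sigma, p, q -- illustrative constants, not estimates from any data set
    """
    if weights is None:
        weights = [1.0] * len(pool_drives)
    # Summed, weighted activity of the gain pool.
    pool = sum(w * d ** q for w, d in zip(weights, pool_drives))
    # The target drive is divided by the pool activity plus a constant.
    return drive ** p / (sigma ** q + pool)


# Same target drive, two hypothetical masks that load the pool differently.
target = 0.2
weak_pool = [0.05, 0.05, 0.05, 0.05]
strong_pool = [0.30, 0.30, 0.30, 0.30]

r_weak = normalized_response(target, weak_pool)
r_strong = normalized_response(target, strong_pool)
print(r_weak, r_strong)
```

With these hypothetical values, the same target drive yields a much smaller output when the pool is strongly driven, which is the qualitative behavior the account requires of b = 2.0 masks.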

Field (1987), Brady and Field (1995), and Field and Brady (1997) have proposed a modified multi-channel model of visual cortex that predicts overall larger and equivalent channel responses for b = 2.0 stimuli. In their model, the bandwidth of the filters is held constant in octaves on a log axis, with the peak sensitivity of each filter channel also held constant (Brady & Field, 1995). With such a configuration, any given image possessing an orientation averaged power spectrum with b near 2.0 will produce equivalently large energy responses across all channels in an inhibitory gain pool, regardless of the peak spatial frequency to which each filter is tuned (refer to Brady and Field (1995) for further detail). Thus, if b = 2.0 noise largely and equally drives all spatial channels, then one could expect that a tuned gain pool in that scenario would be much more active than with other types of noise, thereby most strongly limiting the magnitude of the output signal to decision processes. Unfortunately, the assumption of constant octave channels rests on just one report from De Valois et al. (1982, Fig. 10) where constant bandwidths (viewed on log axes) are apparent for many (more than half) of the narrowly tuned neurons in their sample. This is at odds with several studies that report a decline in bandwidth with increasing peak spatial frequency (Blakemore & Campbell, 1969; Wilson, 1991; Yu et al., 2010). In fact, a multi-channel model designed to reflect a decreasing channel bandwidth with increasing peak spatial frequency predicts channel outputs that follow the distribution of contrast across spatial frequency for noise masks with variable b (i.e., overall lower yet equivalent channel responses for b = 0.0 noise, with relatively large channel output at lower spatial frequencies for larger b exponents). The predictions made by each model for how an inhibitory gain pool (centered on a 3.05 cpd target channel) would respond to different b noise are illustrated in Fig. 7. When considering the average predicted output across all channels in the gain pools in that figure, both models actually predict that b = 2.0 noise masks will produce the largest response (leading to the highest thresholds); note that the empty squares in Fig. 7c and d have the highest averaged gain pool output. However, the difference is that the variable channel bandwidth model (Fig. 7b and d) predicts that the majority of the inhibitory gain pool signal arises from channels tuned to lower spatial frequencies, which may serve as the basis for recent evidence suggesting contrast gain control as primarily a low-pass process (e.g., Cass, Stuit, et al., 2009; Meese & Holmes, 2007). It is also interesting to note that the constant bandwidth model (Fig. 7a and c) predicts b = 3.0 noise will mask slightly more than b = 0.0 (which was observed in the current study), whereas the variable bandwidth model predicts similar masking between b = 2.0 and b = 3.0 masks (which is similar to the adaptation data reported by Webster and Miyahara (1997), but was not observed here). A more definitive validation for the gain control account will have to wait for further psychophysical experimentation and comprehensive computational modeling.

Fig. 7. (a) One-dimensional (1D) profile of different filter channels in the Fourier domain showing constant channel bandwidth (e.g., Field, 1987). On the ordinate is filter sensitivity (i.e., filter gain), and on the abscissa is spatial frequency (note that only one side of Fourier space is shown). The peak-to-peak separation is approximately 0.5 octaves. (b) Same as in (a), but here the channel bandwidth decreases with increasing peak spatial frequency (see text for further details). (c) Each point on the three traces is the response energy from each filter shown in (a) when passed noise possessing one of three different values of b. The empty symbols to the far right are the averaged output of the entire bank of filters shown in (a), b = 0.0: empty gray circle; b = 2.0: empty black square; b = 3.0: empty black triangle. See text for further details. (d) Same as (c), but showing filter responses from (b). Note the ordinate axis difference in (c) and (d).
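The constant octave bandwidth prediction can be checked numerically with a short sketch. It assumes log-Gaussian filters, isotropic noise whose power spectrum falls as 1/f^b, and integration over two-dimensional frequency space (hence the extra factor of f in the integrand). The 1.5 octave bandwidth and all function and variable names are illustrative assumptions; only the 3.05 cpd center and the roughly 0.5 octave channel spacing are taken from Fig. 7. Noise contrast is not equated across exponents, so only the shape of each profile across channels is meaningful, not the relative level between exponents, and the sketch is not a reproduction of Fig. 7.

```python
import math


def channel_energy(f0, b, bw_octaves=1.5, n=2001):
    """Response energy of one log-Gaussian channel to 1/f^b noise.

    f0         -- peak spatial frequency of the channel (cpd)
    b          -- exponent of the orientation averaged power spectrum
    bw_octaves -- full width at half height in octaves (assumed value)
    """
    # Convert the octave bandwidth to the log-Gaussian width parameter.
    s = bw_octaves * math.log(2) / (2 * math.sqrt(2 * math.log(2)))
    u0 = math.log(f0)
    lo, hi = u0 - 6 * s, u0 + 6 * s
    du = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        u = lo + i * du  # u = ln(f)
        gain_sq = math.exp(-((u - u0) ** 2) / (s ** 2))  # squared filter gain
        # Power f^-b times the 2D area element f df, with df = f du.
        total += math.exp((2.0 - b) * u) * gain_sq * du
    return total


# Nine channels spaced 0.5 octaves apart, centered on 3.05 cpd.
peaks = [3.05 * 2 ** (k / 2) for k in range(-4, 5)]
energies = {b: [channel_energy(f0, b) for f0 in peaks] for b in (0.0, 2.0, 3.0)}

for b, values in energies.items():
    print(b, [round(v / values[4], 3) for v in values])  # relative to 3.05 cpd
```

Under these assumptions the b = 2.0 profile is flat across channels, while the b = 0.0 profile rises and the b = 3.0 profile falls with peak spatial frequency. Approximating the variable bandwidth model of Fig. 7b would require making `bw_octaves` a decreasing function of `f0`, which is not attempted here.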

Acknowledgments

Colgate Research Council Grant to B.C.H.; NSERC Grant (41528-06) to R.F.H.


References

Alexander, K. R., Xie, W., & Derlacki, D. J. (1994). Spatial-frequency characteristics of letter identification. Journal of the Optical Society of America A, 11(9), 2375–2382.

Baker, D. H., & Graf, E. W. (2009). Natural images dominate in binocular rivalry. Proceedings of the National Academy of Sciences of the United States of America, 106(13), 5436–5441.

Baker, D. H., Meese, T. S., Georgeson, M. A., & Hess, R. F. (2011). How much of noise masking derives from noise? Perception, 40(S), 51.

Bauman, L. A., & Bonds, A. B. (1991). Inhibitory refinement of spatial frequency selectivity in single cells of the cat striate cortex. Vision Research, 31(6), 933–944.

Bex, P. J., Mareschal, I., & Dakin, S. C. (2007). Contrast gain control in natural scenes. Journal of Vision, 7(11), 1–12.

Bex, P. J., Solomon, S. G., & Dakin, S. C. (2009). Contrast sensitivity in natural scenes depends on edge as well as spatial frequency structure. Journal of Vision, 9(10), 11–19.

Billock, V. A. (2000). Neural acclimation to 1/f spatial frequency spectra in natural images transduced by the human visual system. Physica D: Nonlinear Phenomena, 137, 379–391.

Blakemore, C., & Campbell, F. W. (1969). On the existence of neurones in the human visual system selectively sensitive to the orientation and size of retinal images. Journal of Physiology, 203(1), 237–260.

Bonds, A. B. (1989). Role of inhibition in the specification of orientation selectivity of cells in the cat striate cortex. Visual Neuroscience, 2(1), 41–55.

Bosking, W. H., Zhang, Y., Schofield, B., & Fitzpatrick, D. (1997). Orientation selectivity and the arrangement of horizontal connections in tree shrew striate cortex. Journal of Neuroscience, 17(6), 2112–2127.

Brady, N., & Field, D. J. (1995). What’s constant in contrast constancy? The effects of scaling on the perceived contrast of bandpass patterns. Vision Research, 35(6), 739–756.

Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10(4), 433–436.

Burgess, A. E., Wagner, R. F., Jennings, R. J., & Barlow, H. B. (1981). Efficiency of human visual signal discrimination. Science, 214(4516), 93–94.

Burton, G. J., & Moorhead, I. R. (1987). Color and spatial structure in natural scenes. Applied Optics, 26, 157–170.

Campbell, F. W., & Robson, J. G. (1968). Application of Fourier analysis to the visibility of gratings. Journal of Physiology, 197(3), 551–566.

Carandini, M., Demb, J. B., Mante, V., Tolhurst, D. J., Dan, Y., Olshausen, B. A., et al. (2005). Do we know what the early visual system does? Journal of Neuroscience, 25(46), 10577–10597.

Carandini, M., & Heeger, D. J. (1994). Summation and division by neurons in primate visual cortex. Science, 264(5163), 1333–1336.

Carandini, M., Heeger, D. J., & Movshon, J. A. (1997). Linearity and normalization in simple cells of the macaque primary visual cortex. Journal of Neuroscience, 17(21), 8621–8644.

Cass, J., Alais, D., Spehar, B., & Bex, P. J. (2009). Temporal whitening: Transient noise perceptually equalizes the 1/f temporal amplitude spectrum. Journal of Vision, 9(10).

Cass, J., Stuit, S., Bex, P., & Alais, D. (2009). Orientation bandwidths are invariant across spatiotemporal frequency after isotropic components are removed. Journal of Vision, 9(12).

Chung, S. T., Legge, G. E., & Tjan, B. S. (2002). Spatial-frequency characteristics of letter identification in central and peripheral vision. Vision Research, 42(18), 2137–2152.

DeAngelis, G. C., Robson, J. G., Ohzawa, I., & Freeman, R. D. (1992). Organization of suppression in receptive fields of neurons in cat visual cortex. Journal of Neurophysiology, 68(1), 144–163.

De Valois, R. L., Albrecht, D. G., & Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 22(5), 545–559.

De Valois, R. L., Yund, E. W., & Hepler, N. (1982). The orientation and direction selectivity of cells in macaque visual cortex. Vision Research, 22(5), 531–544.

Ellemberg, D., Johnson, A. P., & Hansen, B. C. (submitted for publication). Is the developing visual system tuned to the spatial characteristics of its natural environment?

Essock, E. A., DeFord, J. K., Hansen, B. C., & Sinai, M. J. (2003). Oblique stimuli are seen best (not worst!) in naturalistic broad-band stimuli: A horizontal effect. Vision Research, 43(12), 1329–1335.

Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. Journal of the Optical Society of America A, 4, 2379–2394.

Field, D. J. (1993). Scale-invariance and self-similar ‘wavelet’ transforms: An analysis of natural scenes and mammalian visual systems. In M. Farge, J. C. R. Hunt, & J. C. Vassilicos (Eds.), Wavelets, Fractals and Fourier Transforms: New Developments and New Applications. Oxford University Press.

Field, D. J., & Brady, N. (1997). Visual sensitivity, blur and the sources of variability in the amplitude spectra of natural scenes. Vision Research, 37, 3367–3383.

Field, D. J., & Tolhurst, D. J. (1986). The structure and symmetry of simple-cell receptive-field profiles in the cat’s visual cortex. Proceedings of the Royal Society of London B, 228(1253), 379–400.

Fitzpatrick, D. (2000). Seeing beyond the receptive field in primary visual cortex. Current Opinion in Neurobiology, 10(4), 438–443.

Georgeson, M. A., & Sullivan, G. D. (1975). Contrast constancy: Deblurring in human vision by spatial frequency channels. Journal of Physiology, 252(3), 627–656.

Glasberg, B. R., & Moore, B. C. (1990). Derivation of auditory filter shapes from notched-noise data. Hearing Research, 47(1–2), 103–138.

Gold, J., Bennett, P. J., & Sekuler, A. B. (1999). Identification of band-pass filtered letters and faces by human and ideal observers. Vision Research, 39(21), 3537–3560.

Graham, N., & Nachmias, J. (1971). Detection of grating patterns containing two spatial frequencies: A comparison of single-channel and multiple-channels models. Vision Research, 11(3), 251–259.

Hansen, B. C., & Essock, E. A. (2004). A horizontal bias in human visual processing of orientation and its correspondence to the structural components of natural scenes. Journal of Vision, 4(12), 1044–1060.

Hansen, B. C., & Essock, E. A. (2005). Influence of scale and orientation on the visual perception of natural scenes. Visual Cognition, 12, 1199–1234.

Hansen, B. C., Haun, A. M., & Essock, E. A. (2008). The “Horizontal Effect”: A perceptual anisotropy in visual processing of naturalistic broadband stimuli. In T. A. Portocello & R. B. Velloti (Eds.), Visual cortex: New research. New York: Nova Science Publishers.

Hansen, B. C., & Hess, R. F. (2006). Discrimination of amplitude spectrum slope in the fovea and parafovea and the local amplitude distributions of natural scene imagery. Journal of Vision, 6(7), 696–711.

Heeger, D. J. (1992a). Half-squaring in responses of cat striate cells. Visual Neuroscience, 9(5), 427–443.

Heeger, D. J. (1992b). Normalization of cell responses in cat striate cortex. Visual Neuroscience, 9(2), 181–197.

Henning, G. B., Hertz, B. G., & Hinton, J. L. (1981). Effects of different hypothetical detection mechanisms on the shape of spatial-frequency filters inferred from masking experiments: I. Noise masks. Journal of the Optical Society of America, 71, 574–581.

Hess, R. F. (1990). The Edridge-Green lecture: Vision at low light levels: Role of spatial, temporal and contrast filters. Ophthalmic and Physiological Optics, 10(4), 351–359.

Hess, R. F., Williams, C. B., & Chaudhry, A. (2001). Contour interaction for an easily resolvable stimulus. Journal of the Optical Society of America A, 18, 2414–2418.

Johnson, A. P., Richard, B., Hansen, B. C., & Ellemberg, D. (2011). The magnitude of center–surround facilitation in the discrimination of amplitude spectrum is dependent on the amplitude of the surround. Journal of Vision, 11(7), 1–10.

Kersten, D. (1984). Spatial summation in visual noise. Vision Research, 24(12), 1977–1990.

Kretzmer, E. R. (1952). The statistics of television signals. Bell System Technical Journal, 31, 751–763.

Legge, G. E. (1978). Space domain properties of a spatial frequency channel in human vision. Vision Research, 18(8), 959–969.

Legge, G. E., & Foley, J. M. (1980). Contrast masking in human vision. Journal of the Optical Society of America, 70(12), 1458–1471.

Legge, G. E., Pelli, D. G., Rubin, G. S., & Schleske, M. M. (1985). Psychophysics of reading—I. Normal vision. Vision Research, 25(2), 239–252.

Loschky, L. C., Hansen, B. C., Sethi, A., & Tejaswi, P. N. (2010). The role of higher order image statistics in masking scene gist recognition. Attention, Perception, & Psychophysics, 72(2).

Maffei, L., & Fiorentini, A. (1973). The visual cortex as a spatial frequency analyser. Vision Research, 13(7), 1255–1267.

Majaj, N. J., Pelli, D. G., Kurshan, P., & Palomares, M. (2002). The role of spatial frequency channels in letter identification. Vision Research, 42(9), 1165–1184.

McDonald, J. S., & Tadmor, Y. (2006). The perceived contrast of texture patches embedded in natural images. Vision Research, 46(19), 3098–3104.

Meese, T. S., & Holmes, D. J. (2007). Spatial and temporal dependencies of cross-orientation suppression in human vision. Proceedings of the Royal Society of London B, 274(1606), 127–136.

Meese, T. S., & Holmes, D. J. (2010). Orientation masking and cross-orientation suppression (XOS): Implications for estimates of filter bandwidth. Journal of Vision, 10(12), 9.

Meier, L., & Carandini, M. (2002). Masking by fast gratings. Journal of Vision, 2(4), 293–301.

Merigan, W. H., & Maunsell, J. H. (1993). How parallel are the primate visual pathways? Annual Review of Neuroscience, 16, 369–402.

Morrone, M. C., Burr, D. C., & Maffei, L. (1982). Functional implications of cross-orientation inhibition of cortical visual cells. I. Neurophysiological evidence. Proceedings of the Royal Society London, B, 216(1204), 335–354.

Mostafavi, H., & Sakrison, D. J. (1976). Structure and properties of a single channel in the human visual system. Vision Research, 16(9), 957–968.

Nelson, S., Toth, L., Sheth, B., & Sur, M. (1994). Orientation selectivity of cortical neurons during intracellular blockade of inhibition. Science, 265(5173), 774–777.

Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–175.

Olshausen, B. A., & Field, D. J. (2005). How close are we to understanding V1? Neural Computation, 17(8), 1665–1699.

Olzak, L. A. (1985). Interactions between spatially tuned mechanisms: Converging evidence. Journal of the Optical Society of America A, 2(9), 1551–1559.

Olzak, L. A., & Thomas, J. P. (1991). When orthogonal orientations are not processed independently. Vision Research, 31(1), 51–57.

Oruç, I., & Barton, J. J. (2010). Critical frequencies in the perception of letters, faces, and novel shapes: Evidence for limited scale invariance for faces. Journal of Vision, 10(12).


Oruç, I., & Landy, M. S. (2009). Scale dependence and channel switching in letter identification. Journal of Vision, 9(9).

Pantle, A., & Sekuler, R. (1968). Size-detecting mechanisms in human vision. Science, 162(858), 1146–1148.

Parish, D. H., & Sperling, G. (1991). Object spatial frequencies, retinal spatial frequencies, noise, and the efficiency of letter discrimination. Vision Research, 31(7–8), 1399–1415.

Párraga, C. A., Troscianko, T., & Tolhurst, D. J. (2000). The human visual system is optimised for processing the spatial information in natural visual images. Current Biology, 10(1), 35–38.

Párraga, C. A., Troscianko, T., & Tolhurst, D. J. (2005). The effects of amplitude-spectrum statistics on foveal and peripheral discrimination of changes in natural images, and a multi-resolution model. Vision Research, 45(25–26), 3145–3168.

Patterson, R. D. (1976). Auditory filter shapes derived with noise stimuli. Journal of the Acoustical Society of America, 59(3), 640–654.

Pelli, D. G. (1979). The effects of noise masking and contrast adaptation on contrast detection, contrast discrimination and apparent contrast. Investigative Ophthalmology & Visual Science, (Suppl.), 59 (April).

Pelli, D. G., Burns, C. W., Farell, B., & Moore-Page, D. C. (2006). Feature detection and letter identification. Vision Research, 46(28), 4646–4674.

Pelli, D. G., & Farell, B. (1999). Why use noise? Journal of the Optical Society of America A, 16(3), 647–653.

Petrov, Y., Carandini, M., & McKee, S. (2005). Two distinct mechanisms of suppression in human vision. Journal of Neuroscience, 25(38), 8704–8707.

Phillips, G. C., & Wilson, H. R. (1984). Orientation bandwidths of spatial mechanisms measured by masking. Journal of the Optical Society of America A, 1(2), 226–232.

Ringach, D. L. (2002). Spatial structure and symmetry of simple-cell receptive fields in macaque primary visual cortex. Journal of Neurophysiology, 88(1), 455–463.

Ross, J., & Speed, H. D. (1991). Contrast adaptation and contrast masking in human vision. Proceedings of Biological Sciences, 246(1315), 61–69.

Ruderman, D. L., & Bialek, W. (1994). Statistics of natural images: Scaling in the woods. Physical Review Letters, 73, 814–817.

Sachs, M. B., Nachmias, J., & Robson, J. G. (1971). Spatial-frequency channels in human vision. Journal of the Optical Society of America, 61(9), 1176–1186.

Schwartz, O., & Simoncelli, E. P. (2001). Natural signal statistics and sensory gain control. Nature Neuroscience, 4(8), 819–825.

Shapley, R., & Lennie, P. (1985). Spatial frequency analysis in the visual system. Annual Review of Neuroscience, 8, 547–583.

Simoncelli, E. P., & Olshausen, B. A. (2001). Natural image statistics and neural representation. Annual Review of Neuroscience, 24, 1193–1216.

Solomon, J. A. (2000). Channel selection with non-white-noise masks. Journal of the Optical Society of America A, 17(6), 986–993.

Solomon, J. A., & Pelli, D. G. (1994). The visual filter mediating letter identification. Nature, 369(6479), 395–397.

Stromeyer, C. F., III, & Julesz, B. (1972). Spatial-frequency masking in vision: Critical bands and spread of masking. Journal of the Optical Society of America, 62(10), 1221–1232.

Talgar, C. P., Pelli, D. G., & Carrasco, M. (2004). Covert attention enhances letter identification without affecting channel tuning. Journal of Vision, 4(1), 22–31.

Thomson, M. G. A., & Foster, D. H. (1997). Role of second- and third-order statistics in the discriminability of natural images. Journal of the Optical Society of America A, 14(9), 2081–2090.

Tjan, B. S., Braje, W. L., Legge, G. E., & Kersten, D. (1995). Human efficiency for recognizing 3-D objects in luminance noise. Vision Research, 35(21), 3053–3069.

Tolhurst, D. J., & Tadmor, Y. (2000). Discrimination of spectrally blended natural images: Optimisation of the human visual system for encoding natural images. Perception, 29(9), 1087–1100.

Tolhurst, D. J., Tadmor, Y., & Tang, C. (1992). Amplitude spectra of natural images. Ophthalmic and Physiological Optics, 12, 229–232.

Torralba, A., & Oliva, A. (2003). Statistics of natural image categories. Network: Computation in Neural Systems, 14, 391–412.

Tyler, C. W. (1997). Colour bit-stealing to enhance the luminance resolution of digital displays on a single pixel basis. Spatial Vision, 10(4), 369–377.

van der Schaaf, A., & van Hateren, J. H. (1996). Modeling the power spectra of natural images: Statistics and information. Vision Research, 36, 2759–2770.

Watson, A. B., & Solomon, J. A. (1997). Model of visual contrast gain control and pattern masking. Journal of the Optical Society of America A, 14(9), 2379–2391.

Weber, D. L. (1977). Growth of masking and the auditory filter. Journal of the Acoustical Society of America, 62(2), 424–429.

Webster, M. A., & Miyahara, E. (1997). Contrast adaptation and the spatial structure of natural images. Journal of the Optical Society of America A, 14(9), 2355–2366.

Wichmann, F. A., & Hill, N. J. (2001a). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception & Psychophysics, 63(8), 1293–1313.

Wichmann, F. A., & Hill, N. J. (2001b). The psychometric function: II. Bootstrap-based confidence intervals and sampling. Perception & Psychophysics, 63(8), 1314–1329.

Wilson, H. R. (1991). Psychophysical models of spatial vision and hyperacuity. Spatial Vision, 10, 64–86.

Wilson, H. R., & Bergen, J. R. (1979). A four mechanism model for threshold spatial vision. Vision Research, 19(1), 19–32.

Wilson, H. R., & Humanski, R. (1993). Spatial frequency adaptation and contrast gain control. Vision Research, 33(8), 1133–1149.

Wilson, H. R., McFarlane, D. K., & Phillips, G. C. (1983). Spatial frequency tuning of orientation selective units estimated by oblique masking. Vision Research, 23(9), 873–882.

Yu, H. H., Verma, R., Yang, Y., Tibballs, H. A., Lui, L. L., Reser, D. H., et al. (2010). Spatial and temporal frequency tuning in striate cortex: Functional uniformity and specializations related to receptive field eccentricity. European Journal of Neuroscience, 31(6), 1043–1062.