Copyright 2001 Psychonomic Society, Inc.
Perception & Psychophysics 2001, 63 (8), 1330-1347

Fast and accurate measurement of taste and smell thresholds using a maximum-likelihood adaptive staircase procedure

MIRIAM R. LINSCHOTEN
University of Colorado Health Sciences Center, Denver, Colorado

LEWIS O. HARVEY, JR.
University of Colorado, Boulder, Colorado

and

PAMELA M. ELLER and BRUCE W. JAFEK
University of Colorado Health Sciences Center, Denver, Colorado

This paper evaluates the use of a maximum-likelihood adaptive staircase psychophysical procedure (ML-PEST), originally developed in vision and audition, for measuring detection thresholds in gustation and olfaction. The basis for the psychophysical measurement of thresholds with the ML-PEST procedure is developed. Then, two experiments and four simulations are reported. In the first experiment, ML-PEST was compared with the Wetherill and Levitt up–down staircase method and with the Cain ascending method of limits in the measurement of butyl alcohol thresholds. The four Monte Carlo simulations compared the three psychophysical procedures. In the second experiment, the test–retest reliability of ML-PEST for measuring NaCl and butyl alcohol thresholds was assessed. The results indicate that the ML-PEST method gives reliable and precise threshold measurements. Its ability to detect malingerers shows considerable promise. It is recommended for use in clinical testing.

Parts of these data have been presented at the Eighteenth and Nineteenth Annual Meetings of the Association for Chemoreception Sciences, 1996 and 1997. The authors thank Stan Klein for his clarifying questions and helpful comments. The computer routines to carry out the computations are available from author L.O.H. at his Web site: http://psych.colorado.edu/~lharvey/. Correspondence should be addressed to M. R. Linschoten, Rocky Mountain Taste and Smell Center, UCHSC Box B-205, Denver, CO 80262 (e-mail: miriam.linschoten@uchsc.edu).

Since the publication of Elemente der Psychophysik (Fechner, 1860), considerable effort has been made to improve the psychophysical methods used to measure sensory sensitivity (Guilford, 1936, 1954). We know that two psychological processes influence detection and discrimination performance: a sensory process and a decision process (Krantz, 1969; Swets, 1961; Swets, Tanner, & Birdsall, 1961). It is now possible to measure the sensitivity of the sensory process in a manner that is not influenced by the properties of the decision process. Such measures of sensitivity are said to be bias-free.

How one measures sensitivity depends on the model of the sensory process one adopts. Classical psychophysical methods assumed that the sensory process had a threshold, a stimulus value that must be exceeded in order for the stimulus to have any effect on the observer. These methods were therefore designed to measure the average value of this threshold. Sensitivity to a particular stimulus was represented by the probability, p, that the stimulus would exceed the sensory threshold. A stimulus that is equal to the average threshold value would have a p of .5. Other threshold models assumed that the sensory process had two or more sensory thresholds.

By the 1960s, however, it became clear that all models of the sensory process that assume one or more thresholds are invalid because they make predictions not supported by experimental data (Krantz, 1969; Swets, 1961, 1986a, 1986b, 1996). Sensitivity measures derived from these threshold models are contaminated by properties of the decision process.

Measures of sensitivity that are uncontaminated by decision-process properties (i.e., response bias) have been developed within the framework of signal detection theory (SDT). One almost bias-free measure of sensitivity is percent correct in the two-alternative forced-choice (2AFC) experimental paradigm (Green & Swets, 1966/1974; Macmillan & Creelman, 1991; Swets, 1986b). When percent correct (or probability of being correct) is plotted as a function of stimulus intensity, the resulting S-shaped curve forms what Urban (1908) first called a psychometric function. A typical 2AFC psychometric function formed by 20 stimulus intensities spanning a 3-log-unit range in 0.16-log-unit steps is illustrated in Figure 1. Each probability was computed from 1,000 presentations of the corresponding stimulus intensity.



Sensitivity is specified by the stimulus intensity required for a subject to reach a certain percent correct on the psychometric function. Typically, that performance point is halfway between chance and 100% correct performance. The 2AFC function has a halfway point of 75%. In the data shown in Figure 1, no stimulus gave a performance of exactly 75% correct, and this value must therefore be estimated by some suitable means. A logistic psychometric function (Berkson, 1951, 1953, 1955), shown as the smooth curve, was fit to the data using a maximum-likelihood technique (Harvey, 1986, 1997; Treutwein & Strasburger, 1999). The logistic function is given by

$$P(x) = \gamma + (1 - \gamma) \times \left( \frac{1}{1 + \left( \dfrac{x}{\alpha} \right)^{-\beta}} \right) \tag{1}$$

The function is specified by three parameters: α (the stimulus at the halfway point), β (the steepness of the function), and γ (the probability of being correct by chance). The stimulus intensity corresponding to α is marked by the vertical line in Figure 1 and represents the stimulus that would achieve 75% correct in the 2AFC task (marked by the horizontal line). Even though the sensory process, as formulated in SDT, has no sensory threshold, the value of α is still widely referred to as the threshold. In this paper, the term threshold will be used to mean the stimulus value, α, that gives performance halfway from chance to 100%.
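As a minimal sketch of the three-parameter logistic described above, the following Python function assumes the form P(x) = γ + (1 − γ) / (1 + (x/α)^(−β)), with x and α in linear intensity units. The function name and the default parameter values (taken from the Figure 1 simulation) are illustrative choices, not part of the ML-PEST software.

```python
def logistic_2afc(x, alpha, beta=3.5, gamma=0.5):
    """Probability of a correct 2AFC response at stimulus intensity x.

    x and alpha are in linear intensity units (x > 0, alpha > 0).
    alpha is the threshold, beta the steepness, gamma the chance level.
    """
    return gamma + (1.0 - gamma) / (1.0 + (x / alpha) ** (-beta))


# Illustrative intensities around a threshold of 10 units.
alpha = 10.0
for x in (1.0, 5.0, 10.0, 20.0, 100.0):
    print(f"x = {x:6.1f}  P(correct) = {logistic_2afc(x, alpha):.3f}")
```

At x = α the bracketed term equals 0.5, so with γ = 0.5 the function returns 0.75, the halfway point between chance and perfect performance.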

With the above framework in mind, it can be seen that the goal of all psychophysical methods for measuring thresholds and sensitivity, from the classical methods of Fechner (Guilford, 1936, 1954) to modern adaptive staircase methods (Cornsweet, 1962; Harvey, 1986, 1997; Treutwein & Strasburger, 1999; Watson & Pelli, 1983; Wetherill & Levitt, 1965), is to estimate the value of α. In vision and audition, it is practical to measure sensitivity with large numbers of trials using computer-generated stimuli. In the chemical senses, however, the physical presentation of the stimulus is not easily accomplished without human intervention. Furthermore, the longer recovery time in the chemical senses prevents rapid successive presentation of stimuli. These facts limit the total number of psychophysical trials that can be presented in a testing session before fatigue and/or boredom set in. We are therefore faced with the question, "How can we best estimate the threshold with a limited number of experimental trials?" A reasonable solution to this problem would be especially valuable for clinical testing of taste and smell function and for experiments in which sensitivity is repeatedly measured over a period of days or weeks.

A review of the taste and smell literature reveals that the two most widely used methods to measure detection sensitivity are the ascending method of limits for smell (Apter, Gent, & Frank, 1999; Cain, Gent, Catalanotto, & Goodspeed, 1983; Cometto-Muñiz & Cain, 1995; Deems et al., 1991; Duffy, Cain, & Ferris, 1999; Gagnon, Mergler, & Lapare, 1994; Lehrner, Brücke, Dal-Bianco, Gatterer, & Kryspin-Exner, 1997; Lehrner, Kryspin-Exner, & Vetter, 1995; Moll, Klimek, Eggers, & Mann, 1998; Murphy, Nordin, de Wijk, Cain, & Polich, 1994; Nordin et al., 1996; Pierce, Doty, & Amoore, 1996; Rosenblatt, Olmstead, Iwamoto-Schaapp, & Jarvik, 1998) and the one-up–two-down variant of the Wetherill and Levitt staircase method (Wetherill & Levitt, 1965) for taste (Anliker, Bartoshuk, Ferris, & Hooks, 1991; Cowart, 1989; Cowart, Yokomukai, & Beauchamp, 1994; Drewnowski, Henderson, & Shore, 1997; Grushka, Sessle, & Howley, 1986; Murphy & Cain, 1986; Murphy et al., 1994; Stevens, 1995; Weiffenbach, Baum, & Burghauser, 1982; Weiffenbach, Schwartz, Atkinson, & Fox, 1995). The method of constant stimuli could be used to measure a complete psychometric function like the one shown in Figure 1, using a fixed number of stimuli presented numerous times in random order. Even though the shape of the function can be highly informative, this method is only rarely used in taste and smell research, most likely because of the large number of experimental trials required for stable estimates (Linschoten & Kroeze, 1991, 1992, 1994). A few experimenters have used an SDT paradigm (yes–no or rating scale judgments) to measure both sensory sensitivity and response bias (Doty, Snyder, Huggins, & Lowry, 1981; O'Mahony et al., 1979).

Figure 1. Typical S-shaped psychometric function for stimulus detection in a two-alternative forced-choice paradigm. The data were generated by a Monte Carlo logistic observer having an α of 10 stimulus intensity units, a β of 3.5, and a γ of 0.5. Each of the 20 stimulus intensities was presented 1,000 times. The solid line is the best-fitting logistic function, fit to the data using a maximum-likelihood curve-fitting technique (Harvey, 1986, 1997).

In audition and vision, adaptive psychophysical methods are widely employed, and the theoretical bases for them are well understood. To minimize the number of trials, the most efficient testing strategy is to compute an estimate of the threshold after each psychophysical trial and use a stimulus close to the threshold on the next trial (Taylor & Creelman, 1967). The estimation of the threshold is done by fitting a psychometric function to the data collected up to that point in the experiment and estimating the value of the threshold from the best-fitting function. This method is formally called a maximum-likelihood adaptive staircase method and is known by several different names: Best PEST (Pentland, 1980), QUEST (Watson & Pelli, 1983), and ML-PEST (Harvey, 1986, 1997). We will refer to it as the ML-PEST method (maximum-likelihood parameter estimation by sequential testing). The term PEST was originally coined by Taylor and Creelman (1967).

That good estimates of the threshold can be obtained from sparse data is shown in Figure 2, where psychometric functions for the same 20 stimuli as in Figure 1, but with 1,000, 100, 10, and 1 presentations per stimulus, are shown. As the number of trials per stimulus intensity decreases, the empirical psychometric function becomes more and more irregular. The solid curve in each figure is the best-fitting logistic function (best value of α was found, holding β and γ constant) obtained with a maximum-likelihood fitting technique (Harvey, 1986, 1997). Even in the extreme case of 1 presentation per stimulus, the best-fitting logistic function gives a good estimate of the threshold.

Figure 2. Typical S-shaped psychometric function for stimulus detection in a two-alternative forced-choice paradigm. The data were generated by a Monte Carlo logistic observer having an α of 10 stimulus intensity units, a β of 3.5, and a γ of 0.5. Each of the 20 stimulus intensities was presented 1,000, 100, 10, and 1 times. The solid line is the best-fitting logistic function, fit to the data using a maximum-likelihood curve-fitting technique (Harvey, 1986, 1997).
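The sparse-data demonstration can be imitated with a small Monte Carlo sketch. It assumes the logistic form P(x) = γ + (1 − γ) / (1 + (x/α)^(−β)), rewritten for log10 intensities, and an assumed placement of the 20 stimuli (the actual stimulus values are not listed in the text). The grid search over candidate thresholds stands in for the maximum-likelihood fit; it is not the authors' fitting routine.

```python
import math
import random

BETA, GAMMA = 3.5, 0.5           # steepness and 2AFC chance level
TRUE_LOG_ALPHA = 1.0             # log10 of alpha = 10 intensity units

# Assumed placement: 20 log10 intensities, 0.16 log units apart,
# centred on the true threshold.
LOG_STIMULI = [-0.52 + 0.16 * i for i in range(20)]
CANDIDATES = [-1.0 + 0.01 * i for i in range(401)]   # candidate log10(alpha)


def p_correct(log_x, log_alpha):
    """Logistic 2AFC probability of a correct response, log10 units."""
    p = GAMMA + (1.0 - GAMMA) / (1.0 + 10.0 ** (-BETA * (log_x - log_alpha)))
    return min(p, 1.0 - 1e-12)   # keep log(1 - p) finite


def simulate_counts(n_per_stimulus, rng):
    """Number of correct responses of a simulated observer per stimulus."""
    return [sum(rng.random() < p_correct(x, TRUE_LOG_ALPHA)
                for _ in range(n_per_stimulus))
            for x in LOG_STIMULI]


def fit_log_alpha(correct_counts, n_per_stimulus):
    """Grid-search ML estimate of log10(alpha), beta and gamma held fixed."""
    def log_likelihood(log_alpha):
        total = 0.0
        for x, k in zip(LOG_STIMULI, correct_counts):
            p = p_correct(x, log_alpha)
            total += k * math.log(p) + (n_per_stimulus - k) * math.log(1.0 - p)
        return total
    return max(CANDIDATES, key=log_likelihood)


rng = random.Random(2001)        # arbitrary seed for repeatability
for n in (1000, 100, 10, 1):
    estimate = fit_log_alpha(simulate_counts(n, rng), n)
    print(f"{n:5d} presentations per stimulus: alpha estimate = {10 ** estimate:.2f}")
```

Because only α is free, even a single presentation per stimulus constrains the fit; the estimates simply scatter more widely as the number of presentations drops.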

Our goal in the present paper is to introduce the use of the maximum-likelihood adaptive staircase procedure for measuring sensitivity in the chemical senses. We will compare the efficiency and the accuracy of this method with those of other methods and present test–retest data as an assessment of reliability.

MAXIMUM-LIKELIHOOD ADAPTIVE STAIRCASE PSYCHOPHYSICAL METHOD

The principles of this method are described in a variety of sources (Hall, 1968, 1981; Harvey, 1986, 1997; Hays, 1963; Watson & Pelli, 1983), which provide more detail than is presented here. For this discussion, it is assumed that the psychometric function is logistic and that data are collected with a 2AFC experimental paradigm, although other functions and paradigms could certainly be used. During psychophysical testing, there are three major questions that are answered by this method after each trial: (1) What is the current estimate of the threshold? (2) What stimulus should be presented to the subject on the next trial? (3) Should the psychophysical trials be stopped? Although the answers are somewhat interrelated, it is useful to consider them separately.

Estimation of the Threshold

During psychophysical testing, the raw data collected on each trial are the stimulus concentration presented to the subject and whether or not the subject made a correct response. In order to estimate the threshold, a series of possible psychometric functions is considered. A likelihood function is constructed by computing the log likelihood for each of these possible psychometric functions, each one having the same value of β (steepness) and γ (the probability of being correct by chance) and each having a different value of α. The computational procedure is described by Harvey (1986, 1997). The smallest candidate α is chosen to lie below any possible real threshold, or at least lower than the lowest physical stimulus. Successive values increase in 0.01-log-unit steps up to an appropriate maximum. In testing the detection of NaCl, for example, we consider 451 αs, ranging from −4.50 log molar concentration (log M) to 0.00 log molar concentration in 0.01-log steps. Each candidate logistic function has a fixed β (3.5) and a fixed γ (0.5) for the 2AFC psychophysical paradigm. The β value is not critical for converging on the correct value of α. The value of 3.5 is close to actual psychometric functions for NaCl measured with the method of constant stimuli. The likelihood functions after having run 0, 5, 10, 15, 20, and 25 trials on a Monte Carlo observer with a true value of α of −2.125 log M (0.0075 M) are plotted in the upper panel of Figure 3. The best estimate of the threshold is that value of α having the maximum likelihood (i.e., the mode of the likelihood distribution). In this simulation, the best estimate of α was −2.110 log M after 25 trials, an error of 0.015 log M.
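The trial-by-trial construction of the likelihood function can be sketched as follows. The grid of 451 candidate thresholds, β = 3.5, and γ = 0.5 come from the text; the logistic is written for log10 concentrations, and the short trial sequence is invented purely for illustration. This is a sketch of the idea, not the routine described by Harvey (1986, 1997).

```python
import math

BETA, GAMMA = 3.5, 0.5
# 451 candidate thresholds: -4.50 to 0.00 log M in 0.01-log steps.
CANDIDATES = [round(-4.50 + 0.01 * i, 2) for i in range(451)]


def p_correct(log_c, log_alpha):
    """Logistic 2AFC probability of a correct response, log10 units."""
    p = GAMMA + (1.0 - GAMMA) / (1.0 + 10.0 ** (-BETA * (log_c - log_alpha)))
    return min(p, 1.0 - 1e-12)   # keep log(1 - p) finite


def update(log_lik, log_c, correct):
    """Add one trial's log likelihood to every candidate threshold."""
    for i, log_alpha in enumerate(CANDIDATES):
        p = p_correct(log_c, log_alpha)
        log_lik[i] += math.log(p if correct else 1.0 - p)


def ml_estimate(log_lik):
    """Candidate threshold with the largest accumulated log likelihood."""
    best = max(range(len(CANDIDATES)), key=log_lik.__getitem__)
    return CANDIDATES[best]


# Hypothetical trial sequence: (log M concentration, response correct?).
trials = [(-2.00, True), (-2.25, True), (-2.50, False),
          (-2.25, True), (-2.50, True), (-2.75, False)]

log_lik = [0.0] * len(CANDIDATES)    # flat before any data are collected
for trial_number, (log_c, correct) in enumerate(trials, start=1):
    update(log_lik, log_c, correct)
    print(f"after trial {trial_number}: estimate = {ml_estimate(log_lik):.2f} log M")
```

While only correct responses have been recorded, the estimate sits at the lowest candidate, because every correct answer is best explained by an arbitrarily sensitive observer; the first error pulls it into the interior of the grid.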

Choosing the Next Stimulus

At the beginning of an experiment, before any data have been collected, all candidate αs have equal likelihood of being the threshold. But often the experimenter has some prior idea of what the threshold value should be, especially if the subject has normal sensory function. We incorporate these prior expectations into the method by keeping track of a second likelihood function that starts with the prior likelihoods and adds the results of each trial to it. We use a Gaussian distribution for the prior likelihood function, with a standard deviation of 0.40, which is the value we found in our test–retest experiment, to be reported below (Experiment 2). In the example, the initial estimate of the threshold was −1.875 log M. These posterior likelihood functions are plotted in the lower panel of Figure 3 after having run 0, 5, 10, and 15 trials on the same observer. The mode of this posterior likelihood function after each trial is used to choose the next stimulus.
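A sketch of the second, prior-weighted likelihood function is given below. The prior mean (−1.875 log M) and standard deviation (0.40) are the values quoted in the text; treating the prior as an unnormalised log Gaussian over the candidate grid, and the function names, are assumptions made for the example.

```python
import math

BETA, GAMMA = 3.5, 0.5
CANDIDATES = [round(-4.50 + 0.01 * i, 2) for i in range(451)]   # log M
PRIOR_MEAN, PRIOR_SD = -1.875, 0.40


def p_correct(log_c, log_alpha):
    """Logistic 2AFC probability of a correct response, log10 units."""
    p = GAMMA + (1.0 - GAMMA) / (1.0 + 10.0 ** (-BETA * (log_c - log_alpha)))
    return min(p, 1.0 - 1e-12)


def initial_log_posterior():
    """Unnormalised log of the Gaussian prior over candidate thresholds."""
    return [-0.5 * ((a - PRIOR_MEAN) / PRIOR_SD) ** 2 for a in CANDIDATES]


def add_trial(log_posterior, log_c, correct):
    """Add one trial's log likelihood to the running log posterior."""
    for i, log_alpha in enumerate(CANDIDATES):
        p = p_correct(log_c, log_alpha)
        log_posterior[i] += math.log(p if correct else 1.0 - p)


def posterior_mode(log_posterior):
    """Candidate threshold at the peak of the log posterior."""
    best = max(range(len(CANDIDATES)), key=log_posterior.__getitem__)
    return CANDIDATES[best]


log_posterior = initial_log_posterior()
print(f"before testing: recommend {posterior_mode(log_posterior):.2f} log M")
add_trial(log_posterior, -2.00, True)     # hypothetical correct response
print(f"after 1 correct: recommend {posterior_mode(log_posterior):.2f} log M")
```

Unlike the plain likelihood, the posterior has an interior peak from the first trial onward, so the recommended stimulus moves in moderate steps instead of jumping to an end of the candidate range.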

In taste and smell, it is usually not feasible to maintain more than 20 or so different stimulus concentrations for presentation to an observer. The stimulus recommended by the maximum of the posterior likelihood function, therefore, may not actually be available to the experimenter to use on the next trial. There will be an actual stimulus that is lower than the recommendation and another stimulus that is higher than the recommendation. To choose between these two, we compute the relative distance of the recommended stimulus between the two actual stimuli, and we compute the relative frequency of each of the two actual stimuli. We choose one randomly, with a probability that favors the stimulus closer to the recommendation and the stimulus with the fewer number of trials. This computation is done in CStim->GetBestStimulusIndex() in the ML-PEST software package. Keeping the list of candidate αs separate from the list of available stimuli means that the threshold, α, can be computed to a precision that is not limited by the small number of actual stimuli used in the experiment.
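One way to realise this choice is sketched below. The text does not give the formula used inside CStim->GetBestStimulusIndex(), so the weighting here (closeness multiplied by the complement of the relative frequency) is a hypothetical stand-in that merely has the stated properties; the function name and the stimulus list are likewise illustrative.

```python
import bisect
import random

# Illustrative stimulus set: 18 concentrations, -4.00 to 0.25 log M.
STIMULI = [-4.00 + 0.25 * i for i in range(18)]


def choose_stimulus_index(recommended, counts, rng=random):
    """Pick one of the two available stimuli bracketing the recommendation.

    counts[i] is the number of trials already run at STIMULI[i].
    The weighting rule is an assumption, not the ML-PEST implementation.
    """
    hi = bisect.bisect_left(STIMULI, recommended)
    if hi == 0:
        return 0                          # below the weakest stimulus
    if hi == len(STIMULI):
        return len(STIMULI) - 1           # above the strongest stimulus
    lo = hi - 1

    # Relative distance of the recommendation from each neighbour (0..1).
    d_lo = (recommended - STIMULI[lo]) / (STIMULI[hi] - STIMULI[lo])
    d_hi = 1.0 - d_lo

    # Relative frequency with which each neighbour has been used so far.
    total = counts[lo] + counts[hi]
    f_lo = counts[lo] / total if total else 0.5
    f_hi = 1.0 - f_lo

    # Favor the closer stimulus and the less frequently used stimulus.
    w_lo = (1.0 - d_lo) * (1.0 - f_lo)
    w_hi = (1.0 - d_hi) * (1.0 - f_hi)
    if w_lo + w_hi == 0.0:
        return lo if d_lo <= d_hi else hi
    return lo if rng.random() < w_lo / (w_lo + w_hi) else hi


counts = [0] * len(STIMULI)
index = choose_stimulus_index(-2.11, counts, random.Random(1))
print(f"recommended -2.11 log M, presenting {STIMULI[index]:.2f} log M")
```

The random element explains why, in the simulated run discussed in the text, a stimulus is occasionally repeated or moves in the unexpected direction after a response.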

For NaCl detection, we use 18 different concentrations, ranging from −4.00 log M to 0.25 log M in 0.25-log-unit steps. The actual stimulus concentrations that were presented to the Monte Carlo observer are plotted in Figure 4 for each of the 25 trials. The solid circles represent trials on which the observer was correct; the open circles are incorrect trials. Because the observer was correct on the first three trials, the stimuli that were presented were successively weaker concentrations. On Trial 4, an error was made, and the stimulus on the following trial was the same concentration. Generally, but not always (because of the random process of choosing the next stimulus), the next stimulus will be weaker after a correct response and stronger after an incorrect response. The maximum-likelihood estimate of the threshold is plotted by the solid line in Figure 4. Until both correct and incorrect trials are collected, the estimate of the threshold will be at either the lower or the upper end of the range of candidate αs. In this example, the observer was correct on the first three trials and then made a mistake on Trial 4. The estimate of the threshold was extremely low (not visible in the figure) until Trial 4, when it jumped up to the region near the true value. As the trials continue, the estimated threshold fluctuates around the true value and eventually converges on it. After 25 trials, the estimated threshold was −2.11 log M, an error of 0.015 log units. This low error is achieved even though no actual stimulus corresponding to the threshold was available for testing.

When to Stop

Two stopping criteria can be used to terminate psychophysical testing: stop after a fixed number of trials, or stop after the estimation of the threshold reaches a required level of precision. Precision is measured by the confidence interval, which is related to the width of the likelihood function. Notice in Figure 3 that the likelihood function becomes narrower as the number of trials increases. The confidence interval of the estimated threshold is directly related to the width of the likelihood distribution and is calculated using the likelihood ratio test and the χ² statistical distribution with 1 degree of freedom (Hays, 1963):

$$\chi^2 = -2.0 \times \log_e(\text{likelihood ratio}) = -2.0 \times \log_e\left(\frac{L_x}{L_{\max}}\right) \tag{2}$$

where L_x is the likelihood of a model whose α parameter is x and L_max is the likelihood of the best-fitting model. We search for the stimulus concentrations (x) above and below the threshold stimulus (the maximum-likelihood α) at which the log likelihood drops to χ²/2 below the maximum log likelihood value. To find the 95% confidence interval, the criterion value of χ² is 5.0239 (this value is obtained from standard tables of the χ² distribution found in all statistics textbooks. The one-tailed value of 5.0239 corresponds to a probability of .025; since the confidence interval is two-tailed, 5.0239 corresponds to a probability of .05).

Figure 3. Log likelihood (upper panel) and log posterior likelihood (lower panel) as a function of candidate α following 0, 5, 10, 15, 20, and 25 Monte Carlo trials with a logistic observer having a true α of 0.0075 M (−2.125 log M) detecting NaCl in a two-alternative forced-choice detection paradigm.
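The confidence-interval search can be sketched as a walk outward from the peak of the log-likelihood function until it has dropped by χ²/2. The criterion value 5.0239 is the one quoted in the text; the candidate grid, the logistic written for log10 concentrations, and the trial data are assumptions for illustration only.

```python
import math

BETA, GAMMA = 3.5, 0.5
CHI2_CRITERION = 5.0239
CANDIDATES = [round(-4.50 + 0.01 * i, 2) for i in range(451)]   # log M


def p_correct(log_c, log_alpha):
    """Logistic 2AFC probability of a correct response, log10 units."""
    p = GAMMA + (1.0 - GAMMA) / (1.0 + 10.0 ** (-BETA * (log_c - log_alpha)))
    return min(p, 1.0 - 1e-12)


def log_likelihood(trials):
    """Log likelihood of every candidate threshold given (log_c, correct) trials."""
    return [sum(math.log(p_correct(c, a) if ok else 1.0 - p_correct(c, a))
                for c, ok in trials)
            for a in CANDIDATES]


def confidence_interval(log_lik, chi2=CHI2_CRITERION):
    """Return (lower, estimate, upper) from the likelihood-ratio criterion."""
    i_max = max(range(len(log_lik)), key=log_lik.__getitem__)
    cutoff = log_lik[i_max] - chi2 / 2.0
    lo = i_max
    while lo > 0 and log_lik[lo - 1] >= cutoff:
        lo -= 1
    hi = i_max
    while hi < len(log_lik) - 1 and log_lik[hi + 1] >= cutoff:
        hi += 1
    return CANDIDATES[lo], CANDIDATES[i_max], CANDIDATES[hi]


# Hypothetical trial sequence: (log M concentration, response correct?).
trials = [(-1.50, True), (-2.00, True), (-2.50, False), (-2.25, True),
          (-2.50, True), (-2.75, False), (-2.25, True), (-2.00, True)]
lower, estimate, upper = confidence_interval(log_likelihood(trials))
print(f"estimate {estimate:.2f} log M, interval {lower:.2f} to {upper:.2f} log M")
```

A stopping rule based on precision would then simply compare `upper - lower` with the required interval width after each trial.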

In Figure 5, the same data shown in Figure 4 are plotted with an expanded vertical scale to show the estimate of the threshold along with the estimated 95% confidence interval. Notice that, as the number of trials increases, the confidence interval of the threshold decreases. After 25 trials, the 95% confidence interval of the threshold extends 0.27 log units below the estimate of the threshold (indicated by the solid horizontal line, not by the filled circle, which represents the stimulus presented on that trial) and 0.37 log units above it, for a total confidence interval of 0.64 log units.

EXPERIMENT 1
Comparison of Three Psychophysical Methods

The goal of Experiment 1 was to compare the ML-PEST method with two commonly used methods with respect to their efficiency and their precision. Detection thresholds for butyl alcohol were measured using three methods: the ascending method of limits, as described by Cain et al. (1983); the up–down staircase method (Wetherill & Levitt, 1965); and the ML-PEST maximum-likelihood staircase method, as described above (Harvey, 1986, 1997).

Method

Subjects. Thirty-two (16 males and 16 females) healthy, nonsmoking subjects participated in the experiment. Their mean age was 30.4 years (± 8.1), ranging from 17 to 60 years. None reported having conditions that are normally associated with smell dysfunction.

Stimuli for ascending method of limits. For the ascending method of limits, the procedure described by Cain et al. (1983) was used. Thirteen concentrations of butyl alcohol were prepared, with distilled water as diluent. The strongest concentration was 4%. Adjacent steps differed by a factor of 3.

Stimuli for staircase methods. The Wetherill and Levitt and ML-PEST methods used a series of 24 concentrations of butyl alcohol. The lowest 20 concentrations were a factor of 1.5 apart, with the strongest 0.1%. This series was augmented by the four strongest of the series for the ascending method, which were separated by a factor of 3.

Procedure. Thresholds were determined with a temporal 2AFC procedure. In each trial, a stimulus was paired with a blank, and the subjects were asked to select the bottle containing the stimulus. The subjects were asked to give their best guess, even when they were certain that the two stimuli were identical. The stimulus and the blank were presented in 250-ml squeezable polyethylene bottles containing 60 ml of liquid. The subjects were instructed to keep one nostril closed while the other nostril was tested with three puffs out of each bottle. For each subject, four thresholds were obtained, two for each nostril. In randomized order and alternating nostrils, one threshold was determined with the ascending method, one with the up–down method, and two with the adaptive maximum-likelihood method. Testing took about 45 min.

Figure 4. Stimulus presentation as a function of trial number in the two-alternative forced-choice Monte Carlo simulation. Trials on which the subject's response was correct are represented by the filled circles; trials on which the subject's response was incorrect are represented by the open circles. The resulting maximum-likelihood estimate of α is plotted by the solid line. The horizontal dotted line marks the true value of α.

The definition of detection threshold differs in several aspects among the three methods. In the ascending method, testing started with the lowest concentration. A correct response led to the presentation of the same concentration on the next trial. When an incorrect response was given, the next higher concentration was used. Testing stopped when five correct answers had been given in a row. The threshold was defined as the last-used concentration (Cain et al., 1983) and thus corresponded to a detection probability of 1.0.
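The ascending rule just described is simple enough to state as a short sketch. The function name, the `respond` callback, and the handling of an observer who fails at the strongest concentration (returning no threshold) are assumptions added for the example; the text does not say what was done in that case.

```python
def ascending_method_of_limits(levels, respond, n_correct_to_stop=5):
    """Run the ascending rule over `levels` (weakest first).

    respond(level) returns True for a correct 2AFC response.
    Returns (threshold, number_of_trials); threshold is None if the
    criterion is never met (an assumption, not specified in the text).
    """
    index = 0
    correct_in_row = 0
    n_trials = 0
    while correct_in_row < n_correct_to_stop:
        n_trials += 1
        if respond(levels[index]):
            correct_in_row += 1           # repeat the same concentration
        else:
            correct_in_row = 0
            if index == len(levels) - 1:
                return None, n_trials     # no stronger stimulus available
            index += 1                    # move to the next higher concentration
    return levels[index], n_trials


# Idealised observer: always correct at or above step 6, always wrong below.
levels = list(range(13))                  # 13 concentration steps, weakest = 0
threshold, n_trials = ascending_method_of_limits(levels, lambda level: level >= 6)
print(f"threshold at step {threshold} after {n_trials} trials")
```

A real 2AFC observer guesses correctly half the time below threshold, so runs of lucky guesses can end the procedure early; the idealised observer above ignores that.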

In the up–down staircase method, testing could start at any step in the concentration series. The first subject was started in the middle of the range, and each subsequent subject started where the former subject had ended. After one correct response, the same concentration was presented; if two correct responses were given, the next lower concentration was presented on the next trial. One incorrect response led to the presentation of a concentration one step higher. This decision rule is called the one-up–two-down rule. Although this decision rule is widely used, the actual threshold estimate depends more on the size of the stimulus step than on the particular up–down rule (Garcia-Perez, 1998). Testing stopped after five reversals, and the threshold was computed as the average of the stimulus concentrations at the last four reversals. This threshold corresponded to a detection probability of approximately .75.
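The one-up–two-down rule can be sketched in the same style. The definition of a reversal used here (the level at which the direction of movement changes), the clamping at the ends of the series, and the trial cap are assumptions for the example, as is the idealised observer.

```python
def one_up_two_down(levels, start_index, respond, n_reversals=5, max_trials=500):
    """One-up-two-down staircase over `levels` (weakest first).

    respond(level) returns True for a correct response. Returns the mean of
    the levels at the last four reversals and the list of reversal levels.
    """
    index = start_index
    correct_in_row = 0
    last_direction = 0                    # -1 = down, +1 = up, 0 = none yet
    reversals = []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        if respond(levels[index]):
            correct_in_row += 1
            if correct_in_row < 2:
                continue                  # first correct: repeat the level
            direction = -1                # two correct in a row: step down
        else:
            direction = +1                # one error: step up
        correct_in_row = 0
        if last_direction != 0 and direction != last_direction:
            reversals.append(levels[index])
        last_direction = direction
        index = min(max(index + direction, 0), len(levels) - 1)
    if len(reversals) < 4:
        raise ValueError("too few reversals to estimate a threshold")
    return sum(reversals[-4:]) / 4.0, reversals


# Idealised observer: always correct at or above step 5, always wrong below.
threshold, reversals = one_up_two_down(list(range(10)), 7, lambda level: level >= 5)
print(f"reversals at steps {reversals}, threshold = step {threshold}")
```

With this idealised observer the staircase settles into alternating between the two steps that straddle the transition, and the estimate falls midway between them.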

In the maximum-likelihood adaptive staircase method, testing started in the middle of the stimulus range. After each response, the estimate of the threshold was calculated, and the next stimulus was a concentration close to that threshold. Testing stopped when the confidence interval of the estimated threshold reached a predefined criterion (0.8 log concentration units). This threshold corresponded to a detection probability of exactly .75. All these computations were carried out in real time on a Power Macintosh computer, using a program, SensoryTester, written by author L.O.H.

Data analysis. The mean threshold (and standard error) produced by each method is presented in Table 1, labeled "Native Threshold." As noted above, each method produces a threshold estimate that corresponds to a different performance level of detection. To obtain threshold and confidence interval estimates for the ascending method of limits and the up–down staircase method that could be directly compared with those obtained by the ML-PEST method, a logistic psychometric function was fitted to the actual sequence of stimulus presentations and responses from these two methods. The values of β and γ were held constant at the same values used in the ML-PEST method. The maximum-likelihood threshold and the confidence interval were then computed in exactly the same manner as with the ML-PEST method. This fitting procedure will give a lower threshold for the ascending method of limits data than that reported by the method of limits procedure itself, because the ascending method of limits looks for a threshold stimulus giving 100% correct, whereas the maximum-likelihood threshold corresponds to 75% correct. A repeated measures analysis of variance (ANOVA) was computed using StatView 5.0.1 (SAS Institute, 1998a, 1998b) on each of four dependent variables: native log threshold, number of trials to reach the native threshold, maximum-likelihood logistic threshold, and confidence interval of the maximum-likelihood threshold.

Figure 5. Stimulus presentation as a function of trial number in the two-alternative forced-choice Monte Carlo simulation. Trials on which the subject's response was correct are represented by the filled circles; trials on which the subject's response was incorrect are represented by the open circles. The resulting maximum-likelihood estimate of α is plotted by the solid line. The horizontal dotted line marks the true value of α. The vertical bars show the 95% confidence interval of each estimate of α.

Results

The native threshold values, the mean numbers of trials needed to reach the stopping criterion, the maximum-likelihood logistic thresholds, and the mean confidence intervals of the maximum-likelihood thresholds are summarized in Table 1. A repeated measures ANOVA on the native thresholds showed no significant main effect of method [F(3,31) = 1.61, p < .187]. Subsequent statistical contrast comparisons, using the standard Huynh–Feldt ε to correct the p value and the degrees of freedom, revealed that the native threshold of the ascending method of limits was just significantly higher than the other methods [F(1,31) = 3.92, p = .051]. These higher thresholds are to be expected, given that the ascending method of limits computes the threshold to be the concentration corresponding to a detection probability of 1.0, whereas the other methods use a performance criterion of about .75.

A repeated measures ANOVA on the number of trials needed to reach the stopping criterion showed a significant main effect of method [F(3,31) = 6.26, p < .0008]. Statistical contrast comparisons showed that the ascending method required significantly fewer trials to determine threshold [F(1,31) = 6.41, p < .014] and that the up–down method required significantly more trials [F(1,31) = 6.06, p < .017] than the average of the two measurements with the maximum-likelihood method.

A repeated measures ANOVA on the maximum-likelihood logistic thresholds showed no significant main effect of method, and the thresholds computed from the trial-by-trial data from the ascending method of limits were not significantly different from the thresholds computed from the data collected with the other methods [F(1,31) = 1.77, p = .19]. This result is to be expected, since presumably the performance of each observer is driven by the same sensory process when tested by each of the three psychometric methods.

The mean confidence intervals for the thresholds (and standard errors) are shown in the last column of Table 1. The mean confidence interval of the maximum-likelihood method is smaller than those of the other two. Note that the standard error of the confidence interval reported for the ML-PEST method is very small (0.02). This is because the confidence interval was used as the stopping criterion: Trials were stopped when the confidence interval of α reached 0.8. A repeated measures ANOVA indicated that there was a significant main effect of method [F(3,31) = 17.27, p < .0002]. Contrast tests revealed that the confidence intervals for the up–down method were significantly larger than those for both the maximum-likelihood method [F(1,31) = 50.93, p < .0001] and the ascending method [F(1,31) = 23.72, p < .0016].

Discussion

The native absolute threshold values obtained with the ascending method were higher than those obtained with the other two methods. One should remember that these values are not conventional thresholds, in the sense that they represent the concentration chosen correctly five out of five times. It is to be expected that this value will be a higher concentration than thresholds based on a lower percent correct. Furthermore, this method, at least as described by Cain et al. (1983), has considerably larger steps between adjacent concentrations, and only actual stimulus concentrations can be threshold values.

The ascending method needed the fewest trials (~13) relative to either the maximum-likelihood method (~15) or the up–down staircase method (~17). The clear superiority of the maximum-likelihood method, however, came with the high precision of the measured thresholds, as indicated by the small confidence intervals. The confidence intervals of the up–down staircase were almost three times larger than those given by the maximum-likelihood method.

MONTE CARLO SIMULATIONS

When measuring the threshold of an actual subject, it is not possible to know what the error is between the measured value and the "true" value, because the true value is not known. In order to evaluate the accuracy of the three psychophysical methods, we resorted to Monte Carlo evaluations of simulated observers having a predetermined threshold value. We used each of the three psychophysical methods to measure the threshold of each simulated observer and compared the result with the true value.

Table 1
Mean Performance (and Standard Error) of Three Psychophysical Methods With Butyl Alcohol in Experiment 1

                     Native Threshold   Number of Trials   ML Logistic Threshold   Confidence Interval
Method               M       SE         M       SE         M       SE              M      SE
ML-PEST, left        −3.41   0.16       15.13   0.65       −3.41   0.16            0.80   0.02
ML-PEST, right       −3.27   0.13       15.41   0.76       −3.27   0.13            0.84   0.02
Wetherill–Levitt     −3.30   0.10       17.44   0.72       −3.30   0.11            2.32   0.34
Ascending limits     −3.10   0.13       13.03   0.68       −3.48   0.14            1.14   0.02

Monte Carlo simulations were run for each of the three psychophysical procedures, using the same rules and stimulus series as were used in the threshold measurements for butyl alcohol with the real subjects reported above in Experiment 1. A population of 10,000 simulated subjects was defined whose thresholds followed a Gaussian distribution with a mean threshold of −3.00 log percent concentration and a standard deviation of 0.4 log units. These values are representative of the distribution of real subjects' detection thresholds for butyl alcohol, as found in Experiment 2 below. The predetermined threshold for each of the 10,000 simulated observers was drawn from the above distribution. Threshold values higher than the highest stimulus concentration and lower than the lowest stimulus concentration were not used.
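One way to build such a population is rejection sampling, sketched below. The mean and standard deviation are those given in the text; the bounds `lowest` and `highest` are placeholders (the text gives the lowest stimulus of the ascending series, but the highest concentration is not stated in this section), and redrawing out-of-range values rather than dropping them is an assumption.

```python
import random

def draw_thresholds(n, mean=-3.0, sd=0.4, lowest=-5.13, highest=-1.0, seed=0):
    """Draw n true thresholds (log percent concentration) from a Gaussian,
    redrawing any value outside the assumed stimulus range."""
    rng = random.Random(seed)
    thresholds = []
    while len(thresholds) < n:
        t = rng.gauss(mean, sd)
        if lowest <= t <= highest:  # keep only thresholds inside the range
            thresholds.append(t)
    return thresholds

population = draw_thresholds(10_000)
print(len(population), min(population), max(population))
```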

Simulation 1

On each trial, the simulated subject was presented with a stimulus. Each subject's detection behavior was driven by a logistic psychometric function with a steepness of 3.5 and a chance level performance of 0.5. The probability of making the correct response on a trial was determined by the probability predicted by the logistic function for the stimulus and for that subject, and a uniform random number between 0 and 1. If the random number was less than the logistic function probability for that stimulus, the response was scored as correct; otherwise, it was scored as wrong. The starting stimulus for the up–down staircase and maximum-likelihood methods was chosen randomly between the two stimuli that bracketed the mean threshold of −3.0 log concentration. These stimuli were −3.113 and −2.937 log concentration. The ascending method started with the lowest stimulus in its series, −5.130 log percent concentration.
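The response rule for a simulated observer can be sketched as follows. The steepness (3.5) and chance level (0.5) come from the text; the exact logistic form, with the stimulus expressed in log units, and the function names are assumptions made for illustration.

```python
import math
import random

def p_correct(x, alpha, beta=3.5, gamma=0.5):
    """Assumed logistic psychometric function for an observer with threshold alpha."""
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-beta * (x - alpha)))

def simulate_trial(x, alpha, rng):
    """Score the trial as correct if a uniform random number falls below p_correct."""
    return rng.random() < p_correct(x, alpha)

rng = random.Random(1)
alpha = -3.0  # true threshold of one simulated observer
# At x == alpha the observer should be correct on about 75% of trials.
n = 20_000
rate = sum(simulate_trial(alpha, alpha, rng) for _ in range(n)) / n
print(f"proportion correct at threshold: {rate:.3f}")
```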

An ascending limits run was considered a failure if the procedure called for a stimulus stronger than the highest stimulus concentration. A Wetherill–Levitt staircase run was considered a failure if, during the trials, a stimulus stronger than the highest stimulus concentration or weaker than the lowest stimulus concentration was called for. In the present simulation, the Wetherill–Levitt staircase failed 66 times. All of these failures were caused by the procedure's calling for a lower stimulus concentration than the lowest available. An ML-PEST run would have been considered a failure if the final estimate of the threshold were more than half the stopping confidence interval below the lowest candidate or above the highest candidate α. There were no such failures. The failed runs are excluded from the data analyzed below.

The distribution of errors for each method (estimated threshold minus true threshold) for the 10,000 subjects is shown on the left side of Figure 6. The distribution of the number of trials to reach stopping criterion for the 10,000 subjects is shown on the right side of Figure 6. The maximum-likelihood method was more accurate than the Wetherill–Levitt staircase, and both of these methods were superior to the ascending method of limits. The median error and the number of trials required for each method are shown in Table 2. The median error for the maximum-likelihood method was almost half as large as that for the up–down staircase, although both errors were quite small.

The error on each of the 10,000 subjects is plotted in Figure 7 as a function of the number of trials for that run. The range of errors became smaller as the number of trials increased for both the Wetherill–Levitt staircase method and the ML-PEST method, and the median error stayed close to zero. The ascending staircase method showed quite different behavior. The direction of the error depended on the number of trials before the stopping criterion of five correct in a row had been reached. The estimated thresholds were too low when the number of trials was less than about 15, whereas the estimates of threshold were too high when the number of trials was more than 15.
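The link between run length and the direction of the error is easier to see in a sketch of an ascending rule. The version below is an assumption pieced together from the details given in this section (start at the lowest concentration, move only upward, stop after five correct responses in a row at one concentration, and fail if a concentration above the highest is called for); it is not the exact Cain et al. (1983) protocol, and `respond` is a hypothetical callback.

```python
def ascending_limits(respond, n_levels, criterion=5):
    """Run an ascending series over concentration indices 0..n_levels-1.

    respond(level) -> True if the response on that trial is correct.
    Returns (threshold_level or None on failure, number_of_trials).
    """
    level, streak, trials = 0, 0, 0
    while level < n_levels:
        trials += 1
        if respond(level):
            streak += 1
            if streak == criterion:   # five correct in a row: stop here
                return level, trials
        else:
            level += 1                # an error moves the series up one step
            streak = 0
    return None, trials               # called for a stimulus above the highest

# A lucky guesser stops early at the weakest stimulus (estimate too low).
print(ascending_limits(lambda level: True, n_levels=10))
# An observer who detects only from level 2 upward needs more trials.
print(ascending_limits(lambda level: level >= 2, n_levels=10))
```

Because the series never moves down, a short run can only end at a weak stimulus and a long run can only end at a strong one, which matches the pattern in the lower panel of Figure 7.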

Simulation 2

Because the Wetherill–Levitt staircase method and the ML-PEST method generally required different numbers of trials before the stopping criterion was reached on each run, comparison of the accuracy is not straightforward. We repeated the Monte Carlo simulations so that the staircase method and the ML-PEST method used the same number of trials on each run. First, the Wetherill–Levitt staircase was run until the stopping criterion was reached. Then the ML-PEST method was run with the same number of trials on that observer, using the same starting stimulus.

The results are presented in Table 2. When the ML-PEST method had the same number of trials as the up–down staircase, the ML-PEST method was slightly more accurate. As in Simulation 1, both of these methods were superior in accuracy to the ascending staircase. There were 60 failures out of the 10,000 cases where the up–down staircase called for a stimulus that was lower than the lowest available concentration. Because the ML-PEST method was not permitted to run as many trials as it would normally require, this method failed on 30 out of the 10,000 cases.

Table 2
Results of the Four Monte Carlo Simulations

Method                  Median Error   Median Number of Trials

Simulation 1
  Maximum-likelihood    0.027          16
  Wetherill–Levitt      0.045          17
  Ascending limits      0.331          14

Simulation 2
  Maximum-likelihood    0.094          17
  Wetherill–Levitt      0.101          17
  Ascending limits      0.419          14

Simulation 3
  Maximum-likelihood    0.03           16
  Wetherill–Levitt      3.60           16
  Ascending limits      3.54           16

Simulation 4
  Maximum-likelihood    2.75           115
  Wetherill–Levitt      0.93           21
  Ascending limits      3.36           22


Figure 6. Distribution of errors (left) and number of trials (right) for three psychophysical methods: maximum-likelihood adaptive staircase (upper panel), Wetherill and Levitt up–down staircase (middle panel), and ascending method of limits (lower panel). The vertical line in the left panels marks the zero-error bin of each histogram. See text for details. The abscissa scale for the upper two left panels covers a range of 1.5 log units and is 5 log units for the lower left histogram.

Figure 7. Error as a function of number of trials for three psychophysical methods: ML-PEST (upper panel), up–down staircase (middle panel), and ascending method of limits (lower panel).

Simulation 3

An important ability of sensory testing is to detect people who deliberately try to appear to have diminished sensitivity. We therefore repeated the Monte Carlo simulations of 10,000 subjects, using the same starting conditions and the same test conditions as those in Simulation 1, but each simulated subject generated "untruthful" responses (i.e., responses just the opposite of that called for by the subject's psychometric function). In the ML-PEST computer program, two likelihood functions are maintained: one based on the hypothesis that the subject is responding truthfully and the other based on the hypothesis that the subject is deliberately choosing the wrong response. The program explicitly tests the hypothesis that a subject is deliberately choosing the wrong response by comparing the maxima of the two likelihood distributions and therefore should be able to measure a sensory threshold as effectively as when a subject is telling the truth. The other two methods do not have an explicit way of dealing with this situation. It is of interest to learn how all three methods behave under these circumstances.
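The comparison of the two likelihood functions can be sketched as below. Under the truthful hypothesis the probability of a correct response is taken to be the logistic p(x); under the lying hypothesis it is taken to be 1 − p(x). The logistic form, the candidate grid, the function names, and the trial records are illustrative assumptions rather than the ML-PEST program itself.

```python
import math

def p_correct(x, alpha, beta=3.5, gamma=0.5):
    """Assumed logistic psychometric function (probability correct if truthful)."""
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-beta * (x - alpha)))

def max_log_likelihood(trials, candidates, lying=False):
    """Maximum log likelihood over candidate thresholds under one hypothesis."""
    best_ll, best_alpha = -math.inf, None
    for alpha in candidates:
        ll = 0.0
        for x, correct in trials:
            p = p_correct(x, alpha)
            if lying:                 # a liar is correct only when a truthful
                p = 1.0 - p           # observer would have been wrong
            ll += math.log(p if correct else 1.0 - p)
        if ll > best_ll:
            best_ll, best_alpha = ll, alpha
    return best_ll, best_alpha

def classify(trials, candidates):
    """Pick the hypothesis whose likelihood maximum is larger."""
    ll_truth, a_truth = max_log_likelihood(trials, candidates, lying=False)
    ll_lie, a_lie = max_log_likelihood(trials, candidates, lying=True)
    return ("lying", a_lie) if ll_lie > ll_truth else ("truthful", a_truth)

candidates = [-5.0 + 0.05 * i for i in range(81)]  # assumed grid, -5.0 ... -1.0
# Hypothetical records: (log concentration, response correct?)
honest = [(-2.0, True)] * 6 + [(-4.5, True)] * 3 + [(-4.5, False)] * 3
inverted = [(x, not correct) for x, correct in honest]
print(classify(honest, candidates))
print(classify(inverted, candidates))
```

In this sketch the threshold estimate survives under either hypothesis, because the winning likelihood function still peaks at a definite value of α.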

The results of this simulation are presented in Table 2. The ML-PEST method successfully measured the threshold and explicitly detected the lying condition in all 10,000 cases and was therefore able to measure the threshold of the underlying psychometric function with a small median error. The up–down staircase method and the ascending method of limits reached the highest concentration and tried to go higher, resulting in 9,824 failures for the staircase method and 8,648 failures for the method of limits. If we consider a failure as a successful detection of the lying behavior, the up–down staircase method detected 98% of the cases and the ascending method of limits detected 86% of the cases. These two methods required about the same number of trials (see Table 2).

Simulation 4

It is important in clinical testing that people with elevated thresholds be easily detected. One of the subjects in Experiment 1 first had smell tested by the up–down method and exhibited a normal threshold. Later, the same nostril was tested using the maximum-likelihood method, and no threshold could be measured because, as it turned out, he was anosmic on that side. The fact that the up–down method did not detect the anosmia is of concern.

We therefore repeated the Monte Carlo simulations of 10,000 subjects, using the same starting conditions and the same test conditions as those in Simulation 1, with one change: Each simulated subject generated a random response, as would be the case if a person had no sensory sensitivity. The ideal behavior of a testing method would be to indicate that the threshold was at the highest value that is possible with that method or to have some other explicit way of indicating that the subject is responding randomly.

None of the three methods performed very well in detecting the random responding. The median errors and number of trials are given in Table 2. The up–down staircase method did the poorest, because it gives a median error of 0.93. This rather low error occurs because on 87% of the cases, the stopping criterion of 5 reversals was satisfied before the method reached the upper limit of the stimulus concentrations. The ML-PEST method failed on 11% of the cases, but overall, the thresholds were estimated to be very high. Unfortunately, it required a great number of trials to achieve this performance. The ascending method of limits reached the upper limit of stimulus concentration on 66% of the cases and also gave the highest values for the threshold.

Since in Monte Carlo simulations the distribution of the actual thresholds is known, we can apply an SDT analysis (Green & Swets, 1966/1974; Macmillan & Creelman, 1991) to compare the distribution of estimated thresholds made by each method under random responding with the true distribution. Using the means and standard deviations of the above distributions, we computed the signal detection measures of sensitivity, d_a, and accuracy, the area under the ROC curve, A_z (Harvey, 1992; Simpson & Fitter, 1973). The ML-PEST method had the highest sensitivity and accuracy (d_a = 2.27, A_z = .95). The ascending method was next best (d_a = 1.99, A_z = .92), and the up–down staircase method was the least sensitive (d_a = 1.50, A_z = .86).
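For reference, the two measures can be computed from the means and standard deviations of two Gaussian distributions with the standard unequal-variance definitions, d_a = (μ₁ − μ₂) / sqrt((σ₁² + σ₂²)/2) and A_z = Φ(d_a/√2). The sketch below uses these textbook formulas; the function names are illustrative, and the means and standard deviations in the first call are made-up values.

```python
import math

def d_a(mean_1, sd_1, mean_2, sd_2):
    """Sensitivity index for two Gaussian distributions with unequal variances."""
    return (mean_1 - mean_2) / math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2.0)

def a_z(da):
    """Area under the binormal ROC curve: Phi(d_a / sqrt(2))."""
    z = da / math.sqrt(2.0)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Made-up example: estimates under random responding vs. true thresholds.
print(d_a(mean_1=-1.0, sd_1=1.0, mean_2=-3.0, sd_2=0.4))

# The d_a values reported in the text map onto the reported A_z values.
for reported in (2.27, 1.99, 1.50):
    print(reported, round(a_z(reported), 2))
```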

Discussion

The results of the simulations lead to the conclusion that both the up–down staircase method and the maximum-likelihood method have considerable strengths for use in measuring taste and smell thresholds. The ascending method of limits should not be used if one is interested in obtaining an accurate estimate of the threshold. These threshold estimates covary with the number of trials taken to stop, as is illustrated in the lower panel of Figure 7. The simulation results offer an explanation for the published finding that ascending-limits measurements of olfactory thresholds are less reliable than staircase-measured thresholds (Doty, McKeown, Lee, & Shaman, 1995).

The up–down staircase method gives an average error twice as large as that of the ML-PEST method, although both values are small and therefore similar to each other. The up–down staircase method failed by calling for a stimulus too low on 66 out of 10,000 cases, whereas the ML-PEST method had no such failures. Although the median number of trials gives the edge to the ML-PEST method, on some occasions the ML-PEST method required over 50 trials to reach its stopping criterion.

A strength of the ML-PEST method is its ability to consider alternate hypotheses about the response strategy of the observer. The ML-PEST method was able to correctly detect all 10,000 dishonest responders. We are currently preparing a paper that will go into considerable detail on how to distinguish malingerers from true anosmics using a variety of methods. Another strength is that there is virtually no bias introduced into threshold measurements by having the ML-PEST steepness different from the "true" value set in the Monte Carlo observer (Treutwein & Strasburger, 1999). Data generated from a Monte Carlo Weibull function observer can be well fit by virtually any S-shaped psychometric function, although the Weibull will usually fit slightly better. We therefore recommend the ML-PEST method for testing taste and smell.

EXPERIMENT 2
Test–Retest Reliability

In order to assess the stability of the measures obtained with the adaptive maximum-likelihood procedure, we tested the subjects on four occasions.

Method

Subjects. Twenty-seven nonsmoking healthy subjects participated in the experiment (mean age = 31.6 ± 8.7 years; range = 22–57 years). The 14 women and 13 men were divided into two groups: Group 1 ("consecutive"; 7 females and 7 males) was tested on 4 consecutive days, and Group 2 ("distributed"; 7 females and 6 males) was tested on Days 1, 2, 8, and 22.

Stimuli. Smell thresholds for butyl alcohol were assessed using the same range of concentrations as were used with the maximum-likelihood method in Experiment 1. Taste thresholds were measured using 20 concentrations of NaCl in quarter-log steps. The highest concentration was 1.7 M. The same maximum-likelihood adaptive method as described in Experiment 1 was used in this experiment.

Procedure. Thresholds were determined with a temporal 2AFC procedure. Taste sensitivity was measured with a whole-mouth sip-and-spit procedure. The stimuli and the blanks were presented in 30-ml plastic medicine cups containing 5 ml of liquid. The subjects were requested to take the first sample in their mouths, swish it around, and spit it out, then rinse with distilled water and do the same with the second sample. They were asked to choose the sample that had a taste different from water. In each of the four sessions, taste thresholds were determined first and smell thresholds last. All four smell thresholds were determined in the preferred nostril. The subjects were tested at the same time of each day.

Results

Because the measurement and specification of test–retest reliability is a rather complex topic (Linn, 1989), we have chosen four different ways to demonstrate the reliability of the maximum-likelihood method. For the first method, the four threshold measurements for each subject are plotted in Figure 8. The thick line in each panel is the mean threshold. It is clear from examination of Figure 8 that most individual thresholds were stable over 4 days (left column of panels) and over 22 days (right column of panels), with a few subjects showing shifts of more than 1 log unit. The mean thresholds for the two groups are shown in Table 3. A repeated measures ANOVA confirmed the stability of the thresholds: The effect of testing day was not significant.

A second approach is to compute the correlations among the four threshold measurements (see, e.g., Mattes, 1988). Pearson correlation coefficients between thresholds obtained on different days are shown in Table 4. The correlations were generally higher for pairs of thresholds that were measured close together in time. As Mattes pointed out, correlations are highly influenced by small changes in ordinal ranking of the subjects. The fact that the pairwise correlations became smaller as time separation between them increased is shown in Figure 9. The solid line is the least squares regression line. It indicates that the correlation between two thresholds diminished as the time between them increased. The slope of the regression line becomes even steeper if the data for 1-day separation are removed from the regression, indicating that the decrease in correlation started with the 2-day separation condition.

Figure 8. Log threshold concentration as a function of test day for NaCl (upper row) and butanol (lower row), measured on 4 successive days (left column) and over 22 days (right column). The thin lines connect the four thresholds of individual subjects. The thick line is the mean threshold concentration of all subjects.

The third method to assess test–retest reliability is based on the differences between successive threshold measurements. Each threshold was subtracted from the previous threshold. The four threshold measurements resulted in three difference measurements. The distributions of these differences for NaCl and for butyl alcohol are plotted on the left side of Figure 10. If the measured thresholds do not systematically increase or decrease over time, we expect that the mean difference would be zero. The mean of the distribution for NaCl (upper panel of Figure 10) was 0.018 and for butyl alcohol (lower panel) was 0.002. Neither of these was significantly different from zero. The standard deviations of these distributions were 0.413 log units for NaCl and 0.422 log units for butyl alcohol.

A fourth way to examine test–retest reliability is by means of the spread of the four threshold measures for each subject. A standard deviation of 0.0 would mean that all four thresholds were exactly the same value. The standard deviation of the four measurements was computed for each subject. The distributions of these standard deviations for NaCl (upper panel) and butyl alcohol (lower panel) are plotted on the right side of Figure 10. The median standard deviation was about 0.15 log units for both NaCl and butyl alcohol. This good test–retest reliability is consistent with the other three methods of assessing reliability reported above.
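The third and fourth reliability measures are simple to compute from a table of repeated thresholds, as the sketch below shows. The subject labels and threshold values are invented for illustration, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not say which form was used.

```python
import statistics

# Hypothetical log thresholds on four test days for three subjects.
thresholds = {
    "S1": [-2.60, -2.70, -2.50, -2.60],
    "S2": [-3.10, -2.90, -3.00, -3.20],
    "S3": [-2.80, -2.80, -2.80, -2.80],
}

def successive_differences(series):
    """Previous threshold minus the following one: four values give three differences."""
    return [series[i] - series[i + 1] for i in range(len(series) - 1)]

# Third method: pool all successive differences; a mean near zero means no drift.
differences = [d for s in thresholds.values() for d in successive_differences(s)]
print("mean difference:", statistics.mean(differences))
print("SD of differences:", statistics.stdev(differences))

# Fourth method: spread of the four measurements within each subject.
subject_sd = {name: statistics.stdev(s) for name, s in thresholds.items()}
print("median within-subject SD:", statistics.median(subject_sd.values()))
```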

Discussion

The maximum-likelihood method gives estimates of thresholds that are stable within individuals over time and that are responsive to individual differences. By all four indicators, the test–retest reliability of both NaCl and butyl alcohol thresholds measured by the maximum-likelihood method was very good.

GENERAL DISCUSSION

The two experiments and the Monte Carlo simulations reported here demonstrate that the maximum-likelihood method is a viable alternative to other methods currently in use to measure taste and smell detection thresholds. Because the method attempts to use stimuli that are optimal (close to the predicted threshold), the confidence intervals of the measured thresholds are considerably smaller than those achieved by the other two methods (Experiment 1).

One of the strengths of the maximum-likelihood method is that it provides an explicit way of estimating the confidence interval of the threshold, a facility not provided by the other two methods. The experimenter is free to establish a stopping confidence interval to suit his or her needs. If greater precision is desired, it can be achieved at the expense of running more trials. This explicit tradeoff between precision and number of trials is not possible with the other two psychophysical methods. One could, of course, use a hybrid combination of methods: an up–down staircase method for generating the next stimulus to use, combined with a maximum-likelihood method for estimating the threshold and its confidence interval. In effect, we used this approach after the data from Experiment 1 were collected, by reanalyzing them with the maximum-likelihood technique in order to assess the confidence interval of the resulting threshold.

Table 3
Mean NaCl and Butyl Alcohol Thresholds (and Standard Errors) From Experiment 2

                          Day 1          Day 2          Day 3          Day 4
                          M      SE      M      SE      M      SE      M      SE
NaCl
  Consecutive (Group 1)   −2.62  0.13    −2.67  0.11    −2.82  0.13    −2.63  0.11
  Distributed (Group 2)   −2.90  0.19    −2.83  0.09    −2.83  0.11    −2.77  0.10
Butyl alcohol
  Consecutive (Group 1)   −3.19  0.14    −3.17  0.15    −3.25  0.12    −3.10  0.13
  Distributed (Group 2)   −3.00  0.16    −3.10  0.14    −3.09  0.10    −3.09  0.11

Table 4
Pearson Correlation Coefficients Between Thresholds

                          Paired Sessions
                          1–2    1–3    1–4    2–3    2–4    3–4
NaCl
  Consecutive (Group 1)   .46    .66    .66    .44    .75    .66
  Distributed (Group 2)   .58    .72    .40    .55    .20    .51
Butyl alcohol
  Consecutive (Group 1)   .86    .83    .86    .67    .71    .76
  Distributed (Group 2)   .26    .68    .56    .51    .21    .65

p < .05. p < .01. p < .001.

The Monte Carlo simulations allow the analysis of the ideal behavior of the three psychophysical methods. One major conclusion is that the ascending staircase method gives threshold measurements that are severely biased. This bias is a function of the number of trials carried out before the stopping criterion of 5 correct in a row with one stimulus is reached. When the number of trials was small (less than approximately 15), the threshold estimate was too low; when the number of trials was greater than 15, the threshold estimate was too high (see Figure 7). This characteristic is due to the fact that the stimulus concentration used can only increase as trials progress. The maximum-likelihood method has the least amount of bias, although it is always slightly negative (i.e., the threshold estimate is, on the average, slightly lower, by 0.02 log units, than the true value).

In clinical testing, especially for cases that involve the legal system, it is important to be able to detect malingerers. One of the strengths of the maximum-likelihood method is that it explicitly tests hypotheses. Normally, the hypothesis is that the subject is responding truthfully and that the threshold has a certain value. In the present implementation of the maximum-likelihood method, the computer routines maintain three hypotheses about the response strategy of the subject: that the subject is responding truthfully, that the subject is lying on each trial (i.e., always choosing what he or she believes is the wrong alternative in the 2AFC procedure), and that the subject is responding randomly (i.e., not making use of any sensory information). The ability to detect a malingerer (one who has sensory sensitivity but tries to respond randomly) is based on the fact that it is difficult for humans to generate truly random sequences (Bar-Hillel & Wagenaar, 1993). A subject who can taste or smell the stimulus, and therefore knows in which interval it occurs, tends to respond wrongly more often than he or she would by chance alone. In this case, the hypothesis that the subject is lying will have a higher likelihood than the hypothesis that the subject is telling the truth. We are currently working to refine this ability of the maximum-likelihood method, because it holds great promise. The other two psychophysical procedures do not easily distinguish between a malingerer and a person with no sensory sensitivity and have no explicit way to detect malingerers.

Figure 9. Test–retest reliability as indicated by correlation coefficients between all pairs of measurements, plotted as a function of the number of days separating them. The solid line is the least squares linear regression through the data.

Finally, in Experiment 2, we demonstrated that the maximum-likelihood method gives estimates of thresholds that are stable over time. This stability is actually based on two aspects of our testing procedure. The maximum-likelihood method gives the smallest confidence intervals of the three methods tested and thus contributes the least amount of random or systematic error variance to the measured thresholds. The second source of reliability is the experimental paradigm. The 2AFC procedure minimizes the effects of response bias (e.g., Green & Swets, 1966/1974) and thus minimizes error variance in the threshold measurements that would be introduced by variance in response bias. Actually, it would be desirable to use more than just two forced-choice alternatives, because the ML-PEST method converges more rapidly on the threshold value with more alternatives (Manny & Klein, 1985; Schlauch & Rose, 1990). The time required to present chemosensory stimuli, however, usually precludes using more than two alternatives.

Test–retest reliability is not achieved at the expense of precision. Our measured thresholds show an approximately normal distribution with a standard deviation of about 0.4 log units. The maximum-likelihood method is able to estimate thresholds that lie on a continuum, even though a small number of discrete stimulus concentrations are available for testing. Although a computer is required to run ML-PEST software, the widespread availability of inexpensive desktop or laptop computers makes this requirement much less of a barrier than in the past.

Figure 10. Distribution of differences between successive thresholds (left) and of the standard deviation of the four threshold measurements for the 27 subjects (right) for NaCl (upper panel) and butyl alcohol (lower panel).

REFERENCES

An l iker J A Ba r t oshuk L Fer r is A N amp Hooks L D (1991)Childrenrsquos food preferences and genetic sensitivity to the bitter tastesof 6-n-propythiouracil (PROP) American Journal of Clinical Nutri-tion 54 316-320

Apt er A J Gent J F amp Fr an k M E (1999) Fluctuating olfactorysensitivity and distorted odor perception in allergic rhinitis Archivesof Otolaryngology Head amp Neck Surgery 125 1005-1010

Ba r -Hil l el M amp Wagena ar W A (1993) The perception of ran-domness In C L Gideon Keren (Ed) A handbook for data analysisin the behavioral sciences Methodological issues (pp 369-393) Hillsdale NJ Erlbaum

Ber kson J (1951) Why I prefer logits to probits Biometrics 7 327-339Ber kson J (1953) A statistically precise and relatively simple method

of estimating the bio-assay with quantal response based on the lo-gistic function Journal of the American Statistical Association 48565-600

Ber kson J (1955) Maximum-likelihood and minimum chi-square es-timates of the logistic function Journal of the American StatisticalAssociation 50 130-162

Ca in W S Gent J F Cat a l an ot t o F A amp Goodspeed R B(1983) Clinical evaluation of olfaction American Journal of Oto-laryngology 4 252-256

Comet t o-Mu ntildeiz J E amp Cain W S (1995) Relative sensitivity ofthe ocular trigeminal nasal trigeminal and olfactory systems to air-borne chemicals Chemical Senses 20 191-198

Cor n sweet T N (1962) The staircase method in psychophysics American Journal of Psychology 75 485-491

Cowar t B J (1989) Relationships between taste and smell across theadult life span In C Murphy W S Cain amp D M Hegsted (Eds)Nutrition and the chemical senses in aging Recent advances and cur-rent research needs (Annals of the New York Academy of SciencesVol 561 pp 39-55) New York New York Academy of Sciences

Cowa r t B J Yokomu ka i Y amp Beau ch a mp G K (1994) Bittertaste in aging Compound-specific decline in sensitivity Physiologyamp Behavior 56 1237-1241

Deems D A Dot y R L Set t l e R G Moor e-Gil l on VSha ma n P Mest er A F Kimmel ma n C P Br igh t man V J ampSnow J B Jr (1991) Smell and taste disorders a study of 750 pa-tients from the University of Pennsylvania Smell and Taste CenterArchives of Otolaryngology Head amp Neck Surgery 117 519-528

Dot y R L McKeown D A Lee W W amp Sha ma n P (1995) Astudy of the testndashretest reliability of ten olfactory tests ChemicalSenses 20 645-656

Dot y R L Sn yder P J Huggins G R amp Lowr y L D (1981) En-docrine cardiovascular and psychological correlates of olfactorysensitivity changes during the human menstrual cycle Journal ofComparative amp Physiological Psychology 95 45-60

Dr ewn owski A Hender son S A amp Shor e A B (1997) Geneticsensitivity to 6-n-propythiouracil (PROP) and hedonic responses tobitter and sweet tastes Chemical Senses 22 27-37

Du ffy V B Cain W S amp Fer r is A M (1999) Measurement ofsensitivity to olfactory flavor Application in a study of aging anddentures Chemical Senses 24 671-677

Fechn er G T (1860) Elemente der Psychophysik Leipzig Breitkopfund Haumlrtel

Gagnon P Mer gl er D amp Lapa r e S (1994) Olfactory adaptationthreshold shift and recovery at low levels of exposure to methylisobutyl ketone (MIBK) Neurotoxicology 15 637-642

Ga r cia -Per ez M A (1998) Forced-choice staircases with fixed stepsizes Asymptotic and small-sample properties Vision Research 381861-1881

Gr een D M amp Swet s J A (1974) Signal detection theory andpsychophysics Huntington NY Robert E Krieger (Original workpublished 1966)

Gr ushka M Sessl e B J amp Howl ey T P (1986) Psychophysica levidence of taste dysfunction in burning mouth syndrome ChemicalSenses 11 485-498

Guilford, J. P. (1936). Psychometric methods. New York: McGraw-Hill.

Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.

Hall, J. L. (1968). Maximum-likelihood sequential procedures for estimation of psychometric functions. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1981). Hybrid adaptive procedure for estimation of psychometric functions. Journal of the Acoustical Society of America, 69, 1763-1769.

Harvey, L. O., Jr. (1986). Efficient estimation of sensory thresholds. Behavior Research Methods, Instruments, & Computers, 18, 623-632.

Harvey, L. O., Jr. (1992). The critical operating characteristic and the evaluation of expert judgment. Organizational Behavior & Human Decision Processes, 53, 229-251.

Harvey, L. O., Jr. (1997). Efficient estimation of sensory thresholds with ML-PEST. Spatial Vision, 11, 121-128.

Hays, W. L. (1963). Statistics for psychologists (1st ed.). New York: Holt, Rinehart & Winston.

Krantz, D. H. (1969). Threshold theories of signal detection. Psychological Review, 76, 308-324.

Lehrner, J. P., Brücke, T., Dal-Bianco, P., Gatterer, G., & Kryspin-Exner, I. (1997). Olfactory functions in Parkinson's disease and Alzheimer's disease. Chemical Senses, 22, 105-110.

Lehrner, J. P., Kryspin-Exner, I., & Vetter, N. (1995). Higher olfactory threshold and decreased odor identification ability in HIV-infected persons. Chemical Senses, 20, 325-328.

Linn, R. L. (Ed.) (1989). Educational measurement (3rd ed.). New York: Macmillan.

Linschoten, M. R. I., & Kroeze, J. H. A. (1991). Spatial summation in taste: NaCl thresholds and stimulated area on the anterior human tongue. Chemical Senses, 16, 219-224.

Linschoten, M. R. I., & Kroeze, J. H. A. (1992). Bilateral taste stimulation: Spatial summation with weak NaCl stimuli. Chemical Senses, 17, 53-59.

Linschoten, M. R. I., & Kroeze, J. H. A. (1994). Ipsi- and bilateral interactions in taste. Perception & Psychophysics, 55, 387-393.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure vernier acuity of older infants. Vision Research, 25, 1245-1252.

Mattes, R. D. (1988). Reliability of psychophysical measures of gustatory function. Perception & Psychophysics, 43, 107-114.

Moll, B., Klimek, L., Eggers, G., & Mann, W. (1998). Comparison of olfactory function in patients with seasonal and perennial allergic rhinitis. Allergy, 53, 297-301.

Murphy, C., & Cain, W. S. (1986). Odor identification: The blind are better. Physiology & Behavior, 37, 177-180.

Murphy, C., Nordin, S., de Wijk, R. A., Cain, W. S., & Polich, J. (1994). Olfactory-evoked potentials: Assessment of young and elderly, and comparison to psychophysical threshold. Chemical Senses, 19, 47-56.

Nordin, S., Murphy, C., Davidson, T. M., Quinonez, C., Jalowayski, A. A., & Ellison, D. W. (1996). Prevalence and assessment of qualitative olfactory dysfunction in different age groups. Laryngoscope, 106, 739-744.

O'Mahony, M., Gardner, L., Long, D., Heintz, C., Thompson, B., & Davies, M. (1979). Salt taste detection: An R-index approach to signal-detection measurements. Perception, 8, 497-506.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Pierce, J. D., Jr., Doty, R. L., & Amoore, J. E. (1996). Analysis of position of trial sequence and type of diluent on the detection threshold for phenyl ethyl alcohol using a single staircase method. Perceptual & Motor Skills, 82, 451-458.

Rosenblatt, M. R., Olmstead, R. E., Iwamoto-Schaapp, P. N., & Jarvik, M. E. (1998). Olfactory thresholds for nicotine and menthol in smokers (abstinent and nonabstinent) and nonsmokers. Physiology & Behavior, 65, 575-579.


SAS Institute (1998a). StatView: StatView reference. Cary, NC: SAS Institute.

SAS Institute (1998b). StatView: Using StatView. Cary, NC: SAS Institute.

Schlauch, R. S., & Rose, R. M. (1990). Two-, three-, and four-interval forced-choice staircase procedures: Estimator bias and efficiency. Journal of the Acoustical Society of America, 88, 732-740.

Simpson, A. J., & Fitter, M. J. (1973). What is the best index of detectability? Psychological Bulletin, 80, 481-488.

Stevens, J. C. (1995). Detection of heteroquality taste mixtures. Perception & Psychophysics, 57, 18-26.

Swets, J. A. (1961). Is there a sensory threshold? Science, 134, 168-177.

Swets, J. A. (1986a). Form of empirical ROC's in discrimination and diagnostic tasks: Implications for theory and measurement of performance. Psychological Bulletin, 99, 181-198.

Swets, J. A. (1986b). Indices of discrimination or diagnostic accuracy: Their ROC's and implied models. Psychological Bulletin, 99, 100-117.

Swets, J. A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Mahwah, NJ: Erlbaum.

Swets, J. A., Tanner, W. P., Jr., & Birdsall, T. G. (1961). Decision processes in perception. Psychological Review, 68, 301-340.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Urban, F. M. (1908). The application of statistical methods to problems of psychophysics. Philadelphia: Psychological Clinic Press.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Weiffenbach, J. M., Baum, B. J., & Burghauser, R. (1982). Taste thresholds: Quality specific variation with human aging. Journal of Gerontology, 37, 372-377.

Weiffenbach, J. M., Schwartz, L. K., Atkinson, J. C., & Fox, P. C. (1995). Taste performance in Sjögren's syndrome. Physiology & Behavior, 57, 89-96.

Wetherill, G. B., & Levitt, H. (1965). Sequential estimation of points on a psychometric function. British Journal of Mathematical & Statistical Psychology, 18, 1-10.

(Manuscript received November 11, 1999; revision accepted for publication March 4, 2001.)


the psychometric function. Typically, that performance point is halfway between chance and 100% correct performance. The 2AFC function has a halfway point of 75%. In the data shown in Figure 1, no stimulus gave a performance of exactly 75% correct, and this value must therefore be estimated by some suitable means. A logistic psychometric function (Berkson, 1951, 1953, 1955), shown as the smooth curve, was fit to the data using a maximum-likelihood technique (Harvey, 1986, 1997; Treutwein & Strasburger, 1999). The logistic function is given by

$$P(x) = \gamma + (1 - \gamma) \times \frac{1}{1 + \left(\frac{x}{\alpha}\right)^{-\beta}} \tag{1}$$

The function is specified by three parameters: α (the stimulus at the halfway point), β (the steepness of the function), and γ (the probability of being correct by chance). The stimulus intensity corresponding to α is marked by the vertical line in Figure 1 and represents the stimulus that would achieve 75% correct in the 2AFC task (marked by the horizontal line). Even though the sensory process, as formulated in SDT, has no sensory threshold, the value of α is still widely referred to as the threshold. In this paper, the term threshold will be used to mean the stimulus value, α, that gives performance halfway from chance to 100%.

With the above framework in mind, it can be seen that the goal of all psychophysical methods for measuring thresholds and sensitivity, from the classical methods of Fechner (Guilford, 1936, 1954) to modern adaptive staircase methods (Cornsweet, 1962; Harvey, 1986, 1997; Treutwein & Strasburger, 1999; Watson & Pelli, 1983; Wetherill & Levitt, 1965), is to estimate the value of α. In vision and audition, it is practical to measure sensitivity with large numbers of trials using computer-generated stimuli. In the chemical senses, however, the physical presentation of the stimulus is not easily accomplished without human intervention. Furthermore, the longer recovery time in the chemical senses prevents rapid successive presentation of stimuli. These facts limit the total number of psychophysical trials that can be presented in a testing session before fatigue and/or boredom set in. We are therefore faced with the question, "How can we best estimate the threshold with a limited number of experimental trials?" A reasonable solution to this problem would be especially valuable for clinical testing of taste and smell function and for experiments in which sensitivity is repeatedly measured over a period of days or weeks.

A review of the taste and smell literature reveals that the two most widely used methods to measure detection sensitivity are the ascending method of limits for smell (Apter, Gent, & Frank, 1999; Cain, Gent, Catalanotto, & Goodspeed, 1983; Cometto-Muñiz & Cain, 1995; Deems et al., 1991; Duffy, Cain, & Ferris, 1999; Gagnon, Mergler, & Lapare, 1994; Lehrner, Brücke, Dal-Bianco, Gatterer, & Kryspin-Exner, 1997; Lehrner, Kryspin-Exner, & Vetter, 1995; Moll, Klimek, Eggers, & Mann, 1998; Murphy, Nordin, de Wijk, Cain, & Polich, 1994; Nordin et al., 1996; Pierce, Doty, & Amoore, 1996; Rosenblatt, Olmstead, Iwamoto-Schaapp, & Jarvik, 1998) and the one-up–two-down variant of the Wetherill and Levitt staircase method (Wetherill & Levitt, 1965) for taste (Anliker, Bartoshuk, Ferris, & Hooks, 1991; Cowart, 1989; Cowart, Yokomukai, & Beauchamp, 1994; Drewnowski, Henderson, & Shore, 1997; Grushka, Sessle, & Howley, 1986; Murphy & Cain, 1986; Murphy et al., 1994; Stevens, 1995; Weiffenbach, Baum, & Burghauser, 1982; Weiffenbach, Schwartz, Atkinson, & Fox, 1995). The method of constant stimuli could be used to measure a complete psychometric function like the one shown in Figure 1, using a fixed number of stimuli presented numerous times in random order. Even though the shape of the function can be highly informative, this method is only rarely used in taste and smell research, most likely because of the large number of experimental trials required for stable estimates (Linschoten & Kroeze, 1991, 1992, 1994). A few experimenters have used an SDT paradigm (yes–no or rating scale judgments) to measure both sensory sensitivity and response bias (Doty, Snyder, Huggins, & Lowry, 1981; O'Mahony et al., 1979).

Figure 1. Typical S-shaped psychometric function for stimulus detection in a two-alternative forced-choice paradigm. The data were generated by a Monte Carlo logistic observer having an α of 10 stimulus intensity units, a β of 3.5, and a γ of 0.5. Each of the 20 stimulus intensities was presented 1,000 times. The solid line is the best-fitting logistic function, fit to the data using a maximum-likelihood curve-fitting technique (Harvey, 1986, 1997).
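As a concrete illustration of Equation 1, the following minimal Python sketch evaluates the logistic psychometric function for the simulated observer of Figure 1 (α = 10, β = 3.5, γ = 0.5). The function name logistic_2afc is hypothetical and is not taken from the ML-PEST software; the stimulus values in the loop are arbitrary examples.

```python
def logistic_2afc(x, alpha, beta=3.5, gamma=0.5):
    """Equation 1: probability of a correct response at stimulus intensity x.

    x and alpha are positive intensities in linear (not log) units.
    """
    return gamma + (1.0 - gamma) / (1.0 + (x / alpha) ** (-beta))


# Observer of Figure 1: alpha = 10 intensity units, beta = 3.5, gamma = 0.5.
for x in (1.0, 3.0, 10.0, 30.0, 100.0):
    print(f"x = {x:5.1f}   P(correct) = {logistic_2afc(x, 10.0):.3f}")
```

At x = α the function returns 0.75, the halfway point between chance (0.5) and perfect performance in the 2AFC task.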

In audition and vision, adaptive psychophysical methods are widely employed, and the theoretical bases for them are well understood. To minimize the number of trials, the most efficient testing strategy is to compute an estimate of the threshold after each psychophysical trial and use a stimulus close to the threshold on the next trial (Taylor & Creelman, 1967). The estimation of the threshold is done by fitting a psychometric function to the data collected up to that point in the experiment and estimating the value of the threshold from the best-fitting function. This method is formally called a maximum-likelihood adaptive staircase method and is known by several different names: Best PEST (Pentland, 1980), QUEST (Watson & Pelli, 1983), and ML-PEST (Harvey, 1986, 1997). We will refer to it as the ML-PEST method (maximum-likelihood parameter estimation by sequential testing). The term PEST was originally coined by Taylor and Creelman (1967).

That good estimates of the threshold can be obtained from sparse data is shown in Figure 2, where psychometric functions for the same 20 stimuli as in Figure 1, but with 1,000, 100, 10, and 1 presentations per stimulus, are shown. As the number of trials per stimulus intensity decreases, the empirical psychometric function becomes more and more irregular. The solid curve in each figure is the best-fitting logistic function (the best value of α was found holding β and γ constant), obtained with a maximum-likelihood fitting technique (Harvey, 1986, 1997). Even in the extreme case of 1 presentation per stimulus, the best-fitting logistic function gives a good estimate of the threshold.

Figure 2. Typical S-shaped psychometric function for stimulus detection in a two-alternative forced-choice paradigm. The data were generated by a Monte Carlo logistic observer having an α of 10 stimulus intensity units, a β of 3.5, and a γ of 0.5. Each of the 20 stimulus intensities was presented 1,000, 100, 10, and 1 times. The solid line is the best-fitting logistic function, fit to the data using a maximum-likelihood curve-fitting technique (Harvey, 1986, 1997).
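The sparse-data demonstration of Figure 2 can be approximated with a short simulation. The sketch below is illustrative only: it works in log10 intensity units, assumes a particular placement of the 20 stimuli (0.16-log-unit steps centered near log α = 1), and finds the best α by a simple grid search with β and γ held constant. The function names, the seed, and the candidate grid are assumptions, not the authors' fitting routine.

```python
import math
import random

BETA, GAMMA = 3.5, 0.5


def p_correct(log_x, log_alpha):
    """Equation 1 in log10 units: (x/alpha)**(-beta) == 10**(-beta*(log_x - log_alpha))."""
    p = GAMMA + (1.0 - GAMMA) / (1.0 + 10.0 ** (-BETA * (log_x - log_alpha)))
    return min(p, 1.0 - 1e-12)  # keep log(1 - p) finite for very strong stimuli


def fit_log_alpha(data, candidates):
    """Return the candidate log alpha with the highest log likelihood.

    data holds (log_x, n_correct, n_presented) for each stimulus.
    """
    def log_likelihood(log_alpha):
        total = 0.0
        for log_x, n_correct, n_presented in data:
            p = p_correct(log_x, log_alpha)
            total += n_correct * math.log(p) + (n_presented - n_correct) * math.log(1.0 - p)
        return total
    return max(candidates, key=log_likelihood)


def simulate(log_stimuli, true_log_alpha, n_presented, rng):
    """Monte Carlo logistic observer: count correct responses at each stimulus."""
    return [(lx,
             sum(rng.random() < p_correct(lx, true_log_alpha) for _ in range(n_presented)),
             n_presented)
            for lx in log_stimuli]


rng = random.Random(2001)                             # arbitrary seed
log_stimuli = [-0.52 + 0.16 * i for i in range(20)]   # assumed placement of the 20 stimuli
candidates = [-1.0 + 0.01 * i for i in range(401)]    # candidate log alphas, -1.00 to 3.00
for n in (1000, 100, 10, 1):
    estimate = fit_log_alpha(simulate(log_stimuli, 1.0, n, rng), candidates)
    print(f"{n:5d} presentations per stimulus: estimated log alpha = {estimate:.2f}")
```

With many presentations the estimate should fall very close to the true log α of 1.0; with only one presentation per stimulus it varies from run to run, and if the simulated observer happens to make no errors the estimate sits at the lowest candidate.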

Our goal in the present paper is to introduce the use of the maximum-likelihood adaptive staircase procedure for measuring sensitivity in the chemical senses. We will compare the efficiency and the accuracy of this method with those of other methods and present test–retest data as an assessment of reliability.

MAXIMUM-LIKELIHOOD ADAPTIVE STAIRCASE PSYCHOPHYSICAL METHOD

The principles of this method are described in a variety of sources (Hall, 1968, 1981; Harvey, 1986, 1997; Hays, 1963; Watson & Pelli, 1983), which provide more detail than is presented here. For this discussion, it is assumed that the psychometric function is logistic and that data are collected with a 2AFC experimental paradigm, although other functions and paradigms could certainly be used. During psychophysical testing, there are three major questions that are answered by this method after each trial: (1) What is the current estimate of the threshold? (2) What stimulus should be presented to the subject on the next trial? (3) Should the psychophysical trials be stopped? Although the answers are somewhat interrelated, it is useful to consider them separately.

Estimation of the Threshold

During psychophysical testing, the raw data collected on each trial are the stimulus concentration presented to the subject and whether or not the subject made a correct response. In order to estimate the threshold, a series of possible psychometric functions is considered. A likelihood function is constructed by computing the log likelihood for each of these possible psychometric functions, each one having the same value of β (steepness) and γ (the probability of being correct by chance) and each having a different value of α. The computational procedure is described by Harvey (1986, 1997). The smallest candidate α is chosen to lie below any possible real threshold, or at least lower than the lowest physical stimulus. Successive values increase in 0.01-log-unit steps up to an appropriate maximum. In testing the detection of NaCl, for example, we consider 451 αs, ranging from −4.50 log molar concentration (log M) to 0.00 log molar concentration in 0.01-log steps. Each candidate logistic function has a fixed β (3.5) and a fixed γ (0.5) for the 2AFC psychophysical paradigm. The β value is not critical for converging on the correct value of α. The value of 3.5 is close to actual psychometric functions for NaCl measured with the method of constant stimuli. The likelihood functions after having run 0, 5, 10, 15, 20, and 25 trials on a Monte Carlo observer with a true value of α of −2.125 log M (0.0075 M) are plotted in the upper panel of Figure 3. The best estimate of the threshold is that value of α having the maximum likelihood (i.e., the mode of the likelihood distribution). In this simulation, the best estimate of α was −2.110 log M after 25 trials, an error of 0.015 log M.
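The likelihood computation just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure published by Harvey (1986, 1997): the logistic function is written in log10 units, the trial list is hypothetical, and the names log_likelihoods and ml_threshold are invented for the example.

```python
import math

BETA, GAMMA = 3.5, 0.5
# 451 candidate thresholds, -4.50 to 0.00 log M in 0.01-log steps.
CANDIDATES = [round(-4.50 + 0.01 * i, 2) for i in range(451)]


def p_correct(log_x, log_alpha):
    """Logistic 2AFC function with stimulus and threshold in log10 units."""
    p = GAMMA + (1.0 - GAMMA) / (1.0 + 10.0 ** (-BETA * (log_x - log_alpha)))
    return min(p, 1.0 - 1e-12)  # keep log(1 - p) finite


def log_likelihoods(trials):
    """One log likelihood per candidate; trials is a list of (log_x, correct)."""
    values = []
    for candidate in CANDIDATES:
        total = 0.0
        for log_x, correct in trials:
            p = p_correct(log_x, candidate)
            total += math.log(p if correct else 1.0 - p)
        values.append(total)
    return values


def ml_threshold(trials):
    """Candidate alpha at the maximum of the likelihood function."""
    values = log_likelihoods(trials)
    return CANDIDATES[values.index(max(values))]


# Hypothetical responses: correct at -2.00 and -2.25 log M, incorrect at -2.50.
trials = [(-2.00, True), (-2.25, True), (-2.50, False),
          (-2.25, True), (-2.50, False), (-2.00, True)]
print(ml_threshold(trials))
```

Before any trial the likelihood function is flat, and with only correct (or only incorrect) responses its maximum lies at the lower (or upper) end of the candidate range.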

Choosing the Next Stimulus

At the beginning of an experiment, before any data have been collected, all candidate αs have equal likelihood of being the threshold. But often the experimenter has some prior idea of what the threshold value should be, especially if the subject has normal sensory function. We incorporate these prior expectations into the method by keeping track of a second likelihood function that starts with the prior likelihoods and adds the results of each trial to it. We use a Gaussian distribution for the prior likelihood function, with a standard deviation of 0.40, which is the value we found in our test–retest experiment, to be reported below (Experiment 2). In the example, the initial estimate of the threshold was −1.875 log M. These posterior likelihood functions are plotted in the lower panel of Figure 3 after having run 0, 5, 10, and 15 trials on the same observer. The mode of this posterior likelihood function after each trial is used to choose the next stimulus.
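A minimal sketch of this second, posterior likelihood function is given below. It assumes that "adding the results of each trial" means adding each trial's log likelihood to the log of an unnormalized Gaussian prior; the function names and the example responses are hypothetical.

```python
import math

BETA, GAMMA = 3.5, 0.5
CANDIDATES = [round(-4.50 + 0.01 * i, 2) for i in range(451)]  # log M


def p_correct(log_x, log_alpha):
    """Logistic 2AFC function with stimulus and threshold in log10 units."""
    p = GAMMA + (1.0 - GAMMA) / (1.0 + 10.0 ** (-BETA * (log_x - log_alpha)))
    return min(p, 1.0 - 1e-12)


def log_prior(log_alpha, mean=-1.875, sd=0.40):
    """Unnormalized log of a Gaussian prior over candidate thresholds."""
    return -0.5 * ((log_alpha - mean) / sd) ** 2


def log_posterior(trials):
    """Log prior plus the log likelihood of every (log_x, correct) trial so far."""
    values = []
    for candidate in CANDIDATES:
        total = log_prior(candidate)
        for log_x, correct in trials:
            p = p_correct(log_x, candidate)
            total += math.log(p if correct else 1.0 - p)
        values.append(total)
    return values


def posterior_mode(trials):
    """Candidate at the maximum of the posterior; used to pick the next stimulus."""
    values = log_posterior(trials)
    return CANDIDATES[values.index(max(values))]


print(posterior_mode([]))                                            # prior only
print(posterior_mode([(-2.0, True), (-2.0, True), (-2.25, True)]))   # hypothetical responses
```

With no data the mode sits at the prior mean (to within the 0.01 grid); correct responses pull it toward weaker concentrations.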

In taste and smell, it is usually not feasible to maintain more than 20 or so different stimulus concentrations for presentation to an observer. The stimulus recommended by the maximum of the posterior likelihood function, therefore, may not actually be available to the experimenter to use on the next trial. There will be an actual stimulus that is lower than the recommendation and another stimulus that is higher than the recommendation. To choose between these two, we compute the relative distance of the recommended stimulus between the two actual stimuli, and we compute the relative frequency of each of the two actual stimuli. We choose one randomly, with a probability that favors the stimulus closer to the recommendation and the stimulus with the fewer number of trials. This computation is done in CStim->GetBestStimulusIndex() in the ML-PEST software package. Keeping the list of candidate αs separate from the list of available stimuli means that the threshold, α, can be computed to a precision that is not limited by the small number of actual stimuli used in the experiment.
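The text does not give the exact weighting used in CStim->GetBestStimulusIndex(), so the following Python sketch should be read as one plausible scheme rather than the package's actual rule. It assumes that the probability of picking each bracketing stimulus is proportional to its closeness to the recommendation divided by one plus the number of trials already run at that stimulus; the function choose_stimulus and this weighting are hypothetical.

```python
import bisect
import random


def choose_stimulus(recommended, stimuli, counts, rng=random):
    """Pick the index of an available stimulus near the recommended value.

    stimuli is sorted from weakest to strongest; counts[i] is the number of
    trials already run at stimuli[i].
    """
    if recommended <= stimuli[0]:
        return 0
    if recommended >= stimuli[-1]:
        return len(stimuli) - 1
    high = bisect.bisect_left(stimuli, recommended)
    if stimuli[high] == recommended:
        return high
    low = high - 1
    span = stimuli[high] - stimuli[low]
    closeness_low = (stimuli[high] - recommended) / span   # 1 when the recommendation sits on low
    closeness_high = 1.0 - closeness_low
    # Hypothetical weighting: favor the closer and the less frequently used stimulus.
    weight_low = closeness_low / (1 + counts[low])
    weight_high = closeness_high / (1 + counts[high])
    return low if rng.random() < weight_low / (weight_low + weight_high) else high


# 18 NaCl concentrations, -4.00 to 0.25 log M in 0.25-log-unit steps.
stimuli = [-4.00 + 0.25 * i for i in range(18)]
counts = [0] * len(stimuli)
index = choose_stimulus(-2.11, stimuli, counts, random.Random(1))
print(index, stimuli[index])
```

A recommendation of −2.11 log M falls between the available stimuli at −2.25 and −2.00 log M, so the function returns one of those two, more often the closer one when both have been used equally.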

For NaCl detection, we use 18 different concentrations, ranging from −4.00 log M to 0.25 log M in 0.25-log-unit steps. The actual stimulus concentrations that were presented to the Monte Carlo observer are plotted in Figure 4 for each of the 25 trials. The solid circles represent trials on which the observer was correct; the open circles are incorrect trials. Because the observer was correct on the first three trials, the stimuli that were presented were successively weaker concentrations. On Trial 4, an error was made, and the stimulus on the following trial was the same concentration. Generally, but not always (because of the random process of choosing the next stimulus), the next stimulus will be weaker after a correct response and stronger after an incorrect response. The maximum-likelihood estimate of the threshold is plotted by the solid line in Figure 4. Until both correct and incorrect trials are collected, the estimate of the threshold will be at either the lower or the upper end of the range of candidate αs. In this example, the observer was correct on the first three trials and then made a mistake on Trial 4. The estimate of the threshold was extremely low (not visible in the figure) until Trial 4, when it jumped up to the region near the true value. As the trials continue, the estimated threshold fluctuates around the true value and eventually converges on it. After 25 trials, the estimated threshold was −2.11 log M, an error of 0.015 log units. This low error is achieved even though no actual stimulus corresponding to the threshold was available for testing.
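A complete simulated run of this kind can be sketched as follows. The code is a simplified illustration, not the SensoryTester program: it presents the available stimulus nearest to the posterior mode (instead of the randomized choice described above), and the function name run_ml_pest and the seed are assumptions.

```python
import math
import random

BETA, GAMMA = 3.5, 0.5
CANDIDATES = [round(-4.50 + 0.01 * i, 2) for i in range(451)]   # candidate alphas, log M
STIMULI = [-4.00 + 0.25 * i for i in range(18)]                 # available NaCl stimuli, log M


def p_correct(log_x, log_alpha):
    p = GAMMA + (1.0 - GAMMA) / (1.0 + 10.0 ** (-BETA * (log_x - log_alpha)))
    return min(p, 1.0 - 1e-12)


def run_ml_pest(true_log_alpha, n_trials=25, prior_mean=-1.875, prior_sd=0.40, seed=0):
    """Return a list of (stimulus, correct, ml_estimate), one entry per trial."""
    rng = random.Random(seed)
    log_lik = [0.0] * len(CANDIDATES)
    log_post = [-0.5 * ((a - prior_mean) / prior_sd) ** 2 for a in CANDIDATES]
    history = []
    for _ in range(n_trials):
        # The posterior mode recommends the next stimulus.
        target = CANDIDATES[log_post.index(max(log_post))]
        # Simplification: take the nearest available stimulus.
        stimulus = min(STIMULI, key=lambda s: abs(s - target))
        # Monte Carlo logistic observer.
        correct = rng.random() < p_correct(stimulus, true_log_alpha)
        for i, candidate in enumerate(CANDIDATES):
            p = p_correct(stimulus, candidate)
            step = math.log(p if correct else 1.0 - p)
            log_lik[i] += step
            log_post[i] += step
        # The threshold estimate is the maximum of the likelihood (no prior).
        estimate = CANDIDATES[log_lik.index(max(log_lik))]
        history.append((stimulus, correct, estimate))
    return history


for trial, (stimulus, correct, estimate) in enumerate(run_ml_pest(-2.125), start=1):
    print(f"trial {trial:2d}  stimulus {stimulus:6.2f}  "
          f"{'correct' if correct else 'wrong  '}  estimate {estimate:6.2f}")
```

As in the example of Figure 4, the printed estimate stays at one end of the candidate range until both correct and incorrect responses have been collected.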

Figure 3. Log likelihood (upper panel) and log posterior likelihood (lower panel) as a function of candidate α, following 0, 5, 10, 15, 20, and 25 Monte Carlo trials with a logistic observer having a true α of 0.0075 M (−2.125 log M), detecting NaCl in a two-alternative forced-choice detection paradigm.

When to Stop

Two stopping criteria can be used to terminate psychophysical testing: stop after a fixed number of trials, or stop after the estimation of the threshold reaches a required level of precision. Precision is measured by the confidence interval, which is related to the width of the likelihood function. Notice in Figure 3 that the likelihood function becomes narrower as the number of trials increases. The confidence interval of the estimated threshold is directly related to the width of the likelihood distribution and is calculated using the likelihood ratio test and the χ² statistical distribution with 1 degree of freedom (Hays, 1963):

$$\chi^2 = -2.0 \times \log_e(\text{likelihood ratio}) = -2.0 \times \log_e\!\left(\frac{L_x}{L_{\max}}\right) \tag{2}$$

where L_x is the likelihood of a model whose α parameter is x, and L_max is the likelihood of the best-fitting model. We search for the stimulus concentrations (x) above and below the threshold stimulus (the maximum-likelihood α) at which the log likelihood drops to χ²/2 below the maximum log likelihood value. To find the 95% confidence interval, the criterion value of χ² is 5.0239 (this value is obtained from standard tables of the χ² distribution found in all statistics textbooks. The one-tailed value of 5.0239 corresponds to a probability of .025; since the confidence interval is two-tailed, 5.0239 corresponds to a probability of .05).

In Figure 5, the same data shown in Figure 4 are plotted with an expanded vertical scale to show the estimate of the threshold, along with the estimated 95% confidence interval. Notice that, as the number of trials increases, the confidence interval of the threshold decreases. After 25 trials, the 95% confidence interval of the threshold extends 0.27 log units below the estimate of the threshold (indicated by the solid horizontal line, not by the filled circle, which represents the stimulus presented on that trial) and 0.37 log units above it, for a total confidence interval of 0.64 log units.
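The search described for Equation 2 can be sketched as a scan outward from the maximum of the log likelihood. The sketch below uses the criterion value 5.0239 stated in the text; the function names and the parabolic log-likelihood curve in the example are hypothetical and serve only to make the arithmetic checkable.

```python
CHI2_95 = 5.0239  # criterion value of chi-square stated in the text


def confidence_interval(candidates, log_lik, chi2_crit=CHI2_95):
    """Return (lower, estimate, upper) on the candidate grid.

    The limits are the outermost candidates, contiguous with the maximum,
    whose log likelihood lies within chi2_crit / 2 of the maximum.
    """
    best = max(range(len(log_lik)), key=log_lik.__getitem__)
    cutoff = log_lik[best] - chi2_crit / 2.0
    lower = best
    while lower > 0 and log_lik[lower - 1] >= cutoff:
        lower -= 1
    upper = best
    while upper < len(log_lik) - 1 and log_lik[upper + 1] >= cutoff:
        upper += 1
    return candidates[lower], candidates[best], candidates[upper]


def should_stop(candidates, log_lik, max_width=0.8):
    """Stopping rule: the confidence interval is no wider than max_width log units."""
    lower, _, upper = confidence_interval(candidates, log_lik)
    return (upper - lower) <= max_width


# Hypothetical parabolic log likelihood centered on -2.10 log M.
candidates = [round(-4.50 + 0.01 * i, 2) for i in range(451)]
log_lik = [-0.5 * ((a + 2.10) / 0.10) ** 2 for a in candidates]
print(confidence_interval(candidates, log_lik))   # (-2.32, -2.1, -1.88)
print(should_stop(candidates, log_lik))           # True
```

If the likelihood function is still flat or one-sided, the scan runs into the end of the candidate range and the interval is truncated there, so the width criterion cannot be met before both correct and incorrect responses have been observed.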

Figure 4. Stimulus presentation as a function of trial number in the two-alternative forced-choice Monte Carlo simulation. Trials on which the subject's response was correct are represented by the filled circles; trials on which the subject's response was incorrect are represented by the open circles. The resulting maximum-likelihood estimate of α is plotted by the solid line. The horizontal dotted line marks the true value of α.

EXPERIMENT 1
Comparison of Three Psychophysical Methods

The goal of Experiment 1 was to compare the ML-PEST method with two commonly used methods with respect to their efficiency and their precision. Detection thresholds for butyl alcohol were measured using three methods: the ascending method of limits, as described by Cain et al. (1983); the up–down staircase method (Wetherill & Levitt, 1965); and the ML-PEST maximum-likelihood staircase method, as described above (Harvey, 1986, 1997).

Method

Subjects. Thirty-two (16 males and 16 females) healthy, nonsmoking subjects participated in the experiment. Their mean age was 30.4 years (± 8.1), ranging from 17 to 60 years. None reported having conditions that are normally associated with smell dysfunction.

Stimuli for ascending method of limits. For the ascending method of limits, the procedure described by Cain et al. (1983) was used. Thirteen concentrations of butyl alcohol were prepared with distilled water as diluent. The strongest concentration was 4%. Adjacent steps differed by a factor of 3.

Stimuli for staircase methods. The Wetherill and Levitt and ML-PEST methods used a series of 24 concentrations of butyl alcohol. The lowest 20 concentrations were a factor of 1.5 apart, with the strongest 0.1%. This series was augmented by the four strongest of the series for the ascending method, which were separated by a factor of 3.

Procedure. Thresholds were determined with a temporal 2AFC procedure. In each trial, a stimulus was paired with a blank, and the subjects were asked to select the bottle containing the stimulus. The subjects were asked to give their best guess, even when they were certain that the two stimuli were identical. The stimulus and the blank were presented in 250-ml squeezable polyethylene bottles containing 60 ml of liquid. The subjects were instructed to keep one nostril closed while the other nostril was tested with three puffs out of each bottle. For each subject, four thresholds were obtained, two for each nostril. In randomized order and alternating nostrils, one threshold was determined with the ascending method, one with the up–down method, and two with the adaptive maximum-likelihood method. Testing took about 45 min.

The definition of detection threshold differs in several respects among the three methods. In the ascending method, testing started with the lowest concentration. A correct response led to the presentation of the same concentration on the next trial. When an incorrect response was given, the next higher concentration was used. Testing stopped when five correct answers had been given in a row. The threshold was defined as the last-used concentration (Cain et al., 1983) and thus corresponded to a detection probability of 1.0.
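The ascending rule can be written down directly. The sketch below is an illustration, not the procedure's published implementation: the 13-step series is rebuilt from the description (strongest 4%, steps of a factor of 3), the observer is a hypothetical deterministic one, and the handling of an exhausted series and the max_trials guard are assumptions.

```python
import math


def ascending_limits(respond, stimuli, run_length=5, max_trials=1000):
    """respond(log_conc) returns True if the 2AFC response is correct.

    stimuli are sorted from weakest to strongest. Returns (threshold, n_trials).
    """
    index, streak = 0, 0
    for n_trials in range(1, max_trials + 1):
        if respond(stimuli[index]):
            streak += 1                      # correct: repeat the same concentration
            if streak == run_length:
                return stimuli[index], n_trials
        else:
            streak = 0                       # incorrect: move to the next higher one
            # Assumption: stay at the strongest stimulus if the series is exhausted.
            index = min(index + 1, len(stimuli) - 1)
    raise RuntimeError("no run of correct responses within max_trials")


# 13 butyl alcohol concentrations in log percent: strongest 4%, steps of a factor of 3.
series = [math.log10(4.0) - k * math.log10(3.0) for k in range(12, -1, -1)]

# Hypothetical deterministic observer: always correct at or above -3.0 log percent.
threshold, n_trials = ascending_limits(lambda s: s >= -3.0, series)
print(round(threshold, 3), n_trials)   # -2.738 10
```

For this idealized observer the procedure climbs through five concentrations below −3.0 (one error each) and then collects five correct responses at the first concentration above it.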

In the up–down staircase method, testing could start at any step in the concentration series. The first subject was started in the middle of the range, and each subsequent subject started where the former subject had ended. After one correct response, the same concentration was presented; if two correct responses were given, the next lower concentration was presented on the next trial. One incorrect response led to the presentation of a concentration one step higher. This decision rule is called the one-up–two-down rule. Although this decision rule is widely used, the actual threshold estimate depends more on the size of the stimulus step than on the particular up–down rule (Garcia-Perez, 1998). Testing stopped after five reversals, and the threshold was computed as the average of the stimulus concentrations at the last four reversals. This threshold corresponded to a detection probability of approximately .75.
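The one-up–two-down rule with five reversals can be sketched as follows. The 24-step series is reconstructed from the stimulus description in the Method section; the deterministic observer, the starting index, the convention of recording a reversal at the concentration where the direction changed, and the max_trials guard are all assumptions made for illustration.

```python
import math


def up_down_staircase(respond, stimuli, start_index, n_reversals=5, max_trials=1000):
    """One-up-two-down staircase. Returns (threshold, n_trials)."""
    index = start_index
    correct_in_a_row = 0
    last_direction = 0          # -1 after a step down, +1 after a step up
    reversals = []
    n_trials = 0
    while len(reversals) < n_reversals:
        if n_trials >= max_trials:
            raise RuntimeError("too few reversals within max_trials")
        n_trials += 1
        if respond(stimuli[index]):
            correct_in_a_row += 1
            if correct_in_a_row < 2:
                continue                    # one correct: repeat the same concentration
            direction = -1                  # two correct: step down
        else:
            direction = +1                  # one incorrect: step up
        correct_in_a_row = 0
        if last_direction and direction != last_direction:
            reversals.append(stimuli[index])
        last_direction = direction
        index = min(max(index + direction, 0), len(stimuli) - 1)
    # Threshold: mean of the concentrations at the last four reversals.
    return sum(reversals[-4:]) / 4.0, n_trials


# 20 concentrations a factor of 1.5 apart (strongest 0.1%), plus the four
# strongest steps of the ascending series (factor of 3, strongest 4%).
series = [-1.0 - k * math.log10(1.5) for k in range(19, -1, -1)]
series += [math.log10(4.0) - k * math.log10(3.0) for k in range(3, -1, -1)]

# Hypothetical deterministic observer: always correct at or above -3.0 log percent.
threshold, n_trials = up_down_staircase(lambda s: s >= -3.0, series, start_index=12)
print(round(threshold, 3), n_trials)   # -3.025 17
```

For this idealized observer the staircase oscillates between the two concentrations that bracket −3.0, so the reversal average falls midway between them. A real or probabilistic observer would produce a different trial count.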

In the maximum-likelihood adaptive staircase method, testing started in the middle of the stimulus range. After each response, the estimate of the threshold was calculated, and the next stimulus was a concentration close to that threshold. Testing stopped when the confidence interval of the estimated threshold reached a predefined criterion (0.8 log concentration units). This threshold corresponded to a detection probability of exactly .75. All these computations were carried out in real time on a Power Macintosh computer using a program, SensoryTester, written by author L.O.H.

Data analysis. The mean threshold (and standard error) produced by each method is presented in Table 1, labeled "Native Threshold." As noted above, each method produces a threshold estimate that corresponds to a different performance level of detection. To obtain threshold and confidence interval estimates for the ascending method of limits and the up–down staircase method that could be directly compared with those obtained by the ML-PEST method, a logistic psychometric function was fitted to the actual sequence of stimulus presentations and responses from these two methods. The values of β and γ were held constant at the same values used in the ML-PEST method. The maximum-likelihood threshold and the confidence interval were then computed in exactly the same manner as with the ML-PEST method. This fitting procedure will give a lower threshold for the ascending method of limits data than that reported by the method of limits procedure itself, because the ascending method of limits looks for a threshold stimulus giving 100% correct, whereas the maximum-likelihood threshold corresponds to 75% correct. A repeated measures analysis of variance (ANOVA) was computed using StatView 5.0.1 (SAS Institute, 1998a, 1998b) on each of four dependent variables: native log threshold, number of trials to reach the native threshold, maximum-likelihood logistic threshold, and confidence interval of the maximum-likelihood threshold.

Figure 5. Stimulus presentation as a function of trial number in the two-alternative forced-choice Monte Carlo simulation. Trials on which the subject's response was correct are represented by the filled circles; trials on which the subject's response was incorrect are represented by the open circles. The resulting maximum-likelihood estimate of α is plotted by the solid line. The horizontal dotted line marks the true value of α. The vertical bars show the 95% confidence interval of each estimate of α.

Results

The native threshold values, the mean numbers of trials needed to reach the stopping criterion, the maximum-likelihood logistic thresholds, and the mean confidence intervals of the maximum-likelihood thresholds are summarized in Table 1. A repeated measures ANOVA on the native thresholds showed no significant main effect of method [F(3,31) = 1.61, p < .187]. Subsequent statistical contrast comparisons, using the standard Huynh–Feldt ε to correct the p value and the degrees of freedom, revealed that the native threshold of the ascending method of limits was just significantly higher than those of the other methods [F(1,31) = 3.92, p = .051]. These higher thresholds are to be expected, given that the ascending method of limits computes the threshold to be the concentration corresponding to a detection probability of 1.0, whereas the other methods use a performance criterion of about .75.

A repeated measures ANOVA on the number of trials needed to reach the stopping criterion showed a significant main effect of method [F(3,31) = 6.26, p < .0008]. Statistical contrast comparisons showed that the ascending method required significantly fewer trials to determine threshold [F(1,31) = 6.41, p < .014] and that the up–down method required significantly more trials [F(1,31) = 6.06, p < .017] than the average of the two measurements with the maximum-likelihood method.

A repeated measures ANOVA on the maximum-likelihood logistic thresholds showed no significant main effect of method, and the thresholds computed from the trial-by-trial data from the ascending method of limits were not significantly different from the thresholds computed from the data collected with the other methods [F(1,31) = 1.77, p = .19]. This result is to be expected, since presumably the performance of each observer is driven by the same sensory process when tested by each of the three psychometric methods.

The mean confidence intervals for the thresholds (and standard errors) are shown in the last column of Table 1. The mean confidence interval of the maximum-likelihood method is smaller than those of the other two. Note that the standard error of the confidence interval reported for the ML-PEST method is very small (0.02). This is because the confidence interval was used as the stopping criterion: Trials were stopped when the confidence interval of α reached 0.8. A repeated measures ANOVA indicated that there was a significant main effect of method [F(3,31) = 17.27, p < .0002]. Contrast tests revealed that the confidence intervals for the up–down method were significantly larger than those for both the maximum-likelihood method [F(1,31) = 50.93, p < .0001] and the ascending method [F(1,31) = 23.72, p < .0016].

Discussion

The native absolute threshold values obtained with the ascending method were higher than those obtained with the other two methods. One should remember that these values are not conventional thresholds, in the sense that they represent the concentration chosen correctly five out of five times. It is to be expected that this value will be a higher concentration than thresholds based on a lower percent correct. Furthermore, this method, at least as described by Cain et al. (1983), has considerably larger steps between adjacent concentrations, and only actual stimulus concentrations can be threshold values.

The ascending method needed the fewest trials (~13) relative to either the maximum-likelihood method (~15) or the up–down staircase method (~17). The clear superiority of the maximum-likelihood method, however, came with the high precision of the measured thresholds, as indicated by the small confidence intervals. The confidence intervals of the up–down staircase were almost three times larger than those given by the maximum-likelihood method.

MONTE CARLO SIMULATIONS

When measuring the threshold of an actual subject, it is not possible to know what the error is between the measured value and the “true” value, because the true value is not known. In order to evaluate the accuracy of the three psychophysical methods, we resorted to Monte Carlo evaluations of simulated observers having a predetermined threshold value. We used each of the three psychophysical methods to measure the threshold of each simulated observer and compared the result with the true value.

Table 1
Mean Performance (and Standard Error) of Three Psychophysical Methods With Butyl Alcohol in Experiment 1

                     Native Threshold    Number of Trials    ML Logistic Threshold    Confidence Interval
Method               M        SE         M        SE         M        SE              M       SE
ML-PEST, left        −3.41    0.16       15.13    0.65       −3.41    0.16            0.80    0.02
ML-PEST, right       −3.27    0.13       15.41    0.76       −3.27    0.13            0.84    0.02
Wetherill–Levitt     −3.30    0.10       17.44    0.72       −3.30    0.11            2.32    0.34
Ascending limits     −3.10    0.13       13.03    0.68       −3.48    0.14            1.14    0.02

Monte Carlo simulations were run for each of the three psychophysical procedures, using the same rules and stimulus series as were used in the threshold measurements for butyl alcohol with the real subjects reported above in Experiment 1. A population of 10,000 simulated subjects was defined whose thresholds followed a Gaussian distribution with a mean threshold of −3.00 log percent concentration and a standard deviation of 0.4 log units. These values are representative of the distribution of real subjects’ detection thresholds for butyl alcohol, as found in Experiment 2 below. The predetermined threshold for each of the 10,000 simulated observers was drawn from the above distribution. Threshold values higher than the highest stimulus concentration and lower than the lowest stimulus concentration were not used.
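The population of simulated observers described above can be sketched as rejection sampling from a Gaussian. This is only an illustration under stated assumptions, not the authors' routine: the function name, the seed, and both concentration limits are hypothetical placeholders, since the stimulus series of each method is defined elsewhere in the paper.

```python
import random

def draw_thresholds(n, mean=-3.0, sd=0.4, lowest=-5.13, highest=-1.0, seed=0):
    """Draw n simulated thresholds (log percent concentration).

    Values outside [lowest, highest] are rejected and redrawn, mirroring the
    rule that thresholds beyond the stimulus range were not used.
    The two limits are placeholders, not the paper's stimulus series.
    """
    rng = random.Random(seed)
    thresholds = []
    while len(thresholds) < n:
        value = rng.gauss(mean, sd)
        if lowest <= value <= highest:
            thresholds.append(value)
    return thresholds

# 10,000 simulated observers, as in the simulations described in the text.
population = draw_thresholds(10_000)
```

With limits this far from the mean, almost no draws are rejected; narrower limits would truncate the distribution more visibly.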

Simulation 1

On each trial, the simulated subject was presented with a stimulus. Each subject’s detection behavior was driven by a logistic psychometric function with a steepness of 3.5 and a chance level performance of 0.5. The outcome of each trial was determined by the probability predicted by the logistic function for that stimulus and that subject, together with a uniform random number between 0 and 1. If the random number was less than the logistic function probability for that stimulus, the response was scored as correct; otherwise, it was scored as wrong. The starting stimulus for the up–down staircase and maximum-likelihood methods was chosen randomly between the two stimuli that bracketed the mean threshold of −3.0 log concentration. These stimuli were −3.113 and −2.937 log concentration. The ascending method started with the lowest stimulus in its series, −5.130 log percent concentration.
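The response rule just described can be sketched as follows. The exact parametrization of the logistic function is defined earlier in the paper and is not repeated in this section, so the form used here (steepness multiplying the distance from threshold in log units, scaled to rise from chance to 1.0) is an assumption, as are the function names.

```python
import math
import random

def p_correct(stimulus, threshold, steepness=3.5, chance=0.5):
    """Logistic psychometric function rising from `chance` to 1.0.

    Assumed form: chance + (1 - chance) / (1 + exp(-steepness * (x - threshold))).
    """
    core = 1.0 / (1.0 + math.exp(-steepness * (stimulus - threshold)))
    return chance + (1.0 - chance) * core

def simulate_trial(stimulus, threshold, rng):
    """Score one 2AFC trial: correct if a uniform draw falls below p_correct."""
    return rng.random() < p_correct(stimulus, threshold)

# Example: one trial at a stimulus equal to the observer's threshold.
rng = random.Random(1)
was_correct = simulate_trial(-3.0, -3.0, rng)
```

Under this assumed form, a stimulus exactly at threshold is answered correctly with probability 0.75, halfway between chance and perfect performance.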

An ascending limits run was considered a failure if the procedure called for a stimulus stronger than the highest stimulus concentration. A Wetherill–Levitt staircase run was considered a failure if, during the trials, a stimulus stronger than the highest stimulus concentration or weaker than the lowest stimulus concentration was called for. In the present simulation, the Wetherill–Levitt staircase failed 66 times. All of these failures were caused by the procedure’s calling for a lower stimulus concentration than the lowest available. An ML-PEST run would have been considered a failure if the final estimate of the threshold were more than half the stopping confidence interval below the lowest candidate or above the highest candidate α. There were no such failures. The failed runs are excluded from the data analyzed below.

The distribution of errors for each method (estimated threshold minus true threshold) for the 10,000 subjects is shown on the left side of Figure 6. The distribution of the number of trials to reach the stopping criterion for the 10,000 subjects is shown on the right side of Figure 6. The maximum-likelihood method was more accurate than the Wetherill–Levitt staircase, and both of these methods were superior to the ascending method of limits. The median error and the number of trials required for each method are shown in Table 2. The median error for the maximum-likelihood method was almost half as large as that for the up–down staircase, although both errors were quite small.

The error on each of the 10,000 subjects is plotted in Figure 7 as a function of the number of trials for that run. The range of errors became smaller as the number of trials increased for both the Wetherill–Levitt staircase method and the ML-PEST method, and the median error stayed close to zero. The ascending staircase method showed quite different behavior. The direction of the error depended on the number of trials before the stopping criterion of five correct in a row had been reached. The estimated thresholds were too low when the number of trials was less than about 15, whereas the estimates of threshold were too high when the number of trials was more than 15.

Simulation 2

Because the Wetherill–Levitt staircase method and the ML-PEST method generally required different numbers of trials before the stopping criterion was reached on each run, comparison of the accuracy is not straightforward. We repeated the Monte Carlo simulations so that the staircase method and the ML-PEST method used the same number of trials on each run. First, the Wetherill–Levitt staircase was run until the stopping criterion was reached. Then the ML-PEST method was run with the same number of trials on that observer, using the same starting stimulus.

The results are presented in Table 2. When the ML-PEST method had the same number of trials as the up–down staircase, the ML-PEST method was slightly more accurate. As in Simulation 1, both of these methods were superior in accuracy to the ascending staircase. There were 60 failures out of the 10,000 cases where the up–down staircase called for a stimulus that was lower than the lowest available concentration. Because the ML-PEST method was not permitted to run as many trials as it would normally require, this method failed on 30 out of the 10,000 cases.

Table 2
Results of the Four Monte Carlo Simulations

Method                   Median Error    Median Number of Trials

Simulation 1
  Maximum-likelihood     0.027           16
  Wetherill–Levitt       0.045           17
  Ascending limits       0.331           14

Simulation 2
  Maximum-likelihood     0.094           17
  Wetherill–Levitt       0.101           17
  Ascending limits       0.419           14

Simulation 3
  Maximum-likelihood     0.03            16
  Wetherill–Levitt       3.60            16
  Ascending limits       3.54            16

Simulation 4
  Maximum-likelihood     2.75            115
  Wetherill–Levitt       0.93            21
  Ascending limits       3.36            22


Figure 6. Distribution of errors (left) and number of trials (right) for three psychophysical methods: maximum-likelihood adaptive staircase (upper panel), Wetherill and Levitt up–down staircase (middle panel), and ascending method of limits (lower panel). The vertical line in the left panels marks the zero-error bin of each histogram. See text for details. The abscissa scale for the upper two left panels covers a range of 1.5 log units and is 5 log units for the lower left histogram.

Figure 7. Error as a function of number of trials for three psychophysical methods: ML-PEST (upper panel), up–down staircase (middle panel), and ascending method of limits (lower panel).

Simulation 3

An important ability of sensory testing is to detect people who deliberately try to appear to have diminished sensitivity. We therefore repeated the Monte Carlo simulations of 10,000 subjects, using the same starting conditions and the same test conditions as those in Simulation 1, but each simulated subject generated “untruthful” responses (i.e., responses just the opposite of that called for by the subject’s psychometric function). In the ML-PEST computer program, two likelihood functions are maintained: one based on the hypothesis that the subject is responding truthfully, and the other based on the hypothesis that the subject is deliberately choosing the wrong response. The program explicitly tests the hypothesis that a subject is deliberately choosing the wrong response by comparing the maxima of the two likelihood distributions and therefore should be able to measure a sensory threshold as effectively as when a subject is telling the truth. The other two methods do not have an explicit way of dealing with this situation. It is of interest to learn how all three methods behave under these circumstances.
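The comparison of two likelihood functions can be illustrated with a small grid search. This is a hedged sketch, not the ML-PEST program itself: the logistic form, the candidate grid, the function names, and the example trial data are all assumptions made for illustration. Under the lying hypothesis, each observed response is treated as the opposite of what the psychometric function called for.

```python
import math

def p_correct(stimulus, threshold, steepness=3.5, chance=0.5):
    """Assumed logistic psychometric function rising from chance to 1.0."""
    core = 1.0 / (1.0 + math.exp(-steepness * (stimulus - threshold)))
    return chance + (1.0 - chance) * core

def max_log_likelihood(trials, candidates, lying=False):
    """Return (best log-likelihood, best candidate threshold).

    trials: list of (stimulus, was_correct) pairs.
    lying=True scores each response as the reverse of the sensed outcome.
    """
    best_ll, best_alpha = -math.inf, None
    for alpha in candidates:
        total = 0.0
        for stimulus, was_correct in trials:
            p = min(p_correct(stimulus, alpha), 1.0 - 1e-12)  # avoid log(0)
            sensed = (not was_correct) if lying else was_correct
            total += math.log(p if sensed else 1.0 - p)
        if total > best_ll:
            best_ll, best_alpha = total, alpha
    return best_ll, best_alpha

def classify(trials, candidates):
    """Pick the hypothesis whose likelihood maximum is larger."""
    truthful = max_log_likelihood(trials, candidates, lying=False)
    lying = max_log_likelihood(trials, candidates, lying=True)
    if lying[0] > truthful[0]:
        return "lying", lying[1]
    return "truthful", truthful[1]

# Hypothetical data: strong stimuli always correct, weak stimuli at chance.
candidates = [-5.0 + 0.1 * i for i in range(41)]
honest = [(-2.0, True)] * 10 + [(-4.5, True)] * 5 + [(-4.5, False)] * 5
liar = [(s, not c) for s, c in honest]
honest_result = classify(honest, candidates)
liar_result = classify(liar, candidates)
```

Because the lying hypothesis simply mirrors the responses, the threshold estimate for the inverted data is the same as for the honest data, which is the property the text relies on.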

The results of this simulation are presented in Table 2. The ML-PEST method successfully measured the threshold and explicitly detected the lying condition in all 10,000 cases and was therefore able to measure the threshold of the underlying psychometric function with a small median error. The up–down staircase method and the ascending method of limits reached the highest concentration and tried to go higher, resulting in 9,824 failures for the staircase method and 8,648 failures for the method of limits. If we consider a failure as a successful detection of the lying behavior, the up–down staircase method detected 98% of the cases, and the ascending method of limits detected 86% of the cases. These two methods required about the same number of trials (see Table 2).

Simulation 4

It is important in clinical testing that people with elevated thresholds be easily detected. One of the subjects in Experiment 1 first had smell tested by the up–down method and exhibited a normal threshold. Later, the same nostril was tested using the maximum-likelihood method, and no threshold could be measured because, as it turned out, he was anosmic on that side. The fact that the up–down method did not detect the anosmia is of concern.

We therefore repeated the Monte Carlo simulations of 10,000 subjects, using the same starting conditions and the same test conditions as those in Simulation 1, with one change: Each simulated subject generated a random response, as would be the case if a person had no sensory sensitivity. The ideal behavior of a testing method would be to indicate that the threshold was at the highest value that is possible with that method or to have some other explicit way of indicating that the subject is responding randomly.

None of the three methods performed very well in detecting the random responding. The median errors and number of trials are given in Table 2. The up–down staircase method did the poorest, because it gives a median error of 0.93. This rather low error occurs because, on 87% of the cases, the stopping criterion of 5 reversals was satisfied before the method reached the upper limit of the stimulus concentrations. The ML-PEST method failed on 11% of the cases, but overall the thresholds were estimated to be very high. Unfortunately, it required a great number of trials to achieve this performance. The ascending method of limits reached the upper limit of stimulus concentration on 66% of the cases and also gave the highest values for the threshold.

Since, in Monte Carlo simulations, the distribution of the actual thresholds is known, we can apply an SDT analysis (Green & Swets, 1966/1974; Macmillan & Creelman, 1991) to compare the distribution of estimated thresholds made by each method under random responding with the true distribution. Using the means and standard deviations of the above distributions, we computed the signal detection measures of sensitivity, d_a, and accuracy, the area under the ROC curve, A_z (Harvey, 1992; Simpson & Fitter, 1973). The ML-PEST method had the highest sensitivity and accuracy (d_a = 2.27, A_z = .95). The ascending method was next best (d_a = 1.99, A_z = .92), and the up–down staircase method was the least sensitive (d_a = 1.50, A_z = .86).
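For readers who want to check these values, the two indices can be computed from the means and standard deviations of two distributions using the standard unequal-variance SDT definitions: d_a divides the mean difference by the root-mean-square standard deviation, and A_z is the normal cumulative probability of d_a/√2. The function names below are illustrative, and which distribution plays the role of "signal" or "noise" is an assumption of this sketch.

```python
import math

def d_a(mean_signal, sd_signal, mean_noise, sd_noise):
    """Sensitivity index for two Gaussian distributions with unequal variances."""
    rms_sd = math.sqrt((sd_signal ** 2 + sd_noise ** 2) / 2.0)
    return (mean_signal - mean_noise) / rms_sd

def a_z(da):
    """Area under the binormal ROC curve: Phi(d_a / sqrt(2))."""
    z = da / math.sqrt(2.0)
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The three d_a values reported in the text map onto the reported areas.
areas = [round(a_z(value), 2) for value in (2.27, 1.99, 1.50)]
```

Applying `a_z` to the three reported d_a values reproduces the reported areas of .95, .92, and .86 after rounding.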

Discussion

The results of the simulations lead to the conclusion that both the up–down staircase method and the maximum-likelihood method have considerable strengths for use in measuring taste and smell thresholds. The ascending method of limits should not be used if one is interested in obtaining an accurate estimate of the threshold. These threshold estimates covary with the number of trials taken to stop, as is illustrated in the lower panel of Figure 7. The simulation results offer an explanation for the published finding that ascending limit measurements of olfactory thresholds are less reliable than staircase-measured thresholds (Doty, McKeown, Lee, & Shaman, 1995).

The up–down staircase method gives an average error twice as large as that of the ML-PEST method, although both values are small and therefore similar to each other. The up–down staircase method failed by calling for a stimulus too low on 66 out of 10,000 cases, whereas the ML-PEST method had no such failures. Although the median number of trials gives the edge to the ML-PEST method, on some occasions the ML-PEST method required over 50 trials to reach its stopping criterion.

A strength of the ML-PEST method is its ability to consider alternate hypotheses about the response strategy of the observer. The ML-PEST method was able to correctly detect all 10,000 dishonest responders. We are currently preparing a paper that will go into considerable detail on how to distinguish malingerers from true anosmics, using a variety of methods. Another strength is that there is virtually no bias introduced into threshold measurements by having the ML-PEST steepness different from the “true” value set in the Monte Carlo observer (Treutwein & Strasburger, 1999). Data generated from a Monte Carlo Weibull function observer can be well fit by virtually any S-shaped psychometric function, although the Weibull will usually fit slightly better. We therefore recommend the ML-PEST method for testing taste and smell.

EXPERIMENT 2
Test–Retest Reliability

In order to assess the stability of the measures obtained with the adaptive maximum-likelihood procedure, we tested the subjects on four occasions.

Method

Subjects. Twenty-seven nonsmoking, healthy subjects participated in the experiment (mean age = 31.6 ± 8.7 years; range = 22–57 years). The 14 women and 13 men were divided into two groups: Group 1 (“consecutive,” 7 females and 7 males) was tested on 4 consecutive days, and Group 2 (“distributed,” 7 females and 6 males) was tested on Days 1, 2, 8, and 22.

Stimuli. Smell thresholds for butyl alcohol were assessed using the same range of concentrations as were used with the maximum-likelihood method in Experiment 1. Taste thresholds were measured using 20 concentrations of NaCl in quarter-log steps. The highest concentration was 1.7 M. The same maximum-likelihood adaptive method as described in Experiment 1 was used in this experiment.

Procedure. Thresholds were determined with a temporal 2AFC procedure. Taste sensitivity was measured with a whole-mouth, sip-and-spit procedure. The stimuli and the blanks were presented in 30-ml plastic medicine cups containing 5 ml of liquid. The subjects were requested to take the first sample in their mouths, swish it around, and spit it out, then rinse with distilled water and do the same with the second sample. They were asked to choose the sample that had a taste different from water. In each of the four sessions, taste thresholds were determined first and smell thresholds last. All four smell thresholds were determined in the preferred nostril. The subjects were tested at the same time of each day.

Results

Because the measurement and specification of test–retest reliability is a rather complex topic (Linn, 1989), we have chosen four different ways to demonstrate the reliability of the maximum-likelihood method. For the first method, the four threshold measurements for each subject are plotted in Figure 8. The thick line in each panel is the mean threshold. It is clear from examination of Figure 8 that most individual thresholds were stable over 4 days (left column of panels) and over 22 days (right column of panels), with a few subjects showing shifts of more than 1 log unit. The mean thresholds for the two groups are shown in Table 3. A repeated measures ANOVA confirmed the stability of the thresholds: The effect of testing day was not significant.

A second approach is to compute the correlations among the four threshold measurements (see, e.g., Mattes, 1988). Pearson correlation coefficients between thresholds obtained on different days are shown in Table 4. The correlations were generally higher for pairs of thresholds that were measured close together in time. As Mattes pointed out, correlations are highly influenced by small changes in ordinal ranking of the subjects. The fact that the pairwise correlations became smaller as time separation between them increased is shown in Figure 9. The solid line is the least squares regression line. It indicates that the correlation between two thresholds diminished as the time between them increased. The slope of the regression line becomes even steeper if the data for 1-day separation are removed from the regression, indicating that the decrease in correlation started with the 2-day separation condition.

Figure 8. Log threshold concentration as a function of test day for NaCl (upper row) and butanol (lower row) measured on 4 successive days (left column) and over 22 days (right column). The thin lines connect the four thresholds of individual subjects. The thick line is the mean threshold concentration of all subjects.

The third method to assess test–retest reliability is based on the differences between successive threshold measurements. Each threshold was subtracted from the previous threshold. The four threshold measurements resulted in three difference measurements. The distributions of these differences for NaCl and for butyl alcohol are plotted on the left side of Figure 10. If the measured thresholds do not systematically increase or decrease over time, we expect that the mean difference would be zero. The mean of the distribution for NaCl (upper panel of Figure 10) was 0.018 and for butyl alcohol (lower panel) was 0.002. Neither of these was significantly different from zero. The standard deviations of these distributions were 0.413 log units for NaCl and 0.422 log units for butyl alcohol.

A fourth way to examine test–retest reliability is by means of the spread of the four threshold measures for each subject. A standard deviation of 0.0 would mean that all four thresholds were exactly the same value. The standard deviation of the four measurements was computed for each subject. The distributions of these standard deviations for NaCl (upper panel) and butyl alcohol (lower panel) are plotted on the right side of Figure 10. The median standard deviation was about 0.15 log units for both NaCl and butyl alcohol. This good test–retest reliability is consistent with the other three methods of assessing reliability reported above.
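The third and fourth reliability measures are simple to compute from each subject's four thresholds. The sketch below uses hypothetical threshold values and function names chosen for illustration; it is not the authors' analysis code, which was run in a statistics package.

```python
from statistics import mean, stdev

def successive_differences(thresholds):
    """Subtract each threshold from the previous one (previous minus current)."""
    return [prev - cur for prev, cur in zip(thresholds, thresholds[1:])]

def reliability_summary(subjects):
    """Return (mean difference, SD of differences, per-subject SDs).

    subjects: list of per-subject threshold lists, one value per session.
    """
    diffs = [d for t in subjects for d in successive_differences(t)]
    spreads = [stdev(t) for t in subjects]
    return mean(diffs), stdev(diffs), spreads

# Hypothetical log thresholds for two subjects over four sessions.
example = [[-3.0, -3.0, -3.0, -3.0], [-2.5, -2.7, -2.6, -2.6]]
mean_diff, sd_diff, spreads = reliability_summary(example)
```

Four sessions yield three differences per subject, so the pooled distribution of differences has three times as many entries as there are subjects.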

Discussion

The maximum-likelihood method gives estimates of thresholds that are stable within individuals over time and that are responsive to individual differences. By all four indicators, the test–retest reliability of both NaCl and butyl alcohol thresholds measured by the maximum-likelihood method was very good.

GENERAL DISCUSSION

The two experiments and the Monte Carlo simulations reported here demonstrate that the maximum-likelihood method is a viable alternative to other methods currently in use to measure taste and smell detection thresholds. Because the method attempts to use stimuli that are optimal (close to the predicted threshold), the confidence intervals of the measured thresholds are considerably smaller than those achieved by the other two methods (Experiment 1).

One of the strengths of the maximum-likelihood method is that it provides an explicit way of estimating the confidence interval of the threshold, a facility not provided by the other two methods. The experimenter is free to establish a stopping confidence interval to suit his/her needs. If greater precision is desired, it can be achieved at the expense of running more trials. This explicit tradeoff between precision and number of trials is not possible with the other two psychophysical methods. One could, of course, use a hybrid combination of methods: an up–down staircase method for generating the next stimulus to use, combined with a maximum-likelihood method for estimating the threshold and its confidence interval. In effect, we used this approach after the data from Experiment 1 were collected, by reanalyzing them with the maximum-likelihood technique in order to assess the confidence interval of the resulting threshold.

Table 3
Mean NaCl and Butyl Alcohol Thresholds (and Standard Errors) From Experiment 2

                            Day 1            Day 2            Day 3            Day 4
                            M       SE       M       SE       M       SE       M       SE
NaCl
  Consecutive (Group 1)     −2.62   0.13     −2.67   0.11     −2.82   0.13     −2.63   0.11
  Distributed (Group 2)     −2.90   0.19     −2.83   0.09     −2.83   0.11     −2.77   0.10
Butyl alcohol
  Consecutive (Group 1)     −3.19   0.14     −3.17   0.15     −3.25   0.12     −3.10   0.13
  Distributed (Group 2)     −3.00   0.16     −3.10   0.14     −3.09   0.10     −3.09   0.11

Table 4
Pearson Correlation Coefficients Between Thresholds

                            Paired Sessions
                            1–2    1–3    1–4    2–3    2–4    3–4
NaCl
  Consecutive (Group 1)     .46    .66    .66    .44    .75    .66
  Distributed (Group 2)     .58    .72    .40    .55    .20    .51
Butyl alcohol
  Consecutive (Group 1)     .86    .83    .86    .67    .71    .76
  Distributed (Group 2)     .26    .68    .56    .51    .21    .65

p < .05. p < .01. p < .001.

The Monte Carlo simulations allow the analysis of the ideal behavior of the three psychophysical methods. One major conclusion is that the ascending staircase method gives threshold measurements that are severely biased. This bias is a function of the number of trials carried out before the stopping criterion of 5 correct in a row with one stimulus is reached. When the number of trials was small (less than approximately 15), the threshold estimate was too low; when the number of trials was greater than 15, the threshold estimate was too high (see Figure 7). This characteristic is due to the fact that the stimulus concentration used can only increase as trials progress. The maximum-likelihood method has the least amount of bias, although it is always slightly negative (i.e., the threshold estimate is, on the average, slightly lower, by 0.02 log units, than the true value).
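The link between run length and the ascending estimate can be explored with a small simulation. The sketch below encodes only what this section states about the ascending procedure (start at the lowest stimulus, stop at five correct in a row on one stimulus, never decrease the concentration). Moving up exactly one step after each error, the 0.5-log-unit stimulus spacing, the logistic form, and all names are assumptions made for illustration.

```python
import math
import random

def p_correct(stimulus, threshold, steepness=3.5, chance=0.5):
    """Assumed logistic psychometric function rising from chance to 1.0."""
    core = 1.0 / (1.0 + math.exp(-steepness * (stimulus - threshold)))
    return chance + (1.0 - chance) * core

def ascending_run(series, threshold, rng, needed=5):
    """Return (estimate, n_trials); estimate is None if the series runs out."""
    index, streak, trials = 0, 0, 0
    while index < len(series):
        trials += 1
        if rng.random() < p_correct(series[index], threshold):
            streak += 1
            if streak == needed:
                return series[index], trials
        else:
            streak = 0
            index += 1  # the concentration can only increase
    return None, trials

# Hypothetical stimulus series starting at the lowest value given in the text.
series = [-5.13 + 0.5 * i for i in range(10)]
rng = random.Random(0)
runs = [ascending_run(series, -3.0, rng) for _ in range(1000)]
completed = [(est, n) for est, n in runs if est is not None]
```

Grouping `completed` by the number of trials and comparing the estimates with the true threshold of −3.0 gives a rough view of how the sign of the error depends on run length under these assumptions.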

In clinical testing, especially for cases that involve the legal system, it is important to be able to detect malingerers. One of the strengths of the maximum-likelihood method is that it explicitly tests hypotheses. Normally, the hypothesis is that the subject is responding truthfully and that the threshold has a certain value. In the present implementation of the maximum-likelihood method, the computer routines maintain three hypotheses about the response strategy of the subject: that the subject is responding truthfully, that the subject is lying on each trial (i.e., always choosing what he/she believes is the wrong alternative in the 2AFC procedure), and that the subject is responding randomly (i.e., not making use of any sensory information). The ability to detect a malingerer (one who has sensory sensitivity but tries to respond randomly) is based on the fact that it is difficult for humans to generate truly random sequences (Bar-Hillel & Wagenaar, 1993). A subject who can taste or smell the stimulus, and therefore knows in which interval it occurs, tends to respond wrongly more often than would be expected by chance alone. In this case, the hypothesis that the subject is lying will have a higher likelihood than the hypothesis that the subject is telling the truth. We are currently working to refine this ability of the maximum-likelihood method, because it holds great promise. The other two psychophysical procedures do not easily distinguish between a malingerer and a person with no sensory sensitivity and have no explicit way to detect malingerers.

Figure 9. Test–retest reliability as indicated by correlation coefficients between all pairs of measurements, plotted as a function of the number of days separating them. The solid line is the least squares linear regression through the data.

Finally, in Experiment 2, we demonstrated that the maximum-likelihood method gives estimates of thresholds that are stable over time. This stability is actually based on two aspects of our testing procedure. The maximum-likelihood method gives the smallest confidence intervals of the three methods tested and thus contributes the least amount of random or systematic error variance to the measured thresholds. The second source of reliability is the experimental paradigm. The 2AFC procedure minimizes the effects of response bias (e.g., Green & Swets, 1966/1974) and thus minimizes error variance in the threshold measurements that would be introduced by variance in response bias. Actually, it would be desirable to use more than just two forced-choice alternatives, because the ML-PEST method converges more rapidly on the threshold value with more alternatives (Manny & Klein, 1985; Schlauch & Rose, 1990). The time required to present chemosensory stimuli, however, usually precludes using more than two alternatives.

Test–retest reliability is not achieved at the expense of precision. Our measured thresholds show an approximately normal distribution with a standard deviation of about 0.4 log units. The maximum-likelihood method is able to estimate thresholds that lie on a continuum, even though a small number of discrete stimulus concentrations are available for testing. Although a computer is required to run ML-PEST software, the widespread availability of inexpensive desktop or laptop computers makes this requirement much less of a barrier than in the past.

Figure 10. Distribution of differences between successive thresholds (left) and of the standard deviation of the four threshold measurements for the 27 subjects (right) for NaCl (upper panel) and butyl alcohol (lower panel).

REFERENCES

Anliker, J. A., Bartoshuk, L., Ferris, A. N., & Hooks, L. D. (1991). Children’s food preferences and genetic sensitivity to the bitter tastes of 6-n-propylthiouracil (PROP). American Journal of Clinical Nutrition, 54, 316-320.

Apter, A. J., Gent, J. F., & Frank, M. E. (1999). Fluctuating olfactory sensitivity and distorted odor perception in allergic rhinitis. Archives of Otolaryngology–Head & Neck Surgery, 125, 1005-1010.

Bar-Hillel, M., & Wagenaar, W. A. (1993). The perception of randomness. In C. L. Gideon Keren (Ed.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 369-393). Hillsdale, NJ: Erlbaum.

Berkson, J. (1951). Why I prefer logits to probits. Biometrics, 7, 327-339.

Berkson, J. (1953). A statistically precise and relatively simple method of estimating the bio-assay with quantal response, based on the logistic function. Journal of the American Statistical Association, 48, 565-600.

Berkson, J. (1955). Maximum-likelihood and minimum chi-square estimates of the logistic function. Journal of the American Statistical Association, 50, 130-162.

Cain, W. S., Gent, J. F., Catalanotto, F. A., & Goodspeed, R. B. (1983). Clinical evaluation of olfaction. American Journal of Otolaryngology, 4, 252-256.

Cometto-Muñiz, J. E., & Cain, W. S. (1995). Relative sensitivity of the ocular trigeminal, nasal trigeminal, and olfactory systems to airborne chemicals. Chemical Senses, 20, 191-198.

Cornsweet, T. N. (1962). The staircase method in psychophysics. American Journal of Psychology, 75, 485-491.

Cowart, B. J. (1989). Relationships between taste and smell across the adult life span. In C. Murphy, W. S. Cain, & D. M. Hegsted (Eds.), Nutrition and the chemical senses in aging: Recent advances and current research needs (Annals of the New York Academy of Sciences, Vol. 561, pp. 39-55). New York: New York Academy of Sciences.

Cowart, B. J., Yokomukai, Y., & Beauchamp, G. K. (1994). Bitter taste in aging: Compound-specific decline in sensitivity. Physiology & Behavior, 56, 1237-1241.

Deems, D. A., Doty, R. L., Settle, R. G., Moore-Gillon, V., Shaman, P., Mester, A. F., Kimmelman, C. P., Brightman, V. J., & Snow, J. B., Jr. (1991). Smell and taste disorders: A study of 750 patients from the University of Pennsylvania Smell and Taste Center. Archives of Otolaryngology–Head & Neck Surgery, 117, 519-528.

Doty, R. L., McKeown, D. A., Lee, W. W., & Shaman, P. (1995). A study of the test–retest reliability of ten olfactory tests. Chemical Senses, 20, 645-656.

Doty, R. L., Snyder, P. J., Huggins, G. R., & Lowry, L. D. (1981). Endocrine, cardiovascular, and psychological correlates of olfactory sensitivity changes during the human menstrual cycle. Journal of Comparative & Physiological Psychology, 95, 45-60.

Drewnowski, A., Henderson, S. A., & Shore, A. B. (1997). Genetic sensitivity to 6-n-propylthiouracil (PROP) and hedonic responses to bitter and sweet tastes. Chemical Senses, 22, 27-37.

Duffy, V. B., Cain, W. S., & Ferris, A. M. (1999). Measurement of sensitivity to olfactory flavor: Application in a study of aging and dentures. Chemical Senses, 24, 671-677.

Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf und Härtel.

Gagnon, P., Mergler, D., & Lapare, S. (1994). Olfactory adaptation, threshold shift and recovery at low levels of exposure to methyl isobutyl ketone (MIBK). Neurotoxicology, 15, 637-642.

Garcia-Perez, M. A. (1998). Forced-choice staircases with fixed step sizes: Asymptotic and small-sample properties. Vision Research, 38, 1861-1881.

Green, D. M., & Swets, J. A. (1974). Signal detection theory and psychophysics. Huntington, NY: Robert E. Krieger. (Original work published 1966)

Grushka, M., Sessle, B. J., & Howley, T. P. (1986). Psychophysical evidence of taste dysfunction in burning mouth syndrome. Chemical Senses, 11, 485-498.

Guilford, J. P. (1936). Psychometric methods. New York: McGraw-Hill.

Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.

Hall, J. L. (1968). Maximum-likelihood sequential procedures for estimation of psychometric functions. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1981). Hybrid adaptive procedure for estimation of psychometric functions. Journal of the Acoustical Society of America, 69, 1763-1769.

Harvey, L. O., Jr. (1986). Efficient estimation of sensory thresholds. Behavior Research Methods, Instruments, & Computers, 18, 623-632.

Harvey, L. O., Jr. (1992). The critical operating characteristic and the evaluation of expert judgment. Organizational Behavior & Human Decision Processes, 53, 229-251.

Harvey, L. O., Jr. (1997). Efficient estimation of sensory thresholds with ML-PEST. Spatial Vision, 11, 121-128.

Hays, W. L. (1963). Statistics for psychologists (1st ed.). New York: Holt, Rinehart & Winston.

Krantz, D. H. (1969). Threshold theories of signal detection. Psychological Review, 76, 308-324.

Lehrner, J. P., Brücke, T., Dal-Bianco, P., Gatterer, G., & Kryspin-Exner, I. (1997). Olfactory functions in Parkinson’s disease and Alzheimer’s disease. Chemical Senses, 22, 105-110.

Lehrner, J. P., Kryspin-Exner, I., & Vetter, N. (1995). Higher olfactory threshold and decreased odor identification ability in HIV-infected persons. Chemical Senses, 20, 325-328.

Linn, R. L. (Ed.) (1989). Educational measurement (3rd ed.). New York: Macmillan.

Linschoten, M. R. I., & Kroeze, J. H. A. (1991). Spatial summation in taste: NaCl thresholds and stimulated area on the anterior human tongue. Chemical Senses, 16, 219-224.

Linschoten, M. R. I., & Kroeze, J. H. A. (1992). Bilateral taste stimulation: Spatial summation with weak NaCl stimuli. Chemical Senses, 17, 53-59.

Linschoten, M. R. I., & Kroeze, J. H. A. (1994). Ipsi- and bilateral interactions in taste. Perception & Psychophysics, 55, 387-393.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user’s guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure vernier acuity of older infants. Vision Research, 25, 1245-1252.

Mattes, R. D. (1988). Reliability of psychophysical measures of gustatory function. Perception & Psychophysics, 43, 107-114.

Moll, B., Klimek, L., Eggers, G., & Mann, W. (1998). Comparison of olfactory function in patients with seasonal and perennial allergic rhinitis. Allergy, 53, 297-301.

Murphy, C., & Cain, W. S. (1986). Odor identification: The blind are better. Physiology & Behavior, 37, 177-180.

Murphy, C., Nordin, S., de Wijk, R. A., Cain, W. S., & Polich, J. (1994). Olfactory-evoked potentials: Assessment of young and elderly, and comparison to psychophysical threshold. Chemical Senses, 19, 47-56.

Nordin, S., Murphy, C., Davidson, T. M., Quinonez, C., Jalowayski, A. A., & Ellison, D. W. (1996). Prevalence and assessment of qualitative olfactory dysfunction in different age groups. Laryngoscope, 106, 739-744.

OrsquoMah on y M Gar dn er L Lon g D Heint z C Thompson Bamp Davies M (1979) Salt taste detection An R-index approach tosignal-detection measurements Perception 8 497-506

Pent l a nd A (1980) Maximum likelihood estimation The best PESTPerception amp Psychophysics 28 377-379

Pier ce J D Jr Dot y R L amp Amoor e J E (1996) Analysis of po-sition of trial sequence and type of diluent on the detection thresholdfor phenyl ethyl alcohol using a single staircase method Perceptualamp Motor Skills 82 451-458

Rosen bl at t M R Ol mst ead R E Iwa mot o-Sch aa pp P N ampJa r vik M E (1998) Olfactory thresholds for nicotine and mentholin smokers (abstinent and nonabstinent) and nonsmokers Physiol-ogy amp Behavior 65 575-579


SAS Institute (1998a). StatView: StatView reference. Cary, NC: SAS Institute.

SAS Institute (1998b). StatView: Using StatView. Cary, NC: SAS Institute.

Schlauch, R. S., & Rose, R. M. (1990). Two-, three-, and four-interval forced-choice staircase procedures: Estimator bias and efficiency. Journal of the Acoustical Society of America, 88, 732-740.

Simpson, A. J., & Fitter, M. J. (1973). What is the best index of detectability? Psychological Bulletin, 80, 481-488.

Stevens, J. C. (1995). Detection of heteroquality taste mixtures. Perception & Psychophysics, 57, 18-26.

Swets, J. A. (1961). Is there a sensory threshold? Science, 134, 168-177.

Swets, J. A. (1986a). Form of empirical ROC's in discrimination and diagnostic tasks: Implications for theory and measurement of performance. Psychological Bulletin, 99, 181-198.

Swets, J. A. (1986b). Indices of discrimination or diagnostic accuracy: Their ROC's and implied models. Psychological Bulletin, 99, 100-117.

Swets, J. A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Mahwah, NJ: Erlbaum.

Swets, J. A., Tanner, W. P., Jr., & Birdsall, T. G. (1961). Decision processes in perception. Psychological Review, 68, 301-340.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Urban, F. M. (1908). The application of statistical methods to problems of psychophysics. Philadelphia: Psychological Clinic Press.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Weiffenbach, J. M., Baum, B. J., & Burghauser, R. (1982). Taste thresholds: Quality specific variation with human aging. Journal of Gerontology, 37, 372-377.

Weiffenbach, J. M., Schwartz, L. K., Atkinson, J. C., & Fox, P. C. (1995). Taste performance in Sjögren's syndrome. Physiology & Behavior, 57, 89-96.

Wetherill, G. B., & Levitt, H. (1965). Sequential estimation of points on a psychometric function. British Journal of Mathematical & Statistical Psychology, 18, 1-10.

(Manuscript received November 11, 1999; revision accepted for publication March 4, 2001.)


Olmstead, Iwamoto-Schaapp, & Jarvik, 1998) and the one-up–two-down variant of the Wetherill and Levitt staircase method (Wetherill & Levitt, 1965) for taste (Anliker, Bartoshuk, Ferris, & Hooks, 1991; Cowart, 1989; Cowart, Yokomukai, & Beauchamp, 1994; Drewnowski, Henderson, & Shore, 1997; Grushka, Sessle, & Howley, 1986; Murphy & Cain, 1986; Murphy et al., 1994; Stevens, 1995; Weiffenbach, Baum, & Burghauser, 1982; Weiffenbach, Schwartz, Atkinson, & Fox, 1995). The method of constant stimuli could be used to measure a complete psychometric function like the one shown in Figure 1, using a fixed number of stimuli presented numerous times in random order. Even though the shape of the function can be highly informative, this method is only rarely used in taste and smell research, most likely because of the large number of experimental trials required for stable estimates (Linschoten & Kroeze, 1991, 1992, 1994). A few experimenters have used an SDT paradigm (yes–no or rating scale judgments) to measure both sensory sensitivity and response bias (Doty, Snyder, Huggins, & Lowry, 1981; O'Mahony et al., 1979).

In audition and vision, adaptive psychophysical methods are widely employed, and the theoretical bases for them are well understood. To minimize the number of trials, the most efficient testing strategy is to compute an estimate of the threshold after each psychophysical trial and use a stimulus close to the threshold on the next trial (Taylor & Creelman, 1967). The estimation of the threshold is done by fitting a psychometric function to the data collected up to that point in the experiment and estimating the value of the threshold from the best-fitting function. This method is formally called a maximum-likelihood adaptive staircase method and is known by several different names: Best PEST (Pentland, 1980), QUEST (Watson & Pelli, 1983), and ML-PEST (Harvey, 1986, 1997). We will refer to it as the ML-PEST method (maximum-likelihood parameter estimation by sequential testing). The term PEST was originally coined by Taylor and Creelman (1967).

That good estimates of the threshold can be obtained from sparse data is shown in Figure 2, where psychometric functions for the same 20 stimuli as in Figure 1, but with 1,000, 100, 10, and 1 presentations per stimulus, are shown. As the number of trials per stimulus intensity decreases, the empirical psychometric function becomes more and more irregular. The solid curve in each figure is the best-fitting logistic function (the best value of α was found holding β and γ constant) obtained with a maximum-likelihood fitting technique (Harvey, 1986, 1997). Even in the extreme case of 1 presentation per stimulus, the best-fitting logistic function gives a good estimate of the threshold.

Figure 2. Typical S-shaped psychometric function for stimulus detection in a two-alternative forced-choice paradigm. The data were generated by a Monte Carlo logistic observer having an α of 10 stimulus intensity units, a β of 3.5, and a γ of 0.5. Each of the 20 stimulus intensities was presented 1,000, 100, 10, and 1 times. The solid line is the best-fitting logistic function fit to the data using a maximum-likelihood curve-fitting technique (Harvey, 1986, 1997).

Our goal in the present paper is to introduce the use of the maximum-likelihood adaptive staircase procedure for measuring sensitivity in the chemical senses. We will compare the efficiency and the accuracy of this method with those of other methods and present test–retest data as an assessment of reliability.

MAXIMUM-LIKELIHOOD ADAPTIVE STAIRCASE PSYCHOPHYSICAL METHOD

The principles of this method are described in a variety of sources (Hall, 1968, 1981; Harvey, 1986, 1997; Hays, 1963; Watson & Pelli, 1983), which provide more detail than is presented here. For this discussion, it is assumed that the psychometric function is logistic and that data are collected with a 2AFC experimental paradigm, although other functions and paradigms could certainly be used. During psychophysical testing, there are three major questions that are answered by this method after each trial: (1) What is the current estimate of the threshold? (2) What stimulus should be presented to the subject on the next trial? (3) Should the psychophysical trials be stopped? Although the answers are somewhat interrelated, it is useful to consider them separately.

Estimation of the Threshold

During psychophysical testing, the raw data collected on each trial are the stimulus concentration presented to the subject and whether or not the subject made a correct response. In order to estimate the threshold, a series of possible psychometric functions is considered. A likelihood function is constructed by computing the log likelihood for each of these possible psychometric functions, each one having the same value of β (steepness) and γ (the probability of being correct by chance) and each having a different value of α. The computational procedure is described by Harvey (1986, 1997). The smallest candidate α is chosen to lie below any possible real threshold, or at least lower than the lowest physical stimulus. Successive values increase in 0.01-log-unit steps up to an appropriate maximum. In testing the detection of NaCl, for example, we consider 451 αs ranging from −4.50 log molar concentration (log M) to 0.00 log molar concentration in 0.01-log steps. Each candidate logistic function has a fixed β (3.5) and a fixed γ (0.5) for the 2AFC psychophysical paradigm. The β value is not critical for converging on the correct value of α. The value of 3.5 is close to actual psychometric functions for NaCl measured with the method of constant stimuli. The likelihood functions after having run 0, 5, 10, 15, 20, and 25 trials on a Monte Carlo observer with a true value of α of −2.125 log M (0.0075 M) are plotted in the upper panel of Figure 3. The best estimate of the threshold is that value of α having the maximum likelihood (i.e., the mode of the likelihood distribution). In this simulation, the best estimate of α was −2.110 log M after 25 trials, an error of 0.015 log M.
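The grid search described above can be sketched in a few lines. This is an illustration only, not the ML-PEST code of Harvey (1986, 1997): the logistic form in `logistic_p` (with stimulus and α both in log units, so that performance at x = α is midway between γ and 1) is an assumption, and the trial data are hypothetical.

```python
import math

def logistic_p(x, alpha, beta=3.5, gamma=0.5):
    """Assumed logistic form: P(correct) rises from gamma to 1 and equals
    (1 + gamma) / 2 when the stimulus x (log units) equals alpha."""
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-beta * (x - alpha)))

def log_likelihood(alpha, trials, beta=3.5, gamma=0.5):
    """Sum of log P(observed response) over (log_concentration, correct) pairs."""
    total = 0.0
    for x, correct in trials:
        p = logistic_p(x, alpha, beta, gamma)
        total += math.log(p if correct else 1.0 - p)
    return total

# 451 candidate alphas from -4.50 to 0.00 log M in 0.01-log steps.
candidates = [(i - 450) / 100 for i in range(451)]

# Hypothetical trial record: (log M presented, response correct?).
trials = [(-2.00, True), (-2.25, True), (-2.50, True), (-2.75, False),
          (-2.50, True), (-2.50, False), (-2.25, True), (-2.25, True)]

log_lik = [log_likelihood(a, trials) for a in candidates]
best_alpha = candidates[log_lik.index(max(log_lik))]  # mode of the likelihood
print(best_alpha)
```

Because the record contains both correct and incorrect responses, the maximum falls inside the candidate range rather than at one of its ends.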

Choosing the Next Stimulus

At the beginning of an experiment, before any data have been collected, all candidate αs have equal likelihood of being the threshold. But often the experimenter has some prior idea of what the threshold value should be, especially if the subject has normal sensory function. We incorporate these prior expectations into the method by keeping track of a second likelihood function that starts with the prior likelihoods and adds the results of each trial to it. We use a Gaussian distribution for the prior likelihood function, with a standard deviation of 0.40, which is the value we found in our test–retest experiment to be reported below (Experiment 2). In the example, the initial estimate of the threshold was −1.875 log M. These posterior likelihood functions are plotted in the lower panel of Figure 3 after having run 0, 5, 10, and 15 trials on the same observer. The mode of this posterior likelihood function after each trial is used to choose the next stimulus.
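A minimal sketch of this second, prior-weighted likelihood function follows. The prior mean (−1.875 log M) and standard deviation (0.40) are taken from the text; the logistic form and the function names `update` and `mode` are assumptions made for illustration.

```python
import math

def logistic_p(x, alpha, beta=3.5, gamma=0.5):
    # Assumed logistic form (log units); not taken from the paper.
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-beta * (x - alpha)))

candidates = [(i - 450) / 100 for i in range(451)]  # -4.50 ... 0.00 log M

PRIOR_MEAN = -1.875  # initial threshold estimate in the text's example
PRIOR_SD = 0.40      # prior standard deviation quoted in the text

# Log of a Gaussian prior over the candidates (additive constant dropped).
log_prior = [-0.5 * ((a - PRIOR_MEAN) / PRIOR_SD) ** 2 for a in candidates]

def update(log_post, x, correct):
    """Add one trial's log likelihood to every candidate's running total."""
    out = []
    for a, lp in zip(candidates, log_post):
        p = logistic_p(x, a)
        out.append(lp + math.log(p if correct else 1.0 - p))
    return out

def mode(log_post):
    """Candidate alpha with the largest posterior log likelihood."""
    return candidates[log_post.index(max(log_post))]

start = mode(log_prior)
after_correct = mode(update(log_prior, -2.0, True))
after_error = mode(update(log_prior, -2.0, False))
print(start, after_correct, after_error)
```

A correct response can only pull the mode toward weaker concentrations and an error can only push it toward stronger ones, which is the staircase-like behavior the text describes.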

In taste and smell, it is usually not feasible to maintain more than 20 or so different stimulus concentrations for presentation to an observer. The stimulus recommended by the maximum of the posterior likelihood function, therefore, may not actually be available to the experimenter to use on the next trial. There will be an actual stimulus that is lower than the recommendation and another stimulus that is higher than the recommendation. To choose between these two, we compute the relative distance of the recommended stimulus between the two actual stimuli, and we compute the relative frequency of each of the two actual stimuli. We choose one randomly, with a probability that favors the stimulus closer to the recommendation and the stimulus with fewer trials. This computation is done in CStim->GetBestStimulusIndex() in the ML-PEST software package. Keeping the list of candidate αs separate from the list of available stimuli means that the threshold α can be computed to a precision that is not limited by the small number of actual stimuli used in the experiment.
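The text does not give the formula used inside CStim->GetBestStimulusIndex(), so the following is only one plausible way to combine the two factors it names. The function `choose_stimulus` and its multiplicative weighting are hypothetical.

```python
import bisect
import random

def choose_stimulus(recommended, stimuli, counts, rng=random):
    """Pick the index of one of the two available stimuli that bracket
    `recommended`. `stimuli` is sorted (log units); `counts[i]` is the
    number of trials already run at stimuli[i]. The weighting is an
    assumed illustration, not the ML-PEST package's actual rule."""
    if recommended <= stimuli[0]:
        return 0
    if recommended >= stimuli[-1]:
        return len(stimuli) - 1
    hi = bisect.bisect_left(stimuli, recommended)
    lo = hi - 1
    # Relative position of the recommendation: 0 at stimuli[lo], 1 at stimuli[hi].
    frac = (recommended - stimuli[lo]) / (stimuli[hi] - stimuli[lo])
    pair_total = counts[lo] + counts[hi]
    # Favor the closer stimulus and the one presented less often so far.
    w_lo = (1.0 - frac) * (counts[hi] + 1) / (pair_total + 2)
    w_hi = frac * (counts[lo] + 1) / (pair_total + 2)
    return hi if rng.random() < w_hi / (w_lo + w_hi) else lo

# 18 NaCl concentrations from -4.00 to 0.25 log M in 0.25-log-unit steps.
stimuli = [-4.0 + 0.25 * i for i in range(18)]
counts = [0] * len(stimuli)
index = choose_stimulus(-2.11, stimuli, counts, random.Random(0))
print(stimuli[index])  # either -2.25 or -2.0
```

With no trials run yet, a recommendation of −2.11 log M selects −2.00 more often than −2.25 because it lies closer to −2.00; piling trials onto −2.00 shifts the odds the other way.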

For NaCl detection, we use 18 different concentrations ranging from −4.00 log M to 0.25 log M in 0.25-log-unit steps. The actual stimulus concentrations that were presented to the Monte Carlo observer are plotted in Figure 4 for each of the 25 trials. The solid circles represent trials on which the observer was correct; the open circles are incorrect trials. Because the observer was correct on the first three trials, the stimuli that were presented were successively weaker concentrations. On Trial 4, an error was made, and the stimulus on the following trial was the same concentration. Generally, but not always (because of the random process of choosing the next stimulus), the next stimulus will be weaker after a correct response and stronger after an incorrect response. The maximum-likelihood estimate of the threshold is plotted by the solid line in Figure 4. Until both correct and incorrect trials are collected, the estimate of the threshold will be at either the lower or the upper end of the range of candidate αs. In this example, the observer was correct on the first three trials and then made a mistake on Trial 4. The estimate of the threshold was extremely low (not visible in the figure) until Trial 4, when it jumped up to the region near the true value. As the trials continue, the estimated threshold fluctuates around the true value and eventually converges on it. After 25 trials, the estimated threshold was −2.11 log M, an error of 0.015 log units. This low error is achieved even though no actual stimulus corresponding to the threshold was available for testing.

Figure 3. Log likelihood (upper panel) and log posterior likelihood (lower panel) as a function of candidate α following 0, 5, 10, 15, 20, and 25 Monte Carlo trials with a logistic observer having a true α of 0.0075 M (−2.125 log M) detecting NaCl in a two-alternative forced-choice detection paradigm.

When to Stop

Two stopping criteria can be used to terminate psychophysical testing: stop after a fixed number of trials, or stop after the estimation of the threshold reaches a required level of precision. Precision is measured by the confidence interval, which is related to the width of the likelihood function. Notice in Figure 3 that the likelihood function becomes narrower as the number of trials increases. The confidence interval of the estimated threshold is directly related to the width of the likelihood distribution and is calculated using the likelihood ratio test and the χ² statistical distribution with 1 degree of freedom (Hays, 1963):

$$\chi^2 = 2.0 \times \log_e(\text{likelihood ratio}) = 2.0 \times \log_e\!\left(\frac{L_{\max}}{L_x}\right), \tag{2}$$

where $L_x$ is the likelihood of a model whose α parameter is x, and $L_{\max}$ is the likelihood of the best-fitting model. We search for the stimulus concentrations (x) above and below the threshold stimulus (the maximum-likelihood α) at which the log likelihood drops to χ²/2 below the maximum log likelihood value. To find the 95% confidence interval, the criterion value of χ² is 5.0239 (this value is obtained from standard tables of the χ² distribution found in all statistics textbooks. The one-tailed value of 5.0239 corresponds to a probability of .025; since the confidence interval is two-tailed, 5.0239 corresponds to a probability of .05).
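The search for the interval bounds can be illustrated as follows. The criterion of 5.0239 is the value quoted in the text; the function `confidence_interval` and the parabolic log likelihood used to exercise it are hypothetical stand-ins for a real likelihood function.

```python
CHI2_CRITERION = 5.0239  # criterion value quoted in the text

def confidence_interval(candidates, log_lik, criterion=CHI2_CRITERION):
    """Walk outward from the maximum-likelihood candidate until the log
    likelihood has dropped criterion / 2 below its maximum.
    Returns (lower bound, best estimate, upper bound)."""
    i_max = max(range(len(log_lik)), key=log_lik.__getitem__)
    cutoff = log_lik[i_max] - criterion / 2.0
    lo = i_max
    while lo > 0 and log_lik[lo - 1] >= cutoff:
        lo -= 1
    hi = i_max
    while hi < len(log_lik) - 1 and log_lik[hi + 1] >= cutoff:
        hi += 1
    return candidates[lo], candidates[i_max], candidates[hi]

candidates = [(i - 450) / 100 for i in range(451)]  # -4.50 ... 0.00 log M

# Hypothetical log likelihood: a parabola peaked at -2.11 log M.
log_lik = [-0.5 * ((a + 2.11) / 0.15) ** 2 for a in candidates]

lower, best, upper = confidence_interval(candidates, log_lik)
print(lower, best, upper, round(upper - lower, 2))
```

A stopping rule based on precision then reduces to checking whether `upper - lower` has fallen to the required width after each trial.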

In Figure 5, the same data shown in Figure 4 are plotted with an expanded vertical scale to show the estimate of the threshold along with the estimated 95% confidence interval. Notice that as the number of trials increases, the confidence interval of the threshold decreases. After 25 trials, the 95% confidence interval of the threshold extends 0.27 log units below the estimate of the threshold (indicated by the solid horizontal line, not by the filled circle, which represents the stimulus presented on that trial) and 0.37 log units above it, for a total confidence interval of 0.64 log units.

EXPERIMENT 1
Comparison of Three Psychophysical Methods

The goal of Experiment 1 was to compare the ML-PEST method with two commonly used methods with respect to their efficiency and their precision. Detection thresholds for butyl alcohol were measured using three methods: the ascending method of limits as described by Cain et al. (1983), the up–down staircase method (Wetherill & Levitt, 1965), and the ML-PEST maximum-likelihood staircase method as described above (Harvey, 1986, 1997).

Method

Subjects. Thirty-two (16 males and 16 females) healthy, nonsmoking subjects participated in the experiment. Their mean age was 30.4 years (± 8.1), ranging from 17 to 60 years. None reported having conditions that are normally associated with smell dysfunction.

Stimuli for ascending method of limits. For the ascending method of limits, the procedure described by Cain et al. (1983) was used. Thirteen concentrations of butyl alcohol were prepared, with distilled water as diluent. The strongest concentration was 4%. Adjacent steps differed by a factor of 3.

Stimuli for staircase methods. The Wetherill and Levitt and ML-PEST methods used a series of 24 concentrations of butyl alcohol. The lowest 20 concentrations were a factor of 1.5 apart, with the strongest 0.1%. This series was augmented by the four strongest of the series for the ascending method, which were separated by a factor of 3.
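As a quick arithmetic check on the two stimulus series, the following sketch rebuilds them in log10 percent concentration from the strongest concentration, the step factor, and the number of steps given above. The helper `log_series` is hypothetical.

```python
import math

def log_series(strongest_percent, factor, n):
    """log10 percent concentrations, weakest first, for n concentrations
    that step down from `strongest_percent` by `factor` each time."""
    top = math.log10(strongest_percent)
    step = math.log10(factor)
    return [top - step * k for k in range(n - 1, -1, -1)]

# Ascending method of limits: 13 concentrations, strongest 4%, factor of 3.
ascending = log_series(4.0, 3.0, 13)

# Staircase methods: 20 concentrations a factor of 1.5 apart (strongest 0.1%),
# plus the four strongest concentrations of the ascending series.
staircase = log_series(0.1, 1.5, 20) + ascending[-4:]

# The two available stimuli that bracket -3.0 log percent concentration.
below = max(s for s in staircase if s < -3.0)
above = min(s for s in staircase if s > -3.0)
print(len(staircase), round(below, 3), round(above, 3))  # 24 -3.113 -2.937
```

The combined series has 24 members in increasing order, and its neighbors around −3.0 come out at −3.113 and −2.937 log percent concentration.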

Procedure. Thresholds were determined with a temporal 2AFC procedure. In each trial, a stimulus was paired with a blank, and the subjects were asked to select the bottle containing the stimulus. The subjects were asked to give their best guess, even when they were certain that the two stimuli were identical. The stimulus and the blank were presented in 250-ml squeezable polyethylene bottles containing 60 ml of liquid. The subjects were instructed to keep one nostril closed while the other nostril was tested with three puffs out of each bottle. For each subject, four thresholds were obtained, two for each nostril. In randomized order and alternating nostrils, one threshold was determined with the ascending method, one with the up–down method, and two with the adaptive maximum-likelihood method. Testing took about 45 min.

Figure 4. Stimulus presentation as a function of trial number in the two-alternative forced-choice Monte Carlo simulation. Trials on which the subject's response was correct are represented by the filled circles; trials on which the subject's response was incorrect are represented by the open circles. The resulting maximum-likelihood estimate of α is plotted by the solid line. The horizontal dotted line marks the true value of α.

The definition of detection threshold differs in several aspects among the three methods. In the ascending method, testing started with the lowest concentration. A correct response led to the presentation of the same concentration on the next trial. When an incorrect response was given, the next higher concentration was used. Testing stopped when five correct answers had been given in a row. The threshold was defined as the last-used concentration (Cain et al., 1983) and thus corresponded to a detection probability of 1.0.
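The ascending rule just described is simple enough to state as code. The sketch below is an illustration under assumed names (`ascending_limits`, `respond`); the deterministic observer and the six-step series are hypothetical and serve only to exercise the rule.

```python
def ascending_limits(series, respond, run_length=5):
    """Ascending method of limits as described in the text.
    `series` lists the concentrations weakest first; `respond(x)` returns
    True for a correct response. Returns (threshold, number of trials);
    threshold is None if a stimulus above the strongest one is called for."""
    idx = 0
    streak = 0
    n_trials = 0
    while True:
        n_trials += 1
        if respond(series[idx]):
            streak += 1                  # correct: repeat the same concentration
            if streak == run_length:
                return series[idx], n_trials
        else:
            streak = 0                   # incorrect: move to the next higher one
            idx += 1
            if idx == len(series):
                return None, n_trials

# Hypothetical observer who is always correct at or above -2.0 log units.
series = [-5.0, -4.0, -3.0, -2.0, -1.0, 0.0]
print(ascending_limits(series, lambda x: x >= -2.0))  # (-2.0, 8)
```

Because only concentrations in the series can be returned, the resolution of this threshold is limited by the step size between adjacent stimuli.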

In the up–down staircase method, testing could start at any step in the concentration series. The first subject was started in the middle of the range, and each subsequent subject started where the former subject had ended. After one correct response, the same concentration was presented; if two correct responses were given, the next lower concentration was presented on the next trial. One incorrect response led to the presentation of a concentration one step higher. This decision rule is called the one-up–two-down rule. Although this decision rule is widely used, the actual threshold estimate depends more on the size of the stimulus step than on the particular up–down rule (Garcia-Perez, 1998). Testing stopped after five reversals, and the threshold was computed as the average of the stimulus concentrations at the last four reversals. This threshold corresponded to a detection probability of approximately .75.
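The one-up–two-down rule and the reversal-based threshold can be sketched as below. This is an illustration with assumed names; in particular, treating a reversal as a change in the direction of the stimulus step, recorded at the concentration where the change was triggered, is an interpretation of the text.

```python
def up_down_staircase(series, start_idx, respond, n_reversals=5):
    """One-up-two-down staircase over `series` (weakest first).
    Returns (mean of the last four reversal concentrations, trials run),
    or (None, trials run) if the rule steps outside the series."""
    idx, correct_run, last_dir = start_idx, 0, 0
    reversals, n_trials = [], 0
    while True:
        n_trials += 1
        if respond(series[idx]):
            correct_run += 1
            if correct_run < 2:
                continue             # one correct: repeat the same concentration
            direction = -1           # two correct in a row: step down
        else:
            direction = +1           # one incorrect: step up
        correct_run = 0
        if last_dir != 0 and direction != last_dir:
            reversals.append(series[idx])
            if len(reversals) == n_reversals:
                return sum(reversals[-4:]) / 4.0, n_trials
        last_dir = direction
        idx += direction
        if not 0 <= idx < len(series):
            return None, n_trials    # ran off the end of the stimulus series

# Hypothetical observer who is always correct at or above -3.0 log units.
series = [-4.0 + 0.5 * i for i in range(9)]
print(up_down_staircase(series, 4, lambda x: x >= -3.0))  # (-3.25, 13)
```

With this idealized observer the staircase oscillates between −3.5 and −3.0, so the estimate lands midway between two available stimuli.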

In the maximum-likelihood adaptive staircase method, testing started in the middle of the stimulus range. After each response, the estimate of the threshold was calculated, and the next stimulus was a concentration close to that threshold. Testing stopped when the confidence interval of the estimated threshold reached a predefined criterion (0.8 log concentration units). This threshold corresponded to a detection probability of exactly .75. All these computations were carried out in real time on a Power Macintosh computer, using a program, SensoryTester, written by author L.O.H.

Data analysis. The mean threshold (and standard error) produced by each method is presented in Table 1, labeled "Native Threshold." As noted above, each method produces a threshold estimate that corresponds to a different performance level of detection. To obtain threshold and confidence interval estimates for the ascending method of limits and the up–down staircase method that could be directly compared with those obtained by the ML-PEST method, a logistic psychometric function was fitted to the actual sequence of stimulus presentations and responses from these two methods. The values of β and γ were held constant at the same values used in the ML-PEST method. The maximum-likelihood threshold and the confidence interval were then computed in exactly the same manner as with the ML-PEST method. This fitting procedure will give a lower threshold for the ascending method of limits data than that reported by the method of limits procedure itself, because the ascending method of limits looks for a threshold stimulus giving 100% correct, whereas the maximum-likelihood threshold corresponds to 75% correct. A repeated measures analysis of variance (ANOVA) was computed using StatView 5.0.1 (SAS Institute, 1998a, 1998b) on each of four dependent variables: native log threshold, number of trials to reach the native threshold, maximum-likelihood logistic threshold, and confidence interval of the maximum-likelihood threshold.

Figure 5. Stimulus presentation as a function of trial number in the two-alternative forced-choice Monte Carlo simulation. Trials on which the subject's response was correct are represented by the filled circles; trials on which the subject's response was incorrect are represented by the open circles. The resulting maximum-likelihood estimate of α is plotted by the solid line. The horizontal dotted line marks the true value of α. The vertical bars show the 95% confidence interval of each estimate of α.

Results

The native threshold values, the mean numbers of trials needed to reach the stopping criterion, the maximum-likelihood logistic thresholds, and the mean confidence intervals of the maximum-likelihood thresholds are summarized in Table 1. A repeated measures ANOVA on the native thresholds showed no significant main effect of method [F(3,31) = 1.61, p < .187]. Subsequent statistical contrast comparisons, using the standard Huynh–Feldt ε to correct the p value and the degrees of freedom, revealed that the native threshold of the ascending method of limits was just significantly higher than those of the other methods [F(1,31) = 3.92, p = .051]. These higher thresholds are to be expected, given that the ascending method of limits computes the threshold to be the concentration corresponding to a detection probability of 1.0, whereas the other methods use a performance criterion of about .75.

A repeated measures ANOVA on the number of trials needed to reach the stopping criterion showed a significant main effect of method [F(3,31) = 6.26, p < .0008]. Statistical contrast comparisons showed that the ascending method required significantly fewer trials to determine threshold [F(1,31) = 6.41, p < .014] and that the up–down method required significantly more trials [F(1,31) = 6.06, p < .017] than the average of the two measurements with the maximum-likelihood method.

A repeated measures ANOVA on the maximum-likelihood logistic thresholds showed no significant main effect of method, and the thresholds computed from the trial-by-trial data from the ascending method of limits were not significantly different from the thresholds computed from the data collected with the other methods [F(1,31) = 1.77, p = .19]. This result is to be expected, since presumably the performance of each observer is driven by the same sensory process when tested by each of the three psychometric methods.

The mean confidence intervals for the thresholds (and standard errors) are shown in the last column of Table 1. The mean confidence interval of the maximum-likelihood method is smaller than those of the other two. Note that the standard error of the confidence interval reported for the ML-PEST method is very small (0.02). This is because the confidence interval was used as the stopping criterion: Trials were stopped when the confidence interval of α reached 0.8. A repeated measures ANOVA indicated that there was a significant main effect of method [F(3,31) = 17.27, p < .0002]. Contrast tests revealed that the confidence intervals for the up–down method were significantly larger than those for both the maximum-likelihood method [F(1,31) = 50.93, p < .0001] and the ascending method [F(1,31) = 23.72, p < .0016].

Discussion

The native absolute threshold values obtained with the ascending method were higher than those obtained with the other two methods. One should remember that these values are not conventional thresholds, in the sense that they represent the concentration chosen correctly five out of five times. It is to be expected that this value will be a higher concentration than thresholds based on a lower percent correct. Furthermore, this method, at least as described by Cain et al. (1983), has considerably larger steps between adjacent concentrations, and only actual stimulus concentrations can be threshold values.

The ascending method needed the fewest trials (~13), relative to either the maximum-likelihood method (~15) or the up–down staircase method (~17). The clear superiority of the maximum-likelihood method, however, came with the high precision of the measured thresholds, as indicated by the small confidence intervals. The confidence intervals of the up–down staircase were almost three times larger than those given by the maximum-likelihood method.

MONTE CARLO SIMULATIONS

When measuring the threshold of an actual subject, it is not possible to know what the error is between the measured value and the "true" value, because the true value is not known. In order to evaluate the accuracy of the three psychophysical methods, we resorted to Monte Carlo evaluations of simulated observers having a predetermined threshold value. We used each of the three psychophysical methods to measure the threshold of each simulated observer and compared the result with the true value.

Table 1
Mean Performance (and Standard Error) of Three Psychophysical Methods With Butyl Alcohol in Experiment 1

                     Native Threshold    Number of Trials    ML Logistic Threshold    Confidence Interval
Method                  M       SE          M       SE           M       SE              M       SE
ML-PEST
  Left                −3.41    0.16       15.13    0.65        −3.41    0.16            0.80    0.02
  Right               −3.27    0.13       15.41    0.76        −3.27    0.13            0.84    0.02
Wetherill–Levitt      −3.30    0.10       17.44    0.72        −3.30    0.11            2.32    0.34
Ascending limits      −3.10    0.13       13.03    0.68        −3.48    0.14            1.14    0.02

Monte Carlo simulations were run for each of the three psychophysical procedures, using the same rules and stimulus series as were used in the threshold measurements for butyl alcohol with the real subjects reported above in Experiment 1. A population of 10,000 simulated subjects was defined whose thresholds followed a Gaussian distribution with a mean threshold of −3.00 log percent concentration and a standard deviation of 0.4 log units. These values are representative of the distribution of real subjects' detection thresholds for butyl alcohol, as found in Experiment 2 below. The predetermined threshold for each of the 10,000 simulated observers was drawn from the above distribution. Threshold values higher than the highest stimulus concentration or lower than the lowest stimulus concentration were not used.
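Drawing such a population might look like the sketch below. The mean and standard deviation come from the text; the function `draw_thresholds`, the choice to redraw out-of-range values (the text says only that they "were not used"), the seed, and the illustrative bounds taken from the 24-step staircase series are all assumptions.

```python
import math
import random

def draw_thresholds(n, mean, sd, lowest, highest, seed=0):
    """Draw n thresholds from a Gaussian, redrawing any value that falls
    outside [lowest, highest]. Redrawing is an assumed reading of the text."""
    rng = random.Random(seed)
    thresholds = []
    while len(thresholds) < n:
        t = rng.gauss(mean, sd)
        if lowest <= t <= highest:
            thresholds.append(t)
    return thresholds

# Illustrative bounds: the weakest and strongest stimuli of the staircase
# series (0.1% stepped down 19 times by a factor of 1.5, up to 4%).
lowest = math.log10(0.1) - 19 * math.log10(1.5)
highest = math.log10(4.0)

population = draw_thresholds(10000, -3.0, 0.4, lowest, highest)
print(len(population), round(sum(population) / len(population), 2))
```

With bounds this far from the mean, very few draws are rejected, so the truncation barely changes the shape of the distribution.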

Simulation 1

On each trial, the simulated subject was presented with a stimulus. Each subject's detection behavior was driven by a logistic psychometric function with a steepness of 3.5 and a chance level performance of 0.5. The probability of making the correct response on a trial was determined by the probability predicted by the logistic function for the stimulus and for that subject and a uniform random number between 0 and 1. If the random number was less than the logistic function probability for that stimulus, the response was scored as correct; otherwise, it was scored as wrong. The starting stimulus for the up–down staircase and maximum-likelihood methods was chosen randomly between the two stimuli that bracketed the mean threshold of −3.0 log concentration. These stimuli were −3.113 and −2.937 log concentration. The ascending method started with the lowest stimulus in its series, −5.130 log percent concentration.
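The response rule of the simulated observer can be written directly from this description. The steepness (3.5) and chance level (0.5) are from the text; the specific logistic form in log units and the names `logistic_p` and `simulated_response` are assumptions for illustration.

```python
import math
import random

def logistic_p(x, alpha, beta=3.5, gamma=0.5):
    # Assumed logistic form (log units); equals .75 at x = alpha when gamma = .5.
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-beta * (x - alpha)))

def simulated_response(x, alpha, rng, beta=3.5, gamma=0.5):
    """Score a trial as correct when a uniform random number in [0, 1)
    is less than the probability predicted by the logistic function."""
    return rng.random() < logistic_p(x, alpha, beta, gamma)

rng = random.Random(0)
alpha = -3.0  # this simulated subject's true threshold
for x in (-6.0, -3.0, 0.0):
    n_correct = sum(simulated_response(x, alpha, rng) for _ in range(20000))
    print(x, n_correct / 20000)
```

Far below threshold the observer performs near chance, at threshold near .75, and far above threshold near 1.0.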

An ascending limits run was considered a failure if the procedure called for a stimulus stronger than the highest stimulus concentration. A Wetherill–Levitt staircase run was considered a failure if, during the trials, a stimulus stronger than the highest stimulus concentration or weaker than the lowest stimulus concentration was called for. In the present simulation, the Wetherill–Levitt staircase failed 66 times. All of these failures were caused by the procedure's calling for a lower stimulus concentration than the lowest available. An ML-PEST run would have been considered a failure if the final estimate of the threshold were more than half the stopping confidence interval below the lowest candidate or above the highest candidate α. There were no such failures. The failed runs are excluded from the data analyzed below.

The distribution of errors for each method (estimated threshold minus true threshold) for the 10,000 subjects is shown in the left side of Figure 6. The distribution of the number of trials to reach stopping criterion for the 10,000 subjects is shown on the right side of Figure 6. The maximum-likelihood method was more accurate than the Wetherill–Levitt staircase, and both of these methods were superior to the ascending method of limits. The median error and the number of trials required for each method are shown in Table 2. The median error for the maximum-likelihood method was almost half as large as that for the up–down staircase, although both errors were quite small.

The error on each of the 10,000 subjects is plotted in Figure 7 as a function of the number of trials for that run. The range of errors became smaller as the number of trials increased for both the Wetherill–Levitt staircase method and the ML-PEST method, and the median error stayed close to zero. The ascending staircase method showed quite different behavior. The direction of the error depended on the number of trials before the stopping criterion of five correct in a row had been reached. The estimated thresholds were too low when the number of trials was less than about 15, whereas the estimates of threshold were too high when the number of trials was more than 15.

Simulation 2

Because the Wetherill–Levitt staircase method and the ML-PEST method generally required different numbers of trials before the stopping criterion was reached on each run, comparison of the accuracy is not straightforward. We repeated the Monte Carlo simulations so that the staircase method and the ML-PEST method used the same number of trials on each run. First, the Wetherill–Levitt staircase was run until the stopping criterion was reached. Then the ML-PEST method was run with the same number of trials on that observer, using the same starting stimulus.

The results are presented in Table 2. When the ML-PEST method had the same number of trials as the up–down staircase, the ML-PEST method was slightly more accurate. As in Simulation 1, both of these methods were superior in accuracy to the ascending staircase. There were 60 failures out of the 10,000 cases where the up–down staircase called for a stimulus that was lower than the lowest available concentration. Because the ML-PEST method was not permitted to run as many trials as it would normally require, this method failed on 30 out of the 10,000 cases.

Table 2
Results of the Four Monte Carlo Simulations

Method                   Median Error    Median Number of Trials
Simulation 1
  Maximum-likelihood        0.027               16
  Wetherill–Levitt          0.045               17
  Ascending limits          0.331               14
Simulation 2
  Maximum-likelihood        0.094               17
  Wetherill–Levitt          0.101               17
  Ascending limits          0.419               14
Simulation 3
  Maximum-likelihood        0.03                16
  Wetherill–Levitt          3.60                16
  Ascending limits          3.54                16
Simulation 4
  Maximum-likelihood        2.75                115
  Wetherill–Levitt          0.93                21
  Ascending limits          3.36                22


Figure 6. Distribution of errors (left) and number of trials (right) for three psychophysical methods: maximum-likelihood adaptive staircase (upper panel), Wetherill and Levitt up–down staircase (middle panel), and ascending method of limits (lower panel). The vertical line in the left panels marks the zero-error bin of each histogram. See text for details. The abscissa scale for the upper two left panels covers a range of 1.5 log units and is 5 log units for the lower left histogram.

Figure 7. Error as a function of number of trials for three psychophysical methods: ML-PEST (upper panel), up–down staircase (middle panel), and ascending method of limits (lower panel).

Simulation 3

An important ability of sensory testing is to detect people who deliberately try to appear to have diminished sensitivity. We therefore repeated the Monte Carlo simulations of 10,000 subjects using the same starting conditions and the same test conditions as those in Simulation 1, but each simulated subject generated “untruthful” responses (i.e., responses just the opposite of that called for by the subject’s psychometric function). In the ML-PEST computer program, two likelihood functions are maintained: one based on the hypothesis that the subject is responding truthfully and the other based on the hypothesis that the subject is deliberately choosing the wrong response. The program explicitly tests the hypothesis that a subject is deliberately choosing the wrong response by comparing the maxima of the two likelihood distributions and therefore should be able to measure a sensory threshold as effectively as when a subject is telling the truth. The other two methods do not have an explicit way of dealing with this situation. It is of interest to learn how all three methods behave under these circumstances.
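The comparison of the two likelihood functions can be sketched as follows. This is only an illustration under stated assumptions, not the ML-PEST code: the logistic form on a log-concentration axis, the helper names (`p_correct`, `max_log_likelihood`, `classify_observer`), and the example trials are all hypothetical. The steepness of 3.5, the chance level of 0.5, and the candidate grid from −4.50 to 0.00 in 0.01 steps follow the values given for NaCl in the description of the method.

```python
import math

BETA = 3.5    # steepness used for NaCl in the text
GAMMA = 0.5   # chance level in a 2AFC task
# 451 candidate thresholds from -4.50 to 0.00 log M in 0.01 steps
CANDIDATES = [round(-4.50 + 0.01 * i, 2) for i in range(451)]


def p_correct(x, alpha, lying):
    """Probability of a correct response at log concentration x.

    Assumed logistic psychometric function; a lying observer is modeled
    as giving the opposite of the response the function calls for.
    """
    p = GAMMA + (1.0 - GAMMA) / (1.0 + math.exp(-BETA * (x - alpha)))
    return 1.0 - p if lying else p


def max_log_likelihood(trials, lying):
    """Maximum over all candidate thresholds of the log likelihood."""
    best = -math.inf
    for alpha in CANDIDATES:
        total = 0.0
        for x, correct in trials:
            p = p_correct(x, alpha, lying)
            total += math.log(p if correct else 1.0 - p)
        best = max(best, total)
    return best


def classify_observer(trials):
    """Pick the hypothesis whose likelihood function has the higher maximum."""
    truthful = max_log_likelihood(trials, lying=False)
    lying = max_log_likelihood(trials, lying=True)
    return "lying" if lying > truthful else "truthful"
```

An observer who is always wrong on a strong stimulus is far better explained by the lying hypothesis, whereas one who is always correct on it is better explained by the truthful hypothesis.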

The results of this simulation are presented in Table 2. The ML-PEST method successfully measured the threshold and explicitly detected the lying condition in all 10,000 cases and was therefore able to measure the threshold of the underlying psychometric function with a small median error. The up–down staircase method and the ascending method of limits reached the highest concentration and tried to go higher, resulting in 9,824 failures for the staircase method and 8,648 failures for the method of limits. If we consider a failure as a successful detection of the lying behavior, the up–down staircase method detected 98% of the cases and the ascending method of limits detected 86% of the cases. These two methods required about the same number of trials (see Table 2).

Simulation 4

It is important in clinical testing that people with elevated thresholds be easily detected. One of the subjects in Experiment 1 first had smell tested by the up–down method and exhibited a normal threshold. Later, the same nostril was tested using the maximum-likelihood method, and no threshold could be measured because, as it turned out, he was anosmic on that side. The fact that the up–down method did not detect the anosmia is of concern.

We therefore repeated the Monte Carlo simulations of 10,000 subjects using the same starting conditions and the same test conditions as those in Simulation 1, with one change: Each simulated subject generated a random response, as would be the case if a person had no sensory sensitivity. The ideal behavior of a testing method would be to indicate that the threshold was at the highest value that is possible with that method or to have some other explicit way of indicating that the subject is responding randomly.

None of the three methods performed very well in detecting the random responding. The median errors and number of trials are given in Table 2. The up–down staircase method did the poorest because it gives a median error of 0.93. This rather low error occurs because on 87% of the cases, the stopping criterion of 5 reversals was satisfied before the method reached the upper limit of the stimulus concentrations. The ML-PEST method failed on 11% of the cases, but overall the thresholds were estimated to be very high. Unfortunately, it required a great number of trials to achieve this performance. The ascending method of limits reached the upper limit of stimulus concentration on 66% of the cases and also gave the highest values for the threshold.

Since in Monte Carlo simulations the distribution of the actual thresholds is known, we can apply an SDT analysis (Green & Swets, 1966/1974; Macmillan & Creelman, 1991) to compare the distribution of estimated thresholds made by each method under random responding with the true distribution. Using the means and standard deviations of the above distributions, we computed the signal detection measures of sensitivity, d_a, and accuracy, the area under the ROC curve, A_z (Harvey, 1992; Simpson & Fitter, 1973). The ML-PEST method had the highest sensitivity and accuracy (d_a = 2.27, A_z = .95). The ascending method was next best (d_a = 1.99, A_z = .92), and the up–down staircase method was the least sensitive (d_a = 1.50, A_z = .86).
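The relation between the two reported measures can be checked with a short calculation. The sketch assumes the conventional definitions for two Gaussian distributions: d_a is the difference between the means divided by the root mean square of the two standard deviations, and A_z is the standard normal distribution function evaluated at d_a divided by the square root of 2. The function names are ours and are not taken from the ML-PEST software.

```python
import math


def d_a(mean_signal, sd_signal, mean_noise, sd_noise):
    """Sensitivity index for two Gaussian distributions with unequal variance."""
    rms_sd = math.sqrt((sd_signal ** 2 + sd_noise ** 2) / 2.0)
    return (mean_signal - mean_noise) / rms_sd


def a_z(da):
    """Area under the binormal ROC curve: Phi(da / sqrt(2)).

    Phi(z) = 0.5 * (1 + erf(z / sqrt(2))), so Phi(da / sqrt(2)) = 0.5 * (1 + erf(da / 2)).
    """
    return 0.5 * (1.0 + math.erf(da / 2.0))


# The d_a values reported for the three methods
for name, value in [("ML-PEST", 2.27), ("ascending", 1.99), ("up-down", 1.50)]:
    print(name, round(a_z(value), 2))
```

Under these definitions the three reported d_a values round to the three reported A_z values.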

Discussion

The results of the simulations lead to the conclusion that both the up–down staircase method and the maximum-likelihood method have considerable strengths for use in measuring taste and smell thresholds. The ascending method of limits should not be used if one is interested in obtaining an accurate estimate of the threshold. These threshold estimates covary with the number of trials taken to stop, as is illustrated in the lower panel of Figure 7. The simulation results offer an explanation for the published finding that ascending limit measurements of olfactory thresholds are less reliable than staircase-measured thresholds (Doty, McKeown, Lee, & Shaman, 1995).

The up–down staircase method gives an average error almost twice as large as that of the ML-PEST method, although both values are small and close to each other. The up–down staircase method failed by calling for a stimulus too low on 66 out of 10,000 cases, whereas the ML-PEST method had no such failures. Although the median number of trials gives the edge to the ML-PEST method, on some occasions the ML-PEST method required over 50 trials to reach its stopping criterion.

A strength of the ML-PEST method is its ability to consider alternate hypotheses about the response strategy of the observer. The ML-PEST method was able to correctly detect all 10,000 dishonest responders. We are currently preparing a paper that will go into considerable detail on how to distinguish malingerers from true anosmics using a variety of methods. Another strength is that there is virtually no bias introduced into threshold measurements by having the ML-PEST steepness different from the “true” value set in the Monte Carlo observer (Treutwein & Strasburger, 1999). Data generated from a Monte Carlo Weibull function observer can be well fit by virtually any S-shaped psychometric function, although the Weibull will usually fit slightly better. We therefore recommend the ML-PEST method for testing taste and smell.

EXPERIMENT 2
Test–Retest Reliability

In order to assess the stability of the measures obtained with the adaptive maximum-likelihood procedure, we tested the subjects on four occasions.

Method

Subjects. Twenty-seven nonsmoking healthy subjects participated in the experiment (mean age = 31.6 ± 8.7 years, range = 22–57 years). The 14 women and 13 men were divided into two groups: Group 1 (“consecutive,” 7 females and 7 males) was tested on 4 consecutive days, and Group 2 (“distributed,” 7 females and 6 males) was tested on Days 1, 2, 8, and 22.

Stimuli. Smell thresholds for butyl alcohol were assessed using the same range of concentrations as were used with the maximum-likelihood method in Experiment 1. Taste thresholds were measured using 20 concentrations of NaCl in quarter-log steps. The highest concentration was 1.7 M. The same maximum-likelihood adaptive method as described in Experiment 1 was used in this experiment.

Procedure. Thresholds were determined with a temporal 2AFC procedure. Taste sensitivity was measured with a whole-mouth sip-and-spit procedure. The stimuli and the blanks were presented in 30-ml plastic medicine cups containing 5 ml of liquid. The subjects were requested to take the first sample in their mouths, swish it around, and spit it out, then rinse with distilled water and do the same with the second sample. They were asked to choose the sample that had a taste different from water. In each of the four sessions, taste thresholds were determined first and smell thresholds last. All four smell thresholds were determined in the preferred nostril. The subjects were tested at the same time of each day.

Results

Because the measurement and specification of test–retest reliability is a rather complex topic (Linn, 1989), we have chosen four different ways to demonstrate the reliability of the maximum-likelihood method. For the first method, the four threshold measurements for each subject are plotted in Figure 8. The thick line in each panel is the mean threshold. It is clear from examination of Figure 8 that most individual thresholds were stable over 4 days (left column of panels) and over 22 days (right column of panels), with a few subjects showing shifts of more than 1 log unit. The mean thresholds for the two groups are shown in Table 3. A repeated measures ANOVA confirmed the stability of the thresholds: The effect of testing day was not significant.

A second approach is to compute the correlations among the four threshold measurements (see, e.g., Mattes, 1988). Pearson correlation coefficients between thresholds obtained on different days are shown in Table 4. The correlations were generally higher for pairs of thresholds that were measured close together in time. As Mattes pointed out, correlations are highly influenced by small changes in ordinal ranking of the subjects. The fact that the pairwise correlations became smaller as time separation between them increased is shown in Figure 9. The solid line is the least squares regression line. It indicates that the correlation between two thresholds diminished as the time between them increased. The slope of the regression line becomes even steeper if the data for 1-day separation are removed from the regression, indicating that the decrease in correlation started with the 2-day separation condition.

Figure 8. Log threshold concentration as a function of test day for NaCl (upper row) and butanol (lower row), measured on 4 successive days (left column) and over 22 days (right column). The thin lines connect the four thresholds of individual subjects. The thick line is the mean threshold concentration of all subjects.

The third method to assess test–retest reliability is based on the differences between successive threshold measurements. Each threshold was subtracted from the previous threshold. The four threshold measurements resulted in three difference measurements. The distributions of these differences for NaCl and for butyl alcohol are plotted on the left side of Figure 10. If the measured thresholds do not systematically increase or decrease over time, we expect that the mean difference would be zero. The mean of the distribution for NaCl (upper panel of Figure 10) was 0.018 and for butyl alcohol (lower panel) was 0.002. Neither of these was significantly different from zero. The standard deviations of these distributions were 0.413 log units for NaCl and 0.422 log units for butyl alcohol.

A fourth way to examine test–retest reliability is by means of the spread of the four threshold measures for each subject. A standard deviation of 0.0 would mean that all four thresholds were exactly the same value. The standard deviation of the four measurements was computed for each subject. The distributions of these standard deviations for NaCl (upper panel) and butyl alcohol (lower panel) are plotted on the right side of Figure 10. The median standard deviation was about 0.15 log units for both NaCl and butyl alcohol. This good test–retest reliability is consistent with the other three methods of assessing reliability reported above.
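The third and fourth reliability indices are simple to compute once each subject's four log thresholds are available. The sketch below uses made-up thresholds for two hypothetical subjects, and it assumes the sample (n − 1) standard deviation, which the text does not specify.

```python
import statistics


def successive_differences(thresholds):
    """Subtract each threshold from the previous one (4 values give 3 differences)."""
    return [prev - curr for prev, curr in zip(thresholds, thresholds[1:])]


def within_subject_sd(thresholds):
    """Spread of one subject's repeated thresholds (sample SD, an assumption)."""
    return statistics.stdev(thresholds)


# Hypothetical log thresholds on four occasions; not data from the experiment
subjects = {
    "S1": [-2.6, -2.7, -2.5, -2.6],
    "S2": [-3.0, -3.0, -3.0, -3.0],
}

all_differences = [d for t in subjects.values() for d in successive_differences(t)]
mean_difference = statistics.mean(all_differences)   # near zero if there is no drift
median_sd = statistics.median(within_subject_sd(t) for t in subjects.values())
```

A mean difference near zero indicates no systematic drift across sessions, and a small median within-subject standard deviation indicates stable individual thresholds.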

Discussion

The maximum-likelihood method gives estimates of thresholds that are stable within individuals over time and that are responsive to individual differences. By all four indicators, the test–retest reliability of both NaCl and butyl alcohol thresholds measured by the maximum-likelihood method was very good.

Table 3
Mean NaCl and Butyl Alcohol Thresholds (and Standard Errors) From Experiment 2

                           Day 1         Day 2         Day 3         Day 4
                           M     SE      M     SE      M     SE      M     SE
NaCl
  Consecutive (Group 1)    2.62  0.13    2.67  0.11    2.82  0.13    2.63  0.11
  Distributed (Group 2)    2.90  0.19    2.83  0.09    2.83  0.11    2.77  0.10
Butyl Alcohol
  Consecutive (Group 1)    3.19  0.14    3.17  0.15    3.25  0.12    3.10  0.13
  Distributed (Group 2)    3.00  0.16    3.10  0.14    3.09  0.10    3.09  0.11

Table 4
Pearson Correlation Coefficients Between Thresholds

                           Paired Sessions
                           1–2    1–3    1–4    2–3    2–4    3–4
NaCl
  Consecutive (Group 1)    .46    .66    .66    .44    .75    .66
  Distributed (Group 2)    .58    .72    .40    .55    .20    .51
Butyl alcohol
  Consecutive (Group 1)    .86    .83    .86    .67    .71    .76
  Distributed (Group 2)    .26    .68    .56    .51    .21    .65

p < .05, p < .01, p < .001

GENERAL DISCUSSION

The two experiments and the Monte Carlo simulations reported here demonstrate that the maximum-likelihood method is a viable alternative to other methods currently in use to measure taste and smell detection thresholds. Because the method attempts to use stimuli that are optimal (close to the predicted threshold), the confidence intervals of the measured thresholds are considerably smaller than those achieved by the other two methods (Experiment 1).

One of the strengths of the maximum-likelihood method is that it provides an explicit way of estimating the confidence interval of the threshold, a facility not provided by the other two methods. The experimenter is free to establish a stopping confidence interval to suit his/her needs. If greater precision is desired, it can be achieved at the expense of running more trials. This explicit tradeoff between precision and number of trials is not possible with the other two psychophysical methods. One could, of course, use a hybrid combination of methods: an up–down staircase method for generating the next stimulus to use, combined with a maximum-likelihood method for estimating the threshold and its confidence interval. In effect, we used this approach after the data from Experiment 1 were collected, by reanalyzing them with the maximum-likelihood technique in order to assess the confidence interval of the resulting threshold.
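The tradeoff between precision and number of trials can be illustrated with a stopping rule based on the width of an interval around the maximum of the log-likelihood function. How ML-PEST actually computes its confidence interval is not described here, so the likelihood-ratio interval below (all candidates whose log likelihood lies within 1.92 of the maximum, roughly a 95% interval for one parameter) is an assumption, as are the function names and the quadratic example curve.

```python
def likelihood_interval(candidates, log_likelihoods, drop=1.92):
    """Lowest and highest candidate whose log likelihood is within `drop` of the maximum."""
    best = max(log_likelihoods)
    inside = [a for a, ll in zip(candidates, log_likelihoods) if ll >= best - drop]
    return min(inside), max(inside)


def should_stop(candidates, log_likelihoods, max_width):
    """Stop testing once the interval is no wider than the experimenter's criterion."""
    low, high = likelihood_interval(candidates, log_likelihoods)
    return (high - low) <= max_width


# Hypothetical log-likelihood curve: quadratic, peaked at -2.00 log units
candidates = [round(-4.50 + 0.01 * i, 2) for i in range(451)]
curve = [-0.5 * ((a + 2.0) / 0.2) ** 2 for a in candidates]

low, high = likelihood_interval(candidates, curve)
print(low, high, should_stop(candidates, curve, max_width=1.0))
```

A narrower `max_width` demands more trials before the curve becomes sharp enough to satisfy the criterion, which is the tradeoff described above.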

The Monte Carlo simulations allow the analysis of the ideal behavior of the three psychophysical methods. One major conclusion is that the ascending staircase method gives threshold measurements that are severely biased. This bias is a function of the number of trials carried out before the stopping criterion of 5 correct in a row with one stimulus. When the number of trials was small (less than approximately 15), the threshold estimate was too low; when the number of trials was greater than 15, the threshold estimate was too high (see Figure 7). This characteristic is due to the fact that the stimulus concentration used can only increase as trials progress. The maximum-likelihood method has the least amount of bias, although it is always slightly negative (i.e., the threshold estimate is, on the average, slightly lower by 0.02 log units than the true value).

Figure 9. Test–retest reliability as indicated by correlation coefficients between all pairs of measurements, plotted as a function of the number of days separating them. The solid line is the least squares linear regression through the data.

In clinical testing, especially for cases that involve the legal system, it is important to be able to detect malingerers. One of the strengths of the maximum-likelihood method is that it explicitly tests hypotheses. Normally, the hypothesis is that the subject is responding truthfully and that the threshold has a certain value. In the present implementation of the maximum-likelihood method, the computer routines maintain three hypotheses about the response strategy of the subject: that the subject is responding truthfully, that the subject is lying on each trial (i.e., always choosing what he/she believes is the wrong alternative in the 2AFC procedure), and that the subject is responding randomly (i.e., not making use of any sensory information). The ability to detect a malingerer, one who has sensory sensitivity but tries to respond randomly, is based on the fact that it is difficult for humans to generate truly random sequences (Bar-Hillel & Wagenaar, 1993). A subject who can taste or smell the stimulus, and therefore knows in which interval it occurs, tends to choose the wrong alternative more often than would be expected by chance alone. In this case, the hypothesis that the subject is lying will have a higher likelihood than the hypothesis that the subject is telling the truth. We are currently working to refine this ability of the maximum-likelihood method because it holds great promise. The other two psychophysical procedures do not easily distinguish between a malingerer and a person with no sensory sensitivity and have no explicit way to detect malingerers.

Finally, in Experiment 2 we demonstrated that the maximum-likelihood method gives estimates of thresholds that are stable over time. This stability is actually based on two aspects of our testing procedure. The maximum-likelihood method gives the smallest confidence intervals of the three methods tested and thus contributes the least amount of random or systematic error variance to the measured thresholds. The second source of reliability is the experimental paradigm. The 2AFC procedure minimizes the effects of response bias (e.g., Green & Swets, 1966/1974) and thus minimizes error variance in the threshold measurements that would be introduced by variance in response bias. Actually, it would be desirable to use more than just two forced-choice alternatives because the ML-PEST method converges more rapidly on the threshold value with more alternatives (Manny & Klein, 1985; Schlauch & Rose, 1990). The time required to present chemosensory stimuli, however, usually precludes using more than two alternatives.

Test–retest reliability is not achieved at the expense of precision. Our measured thresholds show an approximately normal distribution with a standard deviation of about 0.4 log units. The maximum-likelihood method is able to estimate thresholds that lie on a continuum even though a small number of discrete stimulus concentrations are available for testing. Although a computer is required to run ML-PEST software, the widespread availability of inexpensive desktop or laptop computers makes this requirement much less of a barrier than in the past.

Figure 10. Distribution of differences between successive thresholds (left) and of the standard deviation of the four threshold measurements for the 27 subjects (right) for NaCl (upper panel) and butyl alcohol (lower panel).

REFERENCES

Anliker, J. A., Bartoshuk, L., Ferris, A. N., & Hooks, L. D. (1991). Children’s food preferences and genetic sensitivity to the bitter tastes of 6-n-propylthiouracil (PROP). American Journal of Clinical Nutrition, 54, 316-320.

Apter, A. J., Gent, J. F., & Frank, M. E. (1999). Fluctuating olfactory sensitivity and distorted odor perception in allergic rhinitis. Archives of Otolaryngology–Head & Neck Surgery, 125, 1005-1010.

Bar-Hillel, M., & Wagenaar, W. A. (1993). The perception of randomness. In C. L. Gideon Keren (Ed.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 369-393). Hillsdale, NJ: Erlbaum.

Berkson, J. (1951). Why I prefer logits to probits. Biometrics, 7, 327-339.

Berkson, J. (1953). A statistically precise and relatively simple method of estimating the bio-assay with quantal response, based on the logistic function. Journal of the American Statistical Association, 48, 565-600.

Berkson, J. (1955). Maximum-likelihood and minimum chi-square estimates of the logistic function. Journal of the American Statistical Association, 50, 130-162.

Cain, W. S., Gent, J. F., Catalanotto, F. A., & Goodspeed, R. B. (1983). Clinical evaluation of olfaction. American Journal of Otolaryngology, 4, 252-256.

Cometto-Muñiz, J. E., & Cain, W. S. (1995). Relative sensitivity of the ocular trigeminal, nasal trigeminal, and olfactory systems to airborne chemicals. Chemical Senses, 20, 191-198.

Cornsweet, T. N. (1962). The staircase method in psychophysics. American Journal of Psychology, 75, 485-491.

Cowart, B. J. (1989). Relationships between taste and smell across the adult life span. In C. Murphy, W. S. Cain, & D. M. Hegsted (Eds.), Nutrition and the chemical senses in aging: Recent advances and current research needs (Annals of the New York Academy of Sciences, Vol. 561, pp. 39-55). New York: New York Academy of Sciences.

Cowart, B. J., Yokomukai, Y., & Beauchamp, G. K. (1994). Bitter taste in aging: Compound-specific decline in sensitivity. Physiology & Behavior, 56, 1237-1241.

Deems, D. A., Doty, R. L., Settle, R. G., Moore-Gillon, V., Shaman, P., Mester, A. F., Kimmelman, C. P., Brightman, V. J., & Snow, J. B., Jr. (1991). Smell and taste disorders: A study of 750 patients from the University of Pennsylvania Smell and Taste Center. Archives of Otolaryngology–Head & Neck Surgery, 117, 519-528.

Doty, R. L., McKeown, D. A., Lee, W. W., & Shaman, P. (1995). A study of the test–retest reliability of ten olfactory tests. Chemical Senses, 20, 645-656.

Doty, R. L., Snyder, P. J., Huggins, G. R., & Lowry, L. D. (1981). Endocrine, cardiovascular, and psychological correlates of olfactory sensitivity changes during the human menstrual cycle. Journal of Comparative & Physiological Psychology, 95, 45-60.

Drewnowski, A., Henderson, S. A., & Shore, A. B. (1997). Genetic sensitivity to 6-n-propylthiouracil (PROP) and hedonic responses to bitter and sweet tastes. Chemical Senses, 22, 27-37.

Duffy, V. B., Cain, W. S., & Ferris, A. M. (1999). Measurement of sensitivity to olfactory flavor: Application in a study of aging and dentures. Chemical Senses, 24, 671-677.

Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf und Härtel.

Gagnon, P., Mergler, D., & Lapare, S. (1994). Olfactory adaptation, threshold shift and recovery at low levels of exposure to methyl isobutyl ketone (MIBK). Neurotoxicology, 15, 637-642.

Garcia-Perez, M. A. (1998). Forced-choice staircases with fixed step sizes: Asymptotic and small-sample properties. Vision Research, 38, 1861-1881.

Green, D. M., & Swets, J. A. (1974). Signal detection theory and psychophysics. Huntington, NY: Robert E. Krieger. (Original work published 1966)

Grushka, M., Sessle, B. J., & Howley, T. P. (1986). Psychophysical evidence of taste dysfunction in burning mouth syndrome. Chemical Senses, 11, 485-498.

Guilford, J. P. (1936). Psychometric methods. New York: McGraw-Hill.

Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.

Hall, J. L. (1968). Maximum-likelihood sequential procedures for estimation of psychometric functions. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1981). Hybrid adaptive procedure for estimation of psychometric functions. Journal of the Acoustical Society of America, 69, 1763-1769.

Harvey, L. O., Jr. (1986). Efficient estimation of sensory thresholds. Behavior Research Methods, Instruments, & Computers, 18, 623-632.

Harvey, L. O., Jr. (1992). The critical operating characteristic and the evaluation of expert judgment. Organizational Behavior & Human Decision Processes, 53, 229-251.

Harvey, L. O., Jr. (1997). Efficient estimation of sensory thresholds with ML-PEST. Spatial Vision, 11, 121-128.

Hays, W. L. (1963). Statistics for psychologists (1st ed.). New York: Holt, Rinehart & Winston.

Krantz, D. H. (1969). Threshold theories of signal detection. Psychological Review, 76, 308-324.

Lehrner, J. P., Brücke, T., Dal-Bianco, P., Gatterer, G., & Kryspin-Exner, I. (1997). Olfactory functions in Parkinson’s disease and Alzheimer’s disease. Chemical Senses, 22, 105-110.

Lehrner, J. P., Kryspin-Exner, I., & Vetter, N. (1995). Higher olfactory threshold and decreased odor identification ability in HIV-infected persons. Chemical Senses, 20, 325-328.

Linn, R. L. (Ed.) (1989). Educational measurement (3rd ed.). New York: Macmillan.

Linschoten, M. R. I., & Kroeze, J. H. A. (1991). Spatial summation in taste: NaCl thresholds and stimulated area on the anterior human tongue. Chemical Senses, 16, 219-224.

Linschoten, M. R. I., & Kroeze, J. H. A. (1992). Bilateral taste stimulation: Spatial summation with weak NaCl stimuli. Chemical Senses, 17, 53-59.

Linschoten, M. R. I., & Kroeze, J. H. A. (1994). Ipsi- and bilateral interactions in taste. Perception & Psychophysics, 55, 387-393.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user’s guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure vernier acuity of older infants. Vision Research, 25, 1245-1252.

Mattes, R. D. (1988). Reliability of psychophysical measures of gustatory function. Perception & Psychophysics, 43, 107-114.

Moll, B., Klimek, L., Eggers, G., & Mann, W. (1998). Comparison of olfactory function in patients with seasonal and perennial allergic rhinitis. Allergy, 53, 297-301.

Murphy, C., & Cain, W. S. (1986). Odor identification: The blind are better. Physiology & Behavior, 37, 177-180.

Murphy, C., Nordin, S., de Wijk, R. A., Cain, W. S., & Polich, J. (1994). Olfactory-evoked potentials: Assessment of young and elderly, and comparison to psychophysical threshold. Chemical Senses, 19, 47-56.

Nordin, S., Murphy, C., Davidson, T. M., Quinonez, C., Jalowayski, A. A., & Ellison, D. W. (1996). Prevalence and assessment of qualitative olfactory dysfunction in different age groups. Laryngoscope, 106, 739-744.

O’Mahony, M., Gardner, L., Long, D., Heintz, C., Thompson, B., & Davies, M. (1979). Salt taste detection: An R-index approach to signal-detection measurements. Perception, 8, 497-506.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Pierce, J. D., Jr., Doty, R. L., & Amoore, J. E. (1996). Analysis of position of trial sequence and type of diluent on the detection threshold for phenyl ethyl alcohol using a single staircase method. Perceptual & Motor Skills, 82, 451-458.

Rosenblatt, M. R., Olmstead, R. E., Iwamoto-Schaapp, P. N., & Jarvik, M. E. (1998). Olfactory thresholds for nicotine and menthol in smokers (abstinent and nonabstinent) and nonsmokers. Physiology & Behavior, 65, 575-579.

SAS Institute (1998a). StatView: StatView reference. Cary, NC: SAS Institute.

SAS Institute (1998b). StatView: Using StatView. Cary, NC: SAS Institute.

Schlauch, R. S., & Rose, R. M. (1990). Two-, three-, and four-interval forced-choice staircase procedures: Estimator bias and efficiency. Journal of the Acoustical Society of America, 88, 732-740.

Simpson, A. J., & Fitter, M. J. (1973). What is the best index of detectability? Psychological Bulletin, 80, 481-488.

Stevens, J. C. (1995). Detection of heteroquality taste mixtures. Perception & Psychophysics, 57, 18-26.

Swets, J. A. (1961). Is there a sensory threshold? Science, 134, 168-177.

Swets, J. A. (1986a). Form of empirical ROC’s in discrimination and diagnostic tasks: Implications for theory and measurement of performance. Psychological Bulletin, 99, 181-198.

Swets, J. A. (1986b). Indices of discrimination or diagnostic accuracy: Their ROC’s and implied models. Psychological Bulletin, 99, 100-117.

Swets, J. A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Mahwah, NJ: Erlbaum.

Swets, J. A., Tanner, W. P., Jr., & Birdsall, T. G. (1961). Decision processes in perception. Psychological Review, 68, 301-340.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Urban, F. M. (1908). The application of statistical methods to problems of psychophysics. Philadelphia: Psychological Clinic Press.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Weiffenbach, J. M., Baum, B. J., & Burghauser, R. (1982). Taste thresholds: Quality specific variation with human aging. Journal of Gerontology, 37, 372-377.

Weiffenbach, J. M., Schwartz, L. K., Atkinson, J. C., & Fox, P. C. (1995). Taste performance in Sjögren’s syndrome. Physiology & Behavior, 57, 89-96.

Wetherill, G. B., & Levitt, H. (1965). Sequential estimation of points on a psychometric function. British Journal of Mathematical & Statistical Psychology, 18, 1-10.

(Manuscript received November 11, 1999; revision accepted for publication March 4, 2001.)


likelihood fitting technique (Harvey, 1986, 1997). Even in the extreme case of 1 presentation per stimulus, the best-fitting logistic function gives a good estimate of the threshold.

Our goal in the present paper is to introduce the use of the maximum-likelihood adaptive staircase procedure for measuring sensitivity in the chemical senses. We will compare the efficiency and the accuracy of this method with those of other methods and present test–retest data as an assessment of reliability.

MAXIMUM-LIKELIHOOD ADAPTIVE STAIRCASE PSYCHOPHYSICAL METHOD

The principles of this method are described in a variety of sources (Hall, 1968, 1981; Harvey, 1986, 1997; Hays, 1963; Watson & Pelli, 1983), which provide more detail than is presented here. For this discussion, it is assumed that the psychometric function is logistic and that data are collected with a 2AFC experimental paradigm, although other functions and paradigms could certainly be used. During psychophysical testing, there are three major questions that are answered by this method after each trial: (1) What is the current estimate of the threshold? (2) What stimulus should be presented to the subject on the next trial? (3) Should the psychophysical trials be stopped? Although the answers are somewhat interrelated, it is useful to consider them separately.

Estimation of the Threshold

During psychophysical testing, the raw data collected on each trial are the stimulus concentration presented to the subject and whether or not the subject made a correct response. In order to estimate the threshold, a series of possible psychometric functions is considered. A likelihood function is constructed by computing the log likelihood for each of these possible psychometric functions, each one having the same value of β (steepness) and γ (the probability of being correct by chance) and each having a different value of α. The computational procedure is described by Harvey (1986, 1997). The smallest candidate α is chosen to lie below any possible real threshold or at least lower than the lowest physical stimulus. Successive values increase in 0.01-log-unit steps up to an appropriate maximum. In testing the detection of NaCl, for example, we consider 451 αs ranging from −4.50 log molar concentration (log M) to 0.00 log molar concentration in 0.01-log steps. Each candidate logistic function has a fixed β (3.5) and a fixed γ (0.5) for the 2AFC psychophysical paradigm. The β value is not critical for converging on the correct value of α. The value of 3.5 is close to actual psychometric functions for NaCl measured with the method of constant stimuli. The likelihood functions after having run 0, 5, 10, 15, 20, and 25 trials on a Monte Carlo observer with a true value of α of −2.125 log M (0.0075 M) are plotted in the upper panel of Figure 3. The best estimate of the threshold is that value of α having the maximum likelihood (i.e., the mode of the likelihood distribution). In this simulation, the best estimate of α was −2.110 log M after 25 trials, an error of 0.015 log M.
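The estimation step can be sketched in a few lines. The code is a minimal illustration, not the ML-PEST implementation: the exact logistic parameterization used by Harvey (1986, 1997) is not given in this section, so the form below (a logistic in log concentration, rising from the chance level to 1.0) is an assumption, as are the function names and the example trials.

```python
import math

BETA = 3.5    # fixed steepness
GAMMA = 0.5   # chance performance in 2AFC
# 451 candidate thresholds: -4.50 to 0.00 log M in 0.01-log steps
CANDIDATES = [round(-4.50 + 0.01 * i, 2) for i in range(451)]


def p_correct(x, alpha):
    """Assumed logistic psychometric function on a log-concentration axis."""
    return GAMMA + (1.0 - GAMMA) / (1.0 + math.exp(-BETA * (x - alpha)))


def log_likelihood(alpha, trials):
    """Sum of log probabilities of the observed responses for one candidate."""
    total = 0.0
    for x, correct in trials:
        p = p_correct(x, alpha)
        total += math.log(p if correct else 1.0 - p)
    return total


def ml_threshold(trials):
    """Candidate threshold with the highest log likelihood."""
    return max(CANDIDATES, key=lambda alpha: log_likelihood(alpha, trials))


# Hypothetical trials: (log concentration, response correct?)
trials = [(-1.0, True), (-1.5, True), (-2.0, True), (-2.5, False),
          (-2.5, True), (-2.25, True), (-2.75, False)]
print(ml_threshold(trials))
```

With only correct responses the maximum sits at the lowest candidate, which is why an estimate inside the candidate range requires both correct and incorrect trials.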

Choosing the Next Stimulus

At the beginning of an experiment, before any data have been collected, all candidate αs have equal likelihood of being the threshold. But often the experimenter has some prior idea of what the threshold value should be, especially if the subject has normal sensory function. We incorporate these prior expectations into the method by keeping track of a second likelihood function that starts with the prior likelihoods and adds the results of each trial to it. We use a Gaussian distribution for the prior likelihood function with a standard deviation of 0.40, which is the value we found in our test–retest experiment to be reported below (Experiment 2). In the example, the initial estimate of the threshold was −1.875 log M. These posterior likelihood functions are plotted in the lower panel of Figure 3 after having run 0, 5, 10, and 15 trials on the same observer. The mode of this posterior likelihood function after each trial is used to choose the next stimulus.
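A sketch of the second (posterior) likelihood function follows. It adds the log of a Gaussian prior, centered on the initial estimate of −1.875 log M with a standard deviation of 0.40, to the log likelihood of the trials and returns the mode. The logistic form, the function names, and the example trials are assumptions made for illustration.

```python
import math

BETA, GAMMA = 3.5, 0.5
PRIOR_MEAN, PRIOR_SD = -1.875, 0.40
CANDIDATES = [round(-4.50 + 0.01 * i, 2) for i in range(451)]


def p_correct(x, alpha):
    """Assumed logistic psychometric function on a log-concentration axis."""
    return GAMMA + (1.0 - GAMMA) / (1.0 + math.exp(-BETA * (x - alpha)))


def log_prior(alpha):
    """Log of a Gaussian prior, up to an additive constant."""
    return -0.5 * ((alpha - PRIOR_MEAN) / PRIOR_SD) ** 2


def log_posterior(alpha, trials):
    """Prior log likelihood plus the log likelihood of every trial so far."""
    total = log_prior(alpha)
    for x, correct in trials:
        p = p_correct(x, alpha)
        total += math.log(p if correct else 1.0 - p)
    return total


def posterior_mode(trials):
    """Recommended log concentration for the next trial."""
    return max(CANDIDATES, key=lambda alpha: log_posterior(alpha, trials))


print(posterior_mode([]))                    # before any trial: at the prior mean
print(posterior_mode([(-2.0, True)] * 3))    # correct responses pull the mode lower
```

Before any data are collected the mode sits at the prior mean, so the first stimulus is chosen near the experimenter's initial guess rather than at an extreme of the candidate range.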

In taste and smell, it is usually not feasible to maintain more than 20 or so different stimulus concentrations for presentation to an observer. The stimulus recommended by the maximum of the posterior likelihood function, therefore, may not actually be available to the experimenter to use on the next trial. There will be an actual stimulus that is lower than the recommendation and another stimulus that is higher than the recommendation. To choose between these two, we compute the relative distance of the recommended stimulus between the two actual stimuli, and we compute the relative frequency of each of the two actual stimuli. We choose one randomly, with a probability that favors the stimulus closer to the recommendation and the stimulus with the fewer number of trials. This computation is done in CStim->GetBestStimulusIndex() in the ML-PEST software package. Keeping the list of candidate αs separate from the list of available stimuli means that the threshold α can be computed to a precision that is not limited by the small number of actual stimuli used in the experiment.
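The text does not give the exact weighting used in CStim->GetBestStimulusIndex(), so the following is only a hypothetical sketch of a rule with the two stated properties: it favors the bracketing stimulus that is closer to the recommendation and the one that has been presented less often. The product-of-weights formula and the function name `choose_stimulus` are assumptions made for illustration.

```python
import bisect
import random

def choose_stimulus(recommended, stimuli, counts, rng=random):
    """Return the index of an available stimulus near the recommended level.

    stimuli must be sorted ascending; counts[i] is the number of trials
    already run at stimuli[i]. The weighting below is an assumption.
    """
    hi = bisect.bisect_left(stimuli, recommended)
    if hi == 0:
        return 0                      # recommendation below the whole series
    if hi == len(stimuli):
        return len(stimuli) - 1       # recommendation above the whole series
    if stimuli[hi] == recommended:
        return hi                     # recommendation is an available stimulus
    lo = hi - 1
    # Relative position of the recommendation between the two neighbors (0..1)
    rel = (recommended - stimuli[lo]) / (stimuli[hi] - stimuli[lo])
    total = counts[lo] + counts[hi] + 2
    w_lo = (1.0 - rel) * (counts[hi] + 1) / total  # closer and less used -> larger
    w_hi = rel * (counts[lo] + 1) / total
    p_hi = w_hi / (w_lo + w_hi)
    return hi if rng.random() < p_hi else lo

# The 18 NaCl concentrations from the text: -4.00 to 0.25 log M in 0.25-log steps
stimuli = [-4.00 + 0.25 * i for i in range(18)]
counts = [0] * len(stimuli)
index = choose_stimulus(-2.11, stimuli, counts, random.Random(0))
```

With no trials yet run, a recommendation of −2.11 log M is served by either −2.25 or −2.00 log M, with the closer one somewhat more likely.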

For NaCl detection, we use 18 different concentrations ranging from −4.00 log M to 0.25 log M in 0.25-log-unit steps. The actual stimulus concentrations that were presented to the Monte Carlo observer are plotted in Figure 4 for each of the 25 trials. The solid circles represent trials on which the observer was correct; the open circles are incorrect trials. Because the observer was correct on the first three trials, the stimuli that were presented were successively weaker concentrations. On Trial 4, an error was made, and the stimulus on the following trial was the same concentration. Generally, but not always (because of the random process of choosing the next stimulus), the next stimulus will be weaker after a correct response and stronger after an incorrect response. The

1334 LINSCHOTEN HARVEY ELLER AND JAFEK

maximum-likelihood estimate of the threshold is plotted by the solid line in Figure 4. Until both correct and incorrect trials are collected, the estimate of the threshold will be at either the lower or the upper end of the range of candidate αs. In this example, the observer was correct on the first three trials and then made a mistake on Trial 4. The estimate of the threshold was extremely low (not visible in the figure) until Trial 4, when it jumped up to the region near the true value. As the trials continue, the estimated threshold fluctuates around the true value and eventually converges on it. After 25 trials, the estimated threshold was −2.11 log M, an error of 0.015 log units. This low error is achieved even though no actual stimulus corresponding to the threshold was available for testing.

When to Stop

Two stopping criteria can be used to terminate psychophysical testing: stop after a fixed number of trials, or stop after the estimation of the threshold reaches a required level of precision. Precision is measured by the confidence interval, which is related to the width of the likelihood function. Notice in Figure 3 that the likelihood function becomes narrower as the number of trials

Figure 3. Log likelihood (upper panel) and log posterior likelihood (lower panel) as a function of candidate α following 0, 5, 10, 15, 20, and 25 Monte Carlo trials with a logistic observer having a true α of 0.0075 M (−2.125 log M) detecting NaCl in a two-alternative forced-choice detection paradigm.

FAST AND ACCURATE THRESHOLDS 1335

increases. The confidence interval of the estimated threshold is directly related to the width of the likelihood distribution and is calculated using the likelihood ratio test and the χ² statistical distribution with 1 degree of freedom (Hays, 1963):

\[
\chi^2 = 2.0 \times \log_e(\text{likelihood ratio}) = 2.0 \times \log_e\!\left(\frac{L_{\max}}{L_x}\right), \tag{2}
\]

where $L_x$ is the likelihood of a model whose α parameter is x, and $L_{\max}$ is the likelihood of the best-fitting model. We search for the stimulus concentrations (x) above and below the threshold stimulus (the maximum-likelihood α) at which the log likelihood drops to χ²/2 below the maximum log likelihood value. To find the 95% confidence interval, the criterion value of χ² is 5.0239 (this value is obtained from standard tables of the χ² distribution found in all statistics textbooks. The one-tailed value of 5.0239 corresponds to a probability of .025; since the confidence interval is two-tailed, 5.0239 corresponds to a probability of .05).

In Figure 5, the same data shown in Figure 4 are plotted with an expanded vertical scale to show the estimate of the threshold along with the estimated 95% confidence interval. Notice that as the number of trials increases, the confidence interval of the threshold decreases. After 25 trials, the 95% confidence interval of the threshold extends 0.27 log units below the estimate of the threshold (indicated by the solid horizontal line, not by the filled circle, which represents the stimulus presented on that trial) and 0.37 log units above it, for a total confidence interval of 0.64 log units.
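The search for the confidence limits can be sketched as a walk outward from the maximum of the log-likelihood function until the value falls more than χ²/2 below the peak. The code below is an illustrative sketch: the quadratic log likelihood is a hypothetical stand-in for real trial data, and the helper names are not from the ML-PEST software. The criterion is left as a parameter, and a small helper reports the upper-tail probability of a χ² value with 1 degree of freedom so that any chosen criterion can be checked.

```python
import math

def chi2_1df_upper_tail(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def confidence_interval(log_lik, candidates, criterion=5.0239):
    """Return (lower, estimate, upper) candidate alphas.

    The limits bound the region around the peak where the log likelihood
    stays within criterion / 2 of its maximum.
    """
    peak = max(log_lik)
    i_max = log_lik.index(peak)
    cutoff = peak - criterion / 2.0
    lo = i_max
    while lo > 0 and log_lik[lo - 1] >= cutoff:
        lo -= 1
    hi = i_max
    while hi < len(candidates) - 1 and log_lik[hi + 1] >= cutoff:
        hi += 1
    return candidates[lo], candidates[i_max], candidates[hi]

candidates = [round(-4.50 + 0.01 * i, 2) for i in range(451)]

# Hypothetical log-likelihood function: a parabola peaking at -2.11 log M
log_lik = [-0.5 * ((a + 2.11) / 0.15) ** 2 for a in candidates]

lower, estimate, upper = confidence_interval(log_lik, candidates)
tail = chi2_1df_upper_tail(5.0239)  # upper-tail probability of the criterion
```

A stopping rule of the kind described above would end testing once `upper - lower` falls below the required width.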

EXPERIMENT 1
Comparison of Three Psychophysical Methods

The goal of Experiment 1 was to compare the ML-PEST method with two commonly used methods with respect to their efficiency and their precision. Detection thresholds for butyl alcohol were measured using three methods: the ascending method of limits, as described by Cain et al. (1983); the up–down staircase method (Wetherill & Levitt, 1965); and the ML-PEST maximum-likelihood staircase method, as described above (Harvey, 1986, 1997).

Method

Subjects. Thirty-two (16 males and 16 females) healthy nonsmoking subjects participated in the experiment. Their mean age was 30.4 years (± 8.1), ranging from 17 to 60 years. None reported having conditions that are normally associated with smell dysfunction.

Stimuli for ascending method of limits. For the ascending method of limits, the procedure described by Cain et al. (1983) was used. Thirteen concentrations of butyl alcohol were prepared, with distilled water as diluent. The strongest concentration was 4%. Adjacent steps differed by a factor of 3.

Stimuli for staircase methods. The Wetherill and Levitt and ML-PEST methods used a series of 24 concentrations of butyl alcohol. The lowest 20 concentrations were a factor of 1.5 apart, with the strongest 0.1%. This series was augmented by the four strongest of the series for the ascending method, which were separated by a factor of 3.

Procedure. Thresholds were determined with a temporal 2AFC procedure. In each trial, a stimulus was paired with a blank, and the


Figure 4. Stimulus presentation as a function of trial number in the two-alternative forced-choice Monte Carlo simulation. Trials on which the subject's response was correct are represented by the filled circles; trials on which the subject's response was incorrect are represented by the open circles. The resulting maximum-likelihood estimate of α is plotted by the solid line. The horizontal dotted line marks the true value of α.


subjects were asked to select the bottle containing the stimulus. The subjects were asked to give their best guess even when they were certain that the two stimuli were identical. The stimulus and the blank were presented in 250-ml squeezable polyethylene bottles containing 60 ml of liquid. The subjects were instructed to keep one nostril closed while the other nostril was tested with three puffs out of each bottle. For each subject, four thresholds were obtained, two for each nostril. In randomized order and alternating nostrils, one threshold was determined with the ascending method, one with the up–down method, and two with the adaptive maximum-likelihood method. Testing took about 45 min.

The definition of detection threshold differs in several aspects among the three methods. In the ascending method, testing started with the lowest concentration. A correct response led to the presentation of the same concentration on the next trial. When an incorrect response was given, the next higher concentration was used. Testing stopped when five correct answers had been given in a row. The threshold was defined as the last-used concentration (Cain et al., 1983) and thus corresponded to a detection probability of 1.0.

In the up–down staircase method, testing could start at any step in the concentration series. The first subject was started in the middle of the range, and each subsequent subject started where the former subject had ended. After one correct response, the same concentration was presented; if two correct responses were given, the next lower concentration was presented on the next trial. One incorrect response led to the presentation of a concentration one step higher. This decision rule is called the one-up–two-down rule. Although this decision rule is widely used, the actual threshold estimate depends more on the size of the stimulus step than on the particular up–down rule (García-Pérez, 1998). Testing stopped after five reversals, and the threshold was computed as the average of the stimulus concentrations at the last four reversals. This threshold corresponded to a detection probability of approximately .75.
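The one-up–two-down rule with five reversals can be sketched as follows. This is an illustrative reading of the description above, not the authors' program: stimuli are represented by their index in the concentration series, a reversal is taken to be the level at which the direction of change flips, and the deterministic observer in the example is hypothetical.

```python
def up_down_staircase(respond, n_levels, start, n_reversals=5):
    """Run a one-up-two-down staircase over stimulus indices 0..n_levels-1.

    respond(level) returns True for a correct response. Returns
    (reversal_levels, n_trials), or None if the rule calls for a level
    outside the available series.
    """
    level = start
    correct_in_row = 0
    last_direction = 0          # -1 = moving down, +1 = moving up, 0 = none yet
    reversals = []
    trials = 0
    while len(reversals) < n_reversals:
        trials += 1
        if respond(level):
            correct_in_row += 1
            if correct_in_row < 2:
                continue        # first correct response: repeat the same level
            correct_in_row = 0
            direction = -1      # two correct in a row: one step weaker
        else:
            correct_in_row = 0
            direction = +1      # one error: one step stronger
        if last_direction and direction != last_direction:
            reversals.append(level)
        last_direction = direction
        level += direction
        if not 0 <= level < n_levels:
            return None
    return reversals, trials

# Hypothetical deterministic observer: always correct at level 10 or above
reversals, n_trials = up_down_staircase(lambda lvl: lvl >= 10, n_levels=24, start=12)
threshold_index = sum(reversals[-4:]) / 4.0   # average of the last four reversals
```

In practice the averaged quantity would be the log concentration at each reversal level rather than the index itself.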

In the maximum-likelihood adaptive staircase method, testing started in the middle of the stimulus range. After each response, the estimate of the threshold was calculated, and the next stimulus was a concentration close to that threshold. Testing stopped when the confidence interval of the estimated threshold reached a predefined criterion (0.8 log concentration units). This threshold corresponded to a detection probability of exactly .75. All these computations were carried out in real time on a Power Macintosh computer, using a program, SensoryTester, written by author L.O.H.

Data analysis. The mean threshold (and standard error) produced by each method is presented in Table 1, labeled “Native Threshold.” As noted above, each method produces a threshold estimate that corresponds to a different performance level of detection. To obtain threshold and confidence interval estimates for the ascending method of limits and the up–down staircase method that could be directly compared with those obtained by the ML-PEST method, a logistic psychometric function was fitted to the actual sequence of stimulus presentations and responses from these two methods. The values of β and γ were held constant at the same values used in the ML-PEST method. The maximum-likelihood threshold and the confidence interval were then computed in exactly the same manner as with the ML-PEST method. This fitting procedure will give a lower threshold for the ascending method of limits data than that reported by the method of limits procedure itself, because the ascending method of limits looks for a threshold stimulus giving 100% correct, whereas the maximum-likelihood threshold corresponds to 75% correct. A repeated measures analysis of variance (ANOVA) was computed using StatView 5.0.1 (SAS Institute, 1998a, 1998b) on each of four dependent variables: native log threshold,

Figure 5. Stimulus presentation as a function of trial number in the two-alternative forced-choice Monte Carlo simulation. Trials on which the subject's response was correct are represented by the filled circles; trials on which the subject's response was incorrect are represented by the open circles. The resulting maximum-likelihood estimate of α is plotted by the solid line. The horizontal dotted line marks the true value of α. The vertical bars show the 95% confidence interval of each estimate of α.


number of trials to reach the native threshold, maximum-likelihood logistic threshold, and confidence interval of the maximum-likelihood threshold.

Results

The native threshold values, the mean numbers of trials needed to reach the stopping criterion, the maximum-likelihood logistic thresholds, and the mean confidence intervals of the maximum-likelihood thresholds are summarized in Table 1. A repeated measures ANOVA on the native thresholds showed no significant main effect of method [F(3,31) = 1.61, p < .187]. Subsequent statistical contrast comparisons, using the standard Huynh–Feldt ε to correct the p value and the degrees of freedom, revealed that the native threshold of the ascending method of limits was just significantly higher than the other methods [F(1,31) = 3.92, p = .051]. These higher thresholds are to be expected, given that the ascending method of limits computes the threshold to be the concentration corresponding to a detection probability of 1.0, whereas the other methods use a performance criterion of about .75.

A repeated measures ANOVA on the number of trials needed to reach the stopping criterion showed a significant main effect of method [F(3,31) = 6.26, p < .0008]. Statistical contrast comparisons showed that the ascending method required significantly fewer trials to determine threshold [F(1,31) = 6.41, p < .014] and that the up–down method required significantly more trials [F(1,31) = 6.06, p < .017] than the average of the two measurements with the maximum-likelihood method.

A repeated measures ANOVA on the maximum-likelihood logistic thresholds showed no significant main effect of method, and the thresholds computed from the trial-by-trial data from the ascending method of limits were not significantly different from the thresholds computed from the data collected with the other methods [F(1,31) = 1.77, p = .19]. This result is to be expected, since presumably the performance of each observer is driven by the same sensory process when tested by each of the three psychometric methods.

The mean confidence intervals for the thresholds (and standard errors) are shown in the last column of Table 1. The mean confidence interval of the maximum-likelihood method is smaller than those of the other two. Note that the standard error of the confidence interval reported for the ML-PEST method is very small (0.02). This is because the confidence interval was used as the stopping criterion: Trials were stopped when the confidence interval of α reached 0.8. A repeated measures ANOVA indicated that there was a significant main effect of method [F(3,31) = 17.27, p < .0002]. Contrast tests revealed that the confidence intervals for the up–down method were significantly larger than those for both the maximum-likelihood method [F(1,31) = 50.93, p < .0001] and the ascending method [F(1,31) = 23.72, p < .0016].

Discussion

The native absolute threshold values obtained with the ascending method were higher than those obtained with the other two methods. One should remember that these values are not conventional thresholds, in the sense that they represent the concentration chosen correctly five out of five times. It is to be expected that this value will be a higher concentration than thresholds based on a lower percent correct. Furthermore, this method, at least as described by Cain et al. (1983), has considerably larger steps between adjacent concentrations, and only actual stimulus concentrations can be threshold values.

The ascending method needed the fewest number of trials (~13), relative to either the maximum-likelihood method (~15) or the up–down staircase method (~17). The clear superiority of the maximum-likelihood method, however, came with the high precision of the measured thresholds, as indicated by the small confidence intervals. The confidence intervals of the up–down staircase were almost three times larger than those given by the maximum-likelihood method.

MONTE CARLO SIMULATIONS

When measuring the threshold of an actual subject, it is not possible to know what the error is between the measured value and the “true” value, because the true value is not known. In order to evaluate the accuracy of the three psychophysical methods, we resorted to Monte Carlo evaluations of simulated observers having a predetermined threshold value. We used each of the three psychophysical methods to measure the threshold of each simulated observer and compared the result with the true value.

Monte Carlo simulations were run for each of the three psychophysical procedures, using the same rules and stimulus series as were used in the threshold measurements for butyl alcohol with the real subjects reported above in Experiment 1. A population of 10,000 simulated subjects was defined whose thresholds followed a Gaussian distribution with a mean threshold of −3.00 log percent concentration and a standard deviation of 0.4 log units. These values are representative of the distribution of real subjects' detection thresholds for butyl alcohol, as found in Experiment 2 below. The predetermined threshold for each of the 10,000 simulated observers was drawn from the above distribution. Threshold values higher than the highest stimulus concentration and lower than the lowest stimulus concentration were not used.

Table 1
Mean Performance (and Standard Error) of Three Psychophysical Methods With Butyl Alcohol in Experiment 1

                     Native           Number of        ML Logistic      Confidence
                     Threshold        Trials           Threshold        Interval
Method               M       SE       M       SE       M       SE       M      SE
ML-PEST
  Left               −3.41   0.16     15.13   0.65     −3.41   0.16     0.80   0.02
  Right              −3.27   0.13     15.41   0.76     −3.27   0.13     0.84   0.02
Wetherill–Levitt     −3.30   0.10     17.44   0.72     −3.30   0.11     2.32   0.34
Ascending limits     −3.10   0.13     13.03   0.68     −3.48   0.14     1.14   0.02

Simulation 1

On each trial, the simulated subject was presented with a stimulus. Each subject's detection behavior was driven by a logistic psychometric function with a steepness of 3.5 and a chance level performance of 0.5. The probability of making the correct response on a trial was determined by the probability predicted by the logistic function for the stimulus and for that subject and a uniform random number between 0 and 1. If the random number was less than the logistic function probability for that stimulus, the response was scored as correct; otherwise, it was scored as wrong. The starting stimulus for the up–down staircase and maximum-likelihood methods was chosen randomly between the two stimuli that bracketed the mean threshold of −3.0 log concentration. These stimuli were −3.113 and −2.937 log concentration. The ascending method started with the lowest stimulus in its series, −5.130 log percent concentration.
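A simulated observer of this kind can be sketched as below. The logistic form is again an assumption (the exact parameterization is in Harvey, 1986, 1997), the series limits are derived from the stimulus description in Experiment 1 (twenty levels a factor of 1.5 apart ending at 0.1%, extended up to 4%), and the seed and function names are purely illustrative.

```python
import math
import random

def p_correct(x, alpha, beta=3.5, gamma=0.5):
    """Assumed logistic 2AFC psychometric function on a log-concentration axis."""
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-beta * (x - alpha)))

def simulated_response(x, alpha, rng):
    """Score a trial as correct when a uniform draw falls below p_correct."""
    return rng.random() < p_correct(x, alpha)

def draw_threshold(rng, lowest, highest, mean=-3.0, sd=0.4):
    """Draw a true threshold, discarding values outside the stimulus range."""
    while True:
        alpha = rng.gauss(mean, sd)
        if lowest <= alpha <= highest:
            return alpha

rng = random.Random(2001)  # arbitrary seed, for reproducibility only

# Limits of the staircase series in log percent concentration
lowest = math.log10(0.1) - 19 * math.log10(1.5)
highest = math.log10(4.0)

population = [draw_threshold(rng, lowest, highest) for _ in range(10000)]

# An observer tested exactly at its own threshold should be correct about 75% of the time
hits = sum(simulated_response(-3.0, -3.0, rng) for _ in range(4000))
proportion_correct = hits / 4000
```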

An ascending limits run was considered a failure if the procedure called for a stimulus stronger than the highest stimulus concentration. A Wetherill–Levitt staircase run was considered a failure if, during the trials, a stimulus stronger than the highest stimulus concentration or weaker than the lowest stimulus concentration was called for. In the present simulation, the Wetherill–Levitt staircase failed 66 times. All of these failures were caused by the procedure's calling for a lower stimulus concentration than the lowest available. An ML-PEST run would have been considered a failure if the final estimate of the threshold were more than half the stopping confidence interval below the lowest candidate or above the highest candidate α. There were no such failures. The failed runs are excluded from the data analyzed below.

The distribution of errors for each method (estimated threshold minus true threshold) for the 10,000 subjects is shown in the left side of Figure 6. The distribution of the number of trials to reach stopping criterion for the 10,000 subjects is shown on the right side of Figure 6. The maximum-likelihood method was more accurate than the Wetherill–Levitt staircase, and both of these methods were superior to the ascending method of limits. The median error and the number of trials required for each method are shown in Table 2. The median error for the maximum-likelihood method was almost half as large as that for the up–down staircase, although both errors were quite small.

The error on each of the 10,000 subjects is plotted in Figure 7 as a function of the number of trials for that run. The range of errors became smaller as the number of trials increased for both the Wetherill–Levitt staircase method and the ML-PEST method, and the median error stayed close to zero. The ascending method of limits showed quite different behavior. The direction of the error depended on the number of trials before the stopping criterion of five correct in a row had been reached. The estimated thresholds were too low when the number of trials was less than about 15, whereas the estimates of threshold were too high when the number of trials was more than 15.

Simulation 2

Because the Wetherill–Levitt staircase method and the ML-PEST method generally required different numbers of trials before the stopping criterion was reached on each run, comparison of the accuracy is not straightforward. We repeated the Monte Carlo simulations so that the staircase method and the ML-PEST method used the same number of trials on each run. First, the Wetherill–Levitt staircase was run until the stopping criterion was reached. Then the ML-PEST method was run with the same number of trials on that observer, using the same starting stimulus.

The results are presented in Table 2. When the ML-PEST method had the same number of trials as the up–down staircase, the ML-PEST method was slightly more accurate. As in Simulation 1, both of these methods were superior in accuracy to the ascending method of limits. There were 60 failures out of the 10,000 cases where the up–down staircase called for a stimulus that was lower than the lowest available concentration. Because the ML-PEST method was not permitted to run as many trials as it would normally require, this method failed on 30 out of the 10,000 cases.

Table 2
Results of the Four Monte Carlo Simulations

Method                  Median Error    Median Number of Trials
Simulation 1
  Maximum-likelihood    0.027           16
  Wetherill–Levitt      0.045           17
  Ascending limits      0.331           14
Simulation 2
  Maximum-likelihood    0.094           17
  Wetherill–Levitt      0.101           17
  Ascending limits      0.419           14
Simulation 3
  Maximum-likelihood    0.03            16
  Wetherill–Levitt      3.60            16
  Ascending limits      3.54            16
Simulation 4
  Maximum-likelihood    2.75            115
  Wetherill–Levitt      0.93            21
  Ascending limits      3.36            22


Simulation 3

An important ability of sensory testing is to detect people who deliberately try to appear to have diminished sensitivity. We therefore repeated the Monte Carlo simulations of 10,000 subjects, using the same starting conditions and the same test conditions as those in Simulation 1, but each simulated subject generated “untruthful” responses (i.e., responses just the opposite of that called for by the subject's psychometric function). In the ML-PEST computer program, two likelihood functions are maintained, one based on the hypothesis that the subject is responding truthfully and the other based on the hypothesis that the subject is deliberately choosing the wrong response. The program explicitly tests the hypothesis that a subject is deliberately choosing the wrong response by comparing the maxima of the two likelihood distributions and therefore should be able to measure a sensory threshold as effectively as when a subject is

Figure 6. Distribution of errors (left) and number of trials (right) for three psychophysical methods: maximum-likelihood adaptive staircase (upper panel), Wetherill and Levitt up–down staircase (middle panel), and ascending method of limits (lower panel). The vertical line in the left panels marks the zero-error bin of each histogram. See text for details. The abscissa scale for the upper two left panels covers a range of 1.5 log units and is 5 log units for the lower left histogram.


Figure 7. Error as a function of number of trials for three psychophysical methods: ML-PEST (upper panel), up–down staircase (middle panel), and ascending method of limits (lower panel).


telling the truth. The other two methods do not have an explicit way of dealing with this situation. It is of interest to learn how all three methods behave under these circumstances.
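The comparison of the two hypotheses can be sketched by computing the maximum log likelihood over the candidate αs twice, once with the psychometric function as is and once with the probability of a correct response inverted. How the ML-PEST program models the untruthful responder is not spelled out here, so the inversion (probability of a correct response equal to 1 − p), the assumed logistic form, and the trial record below are all illustrative assumptions.

```python
import math

def max_log_likelihood(trials, candidates, inverted, beta=3.5, gamma=0.5):
    """Largest log likelihood over the candidate alphas for one hypothesis.

    inverted=False: the subject responds as the psychometric function predicts.
    inverted=True:  the subject deliberately gives the opposite response, so
                    the probability of a correct response is 1 - p (assumption).
    """
    best = -math.inf
    for alpha in candidates:
        ll = 0.0
        for x, correct in trials:
            p = gamma + (1.0 - gamma) / (1.0 + math.exp(-beta * (x - alpha)))
            if inverted:
                p = 1.0 - p
            ll += math.log(p if correct else 1.0 - p)
        best = max(best, ll)
    return best

candidates = [round(-4.50 + 0.01 * i, 2) for i in range(451)]

# Hypothetical record: errors on strong stimuli, chance performance on weak ones
trials = [(-1.0, False), (-1.5, False), (-2.0, False),
          (-3.5, True), (-3.5, False), (-4.0, True)]

truthful = max_log_likelihood(trials, candidates, inverted=False)
untruthful = max_log_likelihood(trials, candidates, inverted=True)
suspect_untruthful = untruthful > truthful
```

For this record, consistent errors on clearly detectable stimuli are better explained by the inverted model than by any truthful threshold in the candidate range.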

The results of this simulation are presented in Table 2. The ML-PEST method successfully measured the threshold and explicitly detected the lying condition in all 10,000 cases and was therefore able to measure the threshold of the underlying psychometric function with a small median error. The up–down staircase method and the ascending method of limits reached the highest concentration and tried to go higher, resulting in 9,824 failures for the staircase method and 8,648 failures for the method of limits. If we consider a failure as a successful detection of the lying behavior, the up–down staircase method detected 98% of the cases and the ascending method of limits detected 86% of the cases. These two methods required about the same number of trials (see Table 2).

Simulation 4

It is important in clinical testing that people with elevated thresholds be easily detected. One of the subjects in Experiment 1 first had smell tested by the up–down method and exhibited a normal threshold. Later, the same nostril was tested using the maximum-likelihood method, and no threshold could be measured because, as it turned out, he was anosmic on that side. The fact that the up–down method did not detect the anosmia is of concern.

We therefore repeated the Monte Carlo simulations of 10,000 subjects, using the same starting conditions and the same test conditions as those in Simulation 1, with one change: Each simulated subject generated a random response, as would be the case if a person had no sensory sensitivity. The ideal behavior of a testing method would be to indicate that the threshold was at the highest value that is possible with that method or to have some other explicit way of indicating that the subject is responding randomly.

None of the three methods performed very well in detecting the random responding. The median errors and number of trials are given in Table 2. The up–down staircase method did the poorest, because it gives a median error of 0.93. This rather low error occurs because on 87% of the cases, the stopping criterion of 5 reversals was satisfied before the method reached the upper limit of the stimulus concentrations. The ML-PEST method failed on 11% of the cases, but overall the thresholds were estimated to be very high. Unfortunately, it required a great number of trials to achieve this performance. The ascending method of limits reached the upper limit of stimulus concentration on 66% of the cases and also gave the highest values for the threshold.

Since in Monte Carlo simulations the distribution of the actual thresholds is known, we can apply an SDT analysis (Green & Swets, 1966/1974; Macmillan & Creelman, 1991) to compare the distribution of estimated thresholds made by each method under random responding with the true distribution. Using the means and standard deviations of the above distributions, we computed the signal detection measures of sensitivity, d_a, and accuracy, the area under the ROC curve, A_z (Harvey, 1992; Simpson & Fitter, 1973). The ML-PEST method had the highest sensitivity and accuracy (d_a = 2.27, A_z = .95). The ascending method was next best (d_a = 1.99, A_z = .92), and the up–down staircase method was the least sensitive (d_a = 1.50, A_z = .86).
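For readers who want to reproduce the conversion between the two measures, the following sketch uses the standard unequal-variance Gaussian definitions, d_a as the difference between the two means divided by the root mean square of the two standard deviations, and A_z as the cumulative normal of d_a/√2. That these are the exact formulas the authors applied is an assumption; the reported pairs of values are consistent with them.

```python
import math

def d_a(mean_signal, sd_signal, mean_noise, sd_noise):
    """Sensitivity index for two Gaussian distributions with unequal variances."""
    rms_sd = math.sqrt((sd_signal ** 2 + sd_noise ** 2) / 2.0)
    return (mean_signal - mean_noise) / rms_sd

def a_z(da):
    """Area under the binormal ROC curve: Phi(d_a / sqrt(2))."""
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))), so Phi(da / sqrt(2)) = 0.5 * (1 + erf(da / 2))
    return 0.5 * (1.0 + math.erf(da / 2.0))

# Reported d_a values for ML-PEST, ascending limits, and the up-down staircase
areas = [round(a_z(value), 2) for value in (2.27, 1.99, 1.50)]
```

The three areas obtained this way round to the A_z values reported above (.95, .92, and .86).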

Discussion

The results of the simulations lead to the conclusion that both the up–down staircase method and the maximum-likelihood method have considerable strengths for use in measuring taste and smell thresholds. The ascending method of limits should not be used if one is interested in obtaining an accurate estimate of the threshold. These threshold estimates covary with the number of trials taken to stop, as is illustrated in the lower panel of Figure 7. The simulation results offer an explanation for the published finding that ascending limit measurements of olfactory thresholds are less reliable than staircase-measured thresholds (Doty, McKeown, Lee, & Shaman, 1995).

The up–down staircase method gives an average error twice as large as that of the ML-PEST method, although both values are small and therefore similar to each other. The up–down staircase method failed by calling for a stimulus too low on 66 out of 10,000 cases, whereas the ML-PEST method had no such failures. Although the median number of trials gives the edge to the ML-PEST method, on some occasions the ML-PEST method required over 50 trials to reach its stopping criterion.

A strength of the ML-PEST method is its ability to consider alternate hypotheses about the response strategy of the observer. The ML-PEST method was able to correctly detect all 10,000 dishonest responders. We are currently preparing a paper that will go into considerable detail on how to distinguish malingerers from true anosmics using a variety of methods. Another strength is that there is virtually no bias introduced into threshold measurements by having the ML-PEST steepness different from the “true” value set in the Monte Carlo observer (Treutwein & Strasburger, 1999). Data generated from a Monte Carlo Weibull function observer can be well fit by virtually any S-shaped psychometric function, although the Weibull will usually fit slightly better. We therefore recommend the ML-PEST method for testing taste and smell.

EXPERIMENT 2
Test–Retest Reliability

In order to assess the stability of the measures obtained with the adaptive maximum-likelihood procedure, we tested the subjects on four occasions.

Method

Subjects. Twenty-seven nonsmoking healthy subjects participated in the experiment (mean age = 31.6 ± 8.7 years; range = 22–57 years). The 14 women and 13 men were divided into two groups. Group 1 (“consecutive,” 7 females and 7 males) was tested on 4 consecutive days, and Group 2 (“distributed,” 7 females and 6 males) was tested on Days 1, 2, 8, and 22.

Stimuli. Smell thresholds for butyl alcohol were assessed using the same range of concentrations as were used with the maximum-likelihood method in Experiment 1. Taste thresholds were measured using 20 concentrations of NaCl in quarter-log steps. The highest concentration was 1.7 M. The same maximum-likelihood adaptive method as described in Experiment 1 was used in this experiment.

Procedure. Thresholds were determined with a temporal 2AFC procedure. Taste sensitivity was measured with a whole-mouth sip-and-spit procedure. The stimuli and the blanks were presented in 30-ml plastic medicine cups containing 5 ml of liquid. The subjects were requested to take the first sample in their mouths, swish it around, and spit it out, then rinse with distilled water and do the same with the second sample. They were asked to choose the sample that had a taste different from water. In each of the four sessions, taste thresholds were determined first and smell thresholds last. All four smell thresholds were determined in the preferred nostril. The subjects were tested at the same time of each day.

Results

Because the measurement and specification of test–retest reliability is a rather complex topic (Linn, 1989), we have chosen four different ways to demonstrate the reliability of the maximum-likelihood method. For the first method, the four threshold measurements for each subject are plotted in Figure 8. The thick line in each panel is the mean threshold. It is clear from examination of Figure 8 that most individual thresholds were stable over 4 days (left column of panels) and over 22 days (right column of panels), with a few subjects showing shifts of more than 1 log unit. The mean thresholds for the two groups are shown in Table 3. A repeated measures ANOVA confirmed the stability of the thresholds: The effect of testing day was not significant.

A second approach is to compute the correlations among the four threshold measurements (see, e.g., Mattes, 1988). Pearson correlation coefficients between thresholds obtained on different days are shown in Table 4. The correlations were generally higher for pairs of thresholds that were measured close together in time. As Mattes pointed out, correlations are highly influenced by small changes in ordinal ranking of the subjects. The fact that the pairwise correlations became smaller as time separation between them increased is shown in Figure 9. The

Figure 8. Log threshold concentration as a function of test day for NaCl (upper row) and butanol (lower row), measured on 4 successive days (left column) and over 22 days (right column). The thin lines connect the four thresholds of individual subjects. The thick line is the mean threshold concentration of all subjects.

FAST AND ACCURATE THRESHOLDS 1343

solid line is the least squares regression line It indicatesthat the correlation between two thresholds diminishedas the time between them increased The slope of the re-gression line becomes even steeper if the data for 1-dayseparation are removed from the regression indicatingthat the decrease in correlation started with the 2-dayseparation condition

The third method to assess test–retest reliability is based on the differences between successive threshold measurements. Each threshold was subtracted from the previous threshold. The four threshold measurements resulted in three difference measurements. The distributions of these differences for NaCl and for butyl alcohol are plotted on the left side of Figure 10. If the measured thresholds do not systematically increase or decrease over time, we expect that the mean difference would be zero. The mean of the distribution for NaCl (upper panel of Figure 10) was 0.018, and for butyl alcohol (lower panel) it was 0.002. Neither of these was significantly different from zero. The standard deviations of these distributions were 0.413 log units for NaCl and 0.422 log units for butyl alcohol.

A fourth way to examine test–retest reliability is by means of the spread of the four threshold measures for each subject. A standard deviation of 0.0 would mean that all four thresholds were exactly the same value. The standard deviation of the four measurements was computed for each subject. The distributions of these standard deviations for NaCl (upper panel) and butyl alcohol (lower panel) are plotted on the right side of Figure 10. The median standard deviation was about 0.15 log units for both NaCl and butyl alcohol. This good test–retest reliability is consistent with the other three methods of assessing reliability reported above.
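The third and fourth reliability measures reduce to simple arithmetic on each subject's four log thresholds. The sketch below is illustrative only: the function names and the example thresholds are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not say which form was used.

```python
import statistics

def successive_differences(thresholds):
    """Subtract each threshold from the previous one (previous minus current).

    Four measurements yield three differences.
    """
    return [prev - cur for prev, cur in zip(thresholds, thresholds[1:])]

def subject_sd(thresholds):
    """Spread of one subject's thresholds.

    Assumption: sample standard deviation; 0.0 means all values are identical.
    """
    return statistics.stdev(thresholds)

# Hypothetical log thresholds for one subject on four test days.
example = [-2.6, -2.7, -2.8, -2.6]
diffs = successive_differences(example)
spread = subject_sd(example)
```

Pooling the differences over all subjects gives the distributions on the left of Figure 10, and collecting the per-subject spreads gives those on the right.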

Discussion

The maximum-likelihood method gives estimates of thresholds that are stable within individuals over time and that are responsive to individual differences. By all four indicators, the test–retest reliability of both NaCl and butyl alcohol thresholds measured by the maximum-likelihood method was very good.

GENERAL DISCUSSION

The two experiments and the Monte Carlo simulations reported here demonstrate that the maximum-likelihood method is a viable alternative to other methods currently in use to measure taste and smell detection thresholds. Because the method attempts to use stimuli that are optimal (close to the predicted threshold), the confidence intervals of the measured thresholds are considerably smaller than those achieved by the other two methods (Experiment 1).

One of the strengths of the maximum-likelihood method is that it provides an explicit way of estimating the confidence interval of the threshold, a facility not provided by the other two methods. The experimenter is free to establish a stopping confidence interval to suit his/her needs. If greater precision is desired, it can be achieved at the expense of running more trials. This explicit tradeoff between precision and number of trials is not possible with the other two psychophysical methods. One could, of course, use a hybrid combination of methods: an up–down staircase method for generating the next stimulus to use, combined with a maximum-likelihood method for estimating the threshold and its confidence interval. In effect, we used this approach after the data from Experiment 1 were collected, by reanalyzing them with the maximum-likelihood technique in order to assess the confidence interval of the resulting threshold.

Table 3
Mean NaCl and Butyl Alcohol Thresholds (and Standard Errors) From Experiment 2

                              Day 1          Day 2          Day 3          Day 4
                             M     SE       M     SE       M     SE       M     SE
NaCl
  Consecutive (Group 1)    -2.62  0.13    -2.67  0.11    -2.82  0.13    -2.63  0.11
  Distributed (Group 2)    -2.90  0.19    -2.83  0.09    -2.83  0.11    -2.77  0.10
Butyl Alcohol
  Consecutive (Group 1)    -3.19  0.14    -3.17  0.15    -3.25  0.12    -3.10  0.13
  Distributed (Group 2)    -3.00  0.16    -3.10  0.14    -3.09  0.10    -3.09  0.11

Table 4
Pearson Correlation Coefficients Between Thresholds

                                      Paired Sessions
                             1–2    1–3    1–4    2–3    2–4    3–4
NaCl
  Consecutive (Group 1)      .46    .66    .66    .44    .75    .66
  Distributed (Group 2)      .58    .72    .40    .55    .20    .51
Butyl alcohol
  Consecutive (Group 1)      .86    .83    .86    .67    .71    .76
  Distributed (Group 2)      .26    .68    .56    .51    .21    .65

*p < .05. **p < .01. ***p < .001.

The Monte Carlo simulations allow the analysis of the ideal behavior of the three psychophysical methods. One major conclusion is that the ascending staircase method gives threshold measurements that are severely biased. This bias is a function of the number of trials carried out before the stopping criterion of 5 correct in a row with one stimulus is reached. When the number of trials was small (less than approximately 15), the threshold estimate was too low; when the number of trials was greater than 15, the threshold estimate was too high (see Figure 7). This characteristic is due to the fact that the stimulus concentration used can only increase as trials progress. The maximum-likelihood method has the least amount of bias, although it is always slightly negative (i.e., the threshold estimate is, on the average, slightly lower, by 0.02 log units, than the true value).

Figure 9. Test–retest reliability as indicated by correlation coefficients between all pairs of measurements, plotted as a function of the number of days separating them. The solid line is the least squares linear regression through the data.

In clinical testing, especially for cases that involve the legal system, it is important to be able to detect malingerers. One of the strengths of the maximum-likelihood method is that it explicitly tests hypotheses. Normally, the hypothesis is that the subject is responding truthfully and that the threshold has a certain value. In the present implementation of the maximum-likelihood method, the computer routines maintain three hypotheses about the response strategy of the subject: that the subject is responding truthfully, that the subject is lying on each trial (i.e., always choosing what he/she believes is the wrong alternative in the 2AFC procedure), and that the subject is responding randomly (i.e., not making use of any sensory information). The ability to detect a malingerer (one who has sensory sensitivity but tries to respond randomly) is based on the fact that it is difficult for humans to generate truly random sequences (Bar-Hillel & Wagenaar, 1993). A subject who can taste or smell the stimulus, and therefore knows in which interval it occurs, tends to give more wrong responses than would be expected by chance alone. In this case, the hypothesis that the subject is lying will have a higher likelihood than the hypothesis that the subject is telling the truth. We are currently working to refine this ability of the maximum-likelihood method, because it holds great promise. The other two psychophysical procedures do not easily distinguish between a malingerer and a person with no sensory sensitivity and have no explicit way to detect malingerers.
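One way to picture the three response-strategy hypotheses is to score the same trial record under each. The sketch below is not the authors' routine. It assumes a logistic psychometric function on a log10 concentration axis, treats the threshold alpha as already known, and models a liar's probability of a correct response as one minus the truthful probability. All function names are hypothetical.

```python
import math

def p_truthful(x, alpha, beta=3.5, gamma=0.5):
    """Assumed 2AFC logistic: probability correct at log concentration x."""
    return gamma + (1.0 - gamma) / (1.0 + 10.0 ** (-beta * (x - alpha)))

def hypothesis_log_likelihoods(trials, alpha):
    """Log likelihood of (stimulus, correct) pairs under three strategies."""
    ll = {"truthful": 0.0, "lying": 0.0, "random": 0.0}
    for x, correct in trials:
        # Clamp to keep the logarithm finite for very strong stimuli.
        p = min(max(p_truthful(x, alpha), 1e-12), 1.0 - 1e-12)
        ll["truthful"] += math.log(p if correct else 1.0 - p)
        # Assumption: a liar picks the alternative believed to be wrong.
        ll["lying"] += math.log(1.0 - p if correct else p)
        # A random responder is correct half the time in 2AFC.
        ll["random"] += math.log(0.5)
    return ll

def most_likely_strategy(trials, alpha):
    """Name of the hypothesis with the highest log likelihood."""
    ll = hypothesis_log_likelihoods(trials, alpha)
    return max(ll, key=ll.get)
```

Under these assumptions, a subject who is wrong on nearly every trial with a clearly suprathreshold stimulus is better explained by the lying hypothesis than by random responding, which predicts about half the responses correct.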

Finally, in Experiment 2, we demonstrated that the maximum-likelihood method gives estimates of thresholds that are stable over time. This stability is actually based on two aspects of our testing procedure. The maximum-likelihood method gives the smallest confidence intervals of the three methods tested and thus contributes the least amount of random or systematic error variance to the measured thresholds. The second source of reliability is the experimental paradigm. The 2AFC procedure minimizes the effects of response bias (e.g., Green & Swets, 1966/1974) and thus minimizes error variance in the threshold measurements that would be introduced by variance in response bias. Actually, it would be desirable to use more than just two forced-choice alternatives, because the ML-PEST method converges more rapidly on the threshold value with more alternatives (Manny & Klein, 1985; Schlauch & Rose, 1990). The time required to present chemosensory stimuli, however, usually precludes using more than two alternatives.

Test–retest reliability is not achieved at the expense of precision. Our measured thresholds show an approximately normal distribution with a standard deviation of about 0.4 log units. The maximum-likelihood method is able to estimate thresholds that lie on a continuum, even though a small number of discrete stimulus concentrations are available for testing. Although a computer is required to run ML-PEST software, the widespread availability of inexpensive desktop or laptop computers makes this requirement much less of a barrier than in the past.

Figure 10. Distribution of differences between successive thresholds (left) and of the standard deviation of the four threshold measurements for the 27 subjects (right) for NaCl (upper panel) and butyl alcohol (lower panel).

REFERENCES

Anliker, J. A., Bartoshuk, L., Ferris, A. N., & Hooks, L. D. (1991). Children's food preferences and genetic sensitivity to the bitter tastes of 6-n-propylthiouracil (PROP). American Journal of Clinical Nutrition, 54, 316-320.

Apter, A. J., Gent, J. F., & Frank, M. E. (1999). Fluctuating olfactory sensitivity and distorted odor perception in allergic rhinitis. Archives of Otolaryngology–Head & Neck Surgery, 125, 1005-1010.

Bar-Hillel, M., & Wagenaar, W. A. (1993). The perception of randomness. In C. L. Gideon Keren (Ed.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 369-393). Hillsdale, NJ: Erlbaum.

Berkson, J. (1951). Why I prefer logits to probits. Biometrics, 7, 327-339.

Berkson, J. (1953). A statistically precise and relatively simple method of estimating the bio-assay with quantal response, based on the logistic function. Journal of the American Statistical Association, 48, 565-600.

Berkson, J. (1955). Maximum-likelihood and minimum chi-square estimates of the logistic function. Journal of the American Statistical Association, 50, 130-162.

Cain, W. S., Gent, J. F., Catalanotto, F. A., & Goodspeed, R. B. (1983). Clinical evaluation of olfaction. American Journal of Otolaryngology, 4, 252-256.

Cometto-Muñiz, J. E., & Cain, W. S. (1995). Relative sensitivity of the ocular trigeminal, nasal trigeminal, and olfactory systems to airborne chemicals. Chemical Senses, 20, 191-198.

Cornsweet, T. N. (1962). The staircase method in psychophysics. American Journal of Psychology, 75, 485-491.

Cowart, B. J. (1989). Relationships between taste and smell across the adult life span. In C. Murphy, W. S. Cain, & D. M. Hegsted (Eds.), Nutrition and the chemical senses in aging: Recent advances and current research needs (Annals of the New York Academy of Sciences, Vol. 561, pp. 39-55). New York: New York Academy of Sciences.

Cowart, B. J., Yokomukai, Y., & Beauchamp, G. K. (1994). Bitter taste in aging: Compound-specific decline in sensitivity. Physiology & Behavior, 56, 1237-1241.

Deems, D. A., Doty, R. L., Settle, R. G., Moore-Gillon, V., Shaman, P., Mester, A. F., Kimmelman, C. P., Brightman, V. J., & Snow, J. B., Jr. (1991). Smell and taste disorders: A study of 750 patients from the University of Pennsylvania Smell and Taste Center. Archives of Otolaryngology–Head & Neck Surgery, 117, 519-528.

Doty, R. L., McKeown, D. A., Lee, W. W., & Shaman, P. (1995). A study of the test–retest reliability of ten olfactory tests. Chemical Senses, 20, 645-656.

Doty, R. L., Snyder, P. J., Huggins, G. R., & Lowry, L. D. (1981). Endocrine, cardiovascular, and psychological correlates of olfactory sensitivity changes during the human menstrual cycle. Journal of Comparative & Physiological Psychology, 95, 45-60.

Drewnowski, A., Henderson, S. A., & Shore, A. B. (1997). Genetic sensitivity to 6-n-propylthiouracil (PROP) and hedonic responses to bitter and sweet tastes. Chemical Senses, 22, 27-37.

Duffy, V. B., Cain, W. S., & Ferris, A. M. (1999). Measurement of sensitivity to olfactory flavor: Application in a study of aging and dentures. Chemical Senses, 24, 671-677.

Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf und Härtel.

Gagnon, P., Mergler, D., & Lapare, S. (1994). Olfactory adaptation, threshold shift and recovery at low levels of exposure to methyl isobutyl ketone (MIBK). Neurotoxicology, 15, 637-642.

Garcia-Perez, M. A. (1998). Forced-choice staircases with fixed step sizes: Asymptotic and small-sample properties. Vision Research, 38, 1861-1881.

Green, D. M., & Swets, J. A. (1974). Signal detection theory and psychophysics. Huntington, NY: Robert E. Krieger. (Original work published 1966)

Grushka, M., Sessle, B. J., & Howley, T. P. (1986). Psychophysical evidence of taste dysfunction in burning mouth syndrome. Chemical Senses, 11, 485-498.

Guilford, J. P. (1936). Psychometric methods. New York: McGraw-Hill.

Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.

Hall, J. L. (1968). Maximum-likelihood sequential procedures for estimation of psychometric functions. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1981). Hybrid adaptive procedure for estimation of psychometric functions. Journal of the Acoustical Society of America, 69, 1763-1769.

Harvey, L. O., Jr. (1986). Efficient estimation of sensory thresholds. Behavior Research Methods, Instruments, & Computers, 18, 623-632.

Harvey, L. O., Jr. (1992). The critical operating characteristic and the evaluation of expert judgment. Organizational Behavior & Human Decision Processes, 53, 229-251.

Harvey, L. O., Jr. (1997). Efficient estimation of sensory thresholds with ML-PEST. Spatial Vision, 11, 121-128.

Hays, W. L. (1963). Statistics for psychologists (1st ed.). New York: Holt, Rinehart & Winston.

Krantz, D. H. (1969). Threshold theories of signal detection. Psychological Review, 76, 308-324.

Lehrner, J. P., Brücke, T., Dal-Bianco, P., Gatterer, G., & Kryspin-Exner, I. (1997). Olfactory functions in Parkinson's disease and Alzheimer's disease. Chemical Senses, 22, 105-110.

Lehrner, J. P., Kryspin-Exner, I., & Vetter, N. (1995). Higher olfactory threshold and decreased odor identification ability in HIV-infected persons. Chemical Senses, 20, 325-328.

Linn, R. L. (Ed.) (1989). Educational measurement (3rd ed.). New York: Macmillan.

Linschoten, M. R. I., & Kroeze, J. H. A. (1991). Spatial summation in taste: NaCl thresholds and stimulated area on the anterior human tongue. Chemical Senses, 16, 219-224.

Linschoten, M. R. I., & Kroeze, J. H. A. (1992). Bilateral taste stimulation: Spatial summation with weak NaCl stimuli. Chemical Senses, 17, 53-59.

Linschoten, M. R. I., & Kroeze, J. H. A. (1994). Ipsi- and bilateral interactions in taste. Perception & Psychophysics, 55, 387-393.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure vernier acuity of older infants. Vision Research, 25, 1245-1252.

Mattes, R. D. (1988). Reliability of psychophysical measures of gustatory function. Perception & Psychophysics, 43, 107-114.

Moll, B., Klimek, L., Eggers, G., & Mann, W. (1998). Comparison of olfactory function in patients with seasonal and perennial allergic rhinitis. Allergy, 53, 297-301.

Murphy, C., & Cain, W. S. (1986). Odor identification: The blind are better. Physiology & Behavior, 37, 177-180.

Murphy, C., Nordin, S., de Wijk, R. A., Cain, W. S., & Polich, J. (1994). Olfactory-evoked potentials: Assessment of young and elderly, and comparison to psychophysical threshold. Chemical Senses, 19, 47-56.

Nordin, S., Murphy, C., Davidson, T. M., Quinonez, C., Jalowayski, A. A., & Ellison, D. W. (1996). Prevalence and assessment of qualitative olfactory dysfunction in different age groups. Laryngoscope, 106, 739-744.

O'Mahony, M., Gardner, L., Long, D., Heintz, C., Thompson, B., & Davies, M. (1979). Salt taste detection: An R-index approach to signal-detection measurements. Perception, 8, 497-506.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Pierce, J. D., Jr., Doty, R. L., & Amoore, J. E. (1996). Analysis of position of trial sequence and type of diluent on the detection threshold for phenyl ethyl alcohol using a single staircase method. Perceptual & Motor Skills, 82, 451-458.

Rosenblatt, M. R., Olmstead, R. E., Iwamoto-Schaap, P. N., & Jarvik, M. E. (1998). Olfactory thresholds for nicotine and menthol in smokers (abstinent and nonabstinent) and nonsmokers. Physiology & Behavior, 65, 575-579.


SAS Institute (1998a). StatView: StatView reference. Cary, NC: SAS Institute.

SAS Institute (1998b). StatView: Using StatView. Cary, NC: SAS Institute.

Schlauch, R. S., & Rose, R. M. (1990). Two-, three-, and four-interval forced-choice staircase procedures: Estimator bias and efficiency. Journal of the Acoustical Society of America, 88, 732-740.

Simpson, A. J., & Fitter, M. J. (1973). What is the best index of detectability? Psychological Bulletin, 80, 481-488.

Stevens, J. C. (1995). Detection of heteroquality taste mixtures. Perception & Psychophysics, 57, 18-26.

Swets, J. A. (1961). Is there a sensory threshold? Science, 134, 168-177.

Swets, J. A. (1986a). Form of empirical ROC's in discrimination and diagnostic tasks: Implications for theory and measurement of performance. Psychological Bulletin, 99, 181-198.

Swets, J. A. (1986b). Indices of discrimination or diagnostic accuracy: Their ROC's and implied models. Psychological Bulletin, 99, 100-117.

Swets, J. A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Mahwah, NJ: Erlbaum.

Swets, J. A., Tanner, W. P., Jr., & Birdsall, T. G. (1961). Decision processes in perception. Psychological Review, 68, 301-340.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Urban, F. M. (1908). The application of statistical methods to problems of psychophysics. Philadelphia: Psychological Clinic Press.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Weiffenbach, J. M., Baum, B. J., & Burghauser, R. (1982). Taste thresholds: Quality specific variation with human aging. Journal of Gerontology, 37, 372-377.

Weiffenbach, J. M., Schwartz, L. K., Atkinson, J. C., & Fox, P. C. (1995). Taste performance in Sjögren's syndrome. Physiology & Behavior, 57, 89-96.

Wetherill, G. B., & Levitt, H. (1965). Sequential estimation of points on a psychometric function. British Journal of Mathematical & Statistical Psychology, 18, 1-10.

(Manuscript received November 11, 1999; revision accepted for publication March 4, 2001.)


maximum-likelihood estimate of the threshold is plotted by the solid line in Figure 4. Until both correct and incorrect trials are collected, the estimate of the threshold will be at either the lower or the upper end of the range of candidate αs. In this example, the observer was correct on the first three trials and then made a mistake on Trial 4. The estimate of the threshold was extremely low (not visible in the figure) until Trial 4, when it jumped up to the region near the true value. As the trials continue, the estimated threshold fluctuates around the true value and eventually converges on it. After 25 trials, the estimated threshold was -2.11 log M, an error of 0.015 log units. This low error is achieved even though no actual stimulus corresponding to the threshold was available for testing.

Figure 3. Log likelihood (upper panel) and log posterior likelihood (lower panel) as a function of candidate α following 0, 5, 10, 15, 20, and 25 Monte Carlo trials with a logistic observer having a true α of 0.0075 M (-2.125 log M) detecting NaCl in a two-alternative forced-choice detection paradigm.

When to Stop

Two stopping criteria can be used to terminate psychophysical testing: stop after a fixed number of trials, or stop after the estimation of the threshold reaches a required level of precision. Precision is measured by the confidence interval, which is related to the width of the likelihood function. Notice in Figure 3 that the likelihood function becomes narrower as the number of trials increases. The confidence interval of the estimated threshold is directly related to the width of the likelihood distribution and is calculated using the likelihood ratio test and the χ² statistical distribution with 1 degree of freedom (Hays, 1963):

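$$\chi^2 = -2.0 \times \log_e(\text{likelihood ratio}) = -2.0 \times \log_e\left(\frac{L_x}{L_{\max}}\right) \qquad (2)$$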

where L_x is the likelihood of a model whose α parameter is x and L_max is the likelihood of the best-fitting model. We search for the stimulus concentrations (x) above and below the threshold stimulus (the maximum-likelihood α) at which the log likelihood drops to χ²/2 below the maximum log likelihood value. To find the 95% confidence interval, the criterion value of χ² is 5.0239. (This value is obtained from standard tables of the χ² distribution found in all statistics textbooks. The one-tailed value of 5.0239 corresponds to a probability of .025; since the confidence interval is two-tailed, 5.0239 corresponds to a probability of .05.)
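The search for the two interval endpoints can be sketched on a grid of candidate α values. This is a minimal illustration rather than the authors' implementation: the function name, the grid, and the parabolic log-likelihood curve are hypothetical, and the interval is taken as the outermost candidates that still meet the criterion, which assumes the likelihood has a single peak.

```python
def likelihood_interval(candidates, loglik, chi2_crit=5.0239):
    """Return (estimate, lower, upper) from a log-likelihood curve.

    candidates: candidate alpha values (log concentration)
    loglik:     log likelihood of each candidate, same length
    chi2_crit:  chi-square criterion; the text uses 5.0239 for 95%
    """
    best = max(loglik)
    estimate = candidates[loglik.index(best)]
    # Keep candidates whose log likelihood lies within chi2_crit / 2
    # of the maximum, as in the likelihood ratio criterion above.
    inside = [a for a, ll in zip(candidates, loglik)
              if ll >= best - chi2_crit / 2.0]
    return estimate, min(inside), max(inside)

# Hypothetical curve: a parabola peaked at -2.1 with curvature set by 0.1.
grid = [-3.0 + 0.001 * k for k in range(2001)]
curve = [-((a + 2.1) ** 2) / (2 * 0.1 ** 2) for a in grid]
estimate, lower, upper = likelihood_interval(grid, curve)
```

For a parabolic curve like this one, the half-width of the interval is 0.1 times the square root of 5.0239, about 0.224 log units on each side of the peak.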

In Figure 5, the same data shown in Figure 4 are plotted with an expanded vertical scale to show the estimate of the threshold along with the estimated 95% confidence interval. Notice that, as the number of trials increases, the confidence interval of the threshold decreases. After 25 trials, the 95% confidence interval of the threshold extends 0.27 log units below the estimate of the threshold (indicated by the solid horizontal line, not by the filled circle, which represents the stimulus presented on that trial) and 0.37 log units above it, for a total confidence interval of 0.64 log units.

EXPERIMENT 1
Comparison of Three Psychophysical Methods

The goal of Experiment 1 was to compare the ML-PEST method with two commonly used methods with respect to their efficiency and their precision. Detection thresholds for butyl alcohol were measured using three methods: the ascending method of limits as described by Cain et al. (1983), the up–down staircase method (Wetherill & Levitt, 1965), and the ML-PEST maximum-likelihood staircase method as described above (Harvey, 1986, 1997).

Method

Subjects. Thirty-two (16 males and 16 females) healthy, nonsmoking subjects participated in the experiment. Their mean age was 30.4 years (±8.1), ranging from 17 to 60 years. None reported having conditions that are normally associated with smell dysfunction.

Stimuli for ascending method of limits. For the ascending method of limits, the procedure described by Cain et al. (1983) was used. Thirteen concentrations of butyl alcohol were prepared, with distilled water as diluent. The strongest concentration was 4%. Adjacent steps differed by a factor of 3.

Stimuli for staircase methods. The Wetherill and Levitt and ML-PEST methods used a series of 24 concentrations of butyl alcohol. The lowest 20 concentrations were a factor of 1.5 apart, with the strongest 0.1%. This series was augmented by the four strongest of the series for the ascending method, which were separated by a factor of 3.
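The 24-step series can be laid out on a log10 percent scale directly from the description above. The following is a worked sketch; the function name is hypothetical, and it assumes the four added concentrations are 4% and the next three factor-of-3 dilutions below it.

```python
import math

def staircase_series_log10():
    """Log10 percent concentrations of the 24-step series, weakest first."""
    step_low = math.log10(1.5)     # lower 20 steps differ by a factor of 1.5
    step_high = math.log10(3.0)    # upper 4 steps differ by a factor of 3
    top_of_low = math.log10(0.1)   # strongest of the lower 20 is 0.1%
    top_of_high = math.log10(4.0)  # strongest of the ascending series is 4%
    low = [top_of_low - k * step_low for k in range(19, -1, -1)]
    high = [top_of_high - k * step_high for k in range(3, -1, -1)]
    return low + high

series = staircase_series_log10()
eighth, ninth = round(series[7], 3), round(series[8], 3)
```

With this layout, the eighth and ninth steps fall on either side of -3.0 log percent, the mean threshold the paper uses for its simulated observers.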

Figure 4. Stimulus presentation as a function of trial number in the two-alternative forced-choice Monte Carlo simulation. Trials on which the subject's response was correct are represented by the filled circles; trials on which the subject's response was incorrect are represented by the open circles. The resulting maximum-likelihood estimate of α is plotted by the solid line. The horizontal dotted line marks the true value of α.

Procedure. Thresholds were determined with a temporal 2AFC procedure. In each trial, a stimulus was paired with a blank, and the subjects were asked to select the bottle containing the stimulus. The subjects were asked to give their best guess, even when they were certain that the two stimuli were identical. The stimulus and the blank were presented in 250-ml squeezable polyethylene bottles containing 60 ml of liquid. The subjects were instructed to keep one nostril closed while the other nostril was tested with three puffs out of each bottle. For each subject, four thresholds were obtained, two for each nostril. In randomized order and alternating nostrils, one threshold was determined with the ascending method, one with the up–down method, and two with the adaptive maximum-likelihood method. Testing took about 45 min.

The definition of detection threshold differs in several aspects among the three methods. In the ascending method, testing started with the lowest concentration. A correct response led to the presentation of the same concentration on the next trial. When an incorrect response was given, the next higher concentration was used. Testing stopped when five correct answers had been given in a row. The threshold was defined as the last-used concentration (Cain et al., 1983) and thus corresponded to a detection probability of 1.0.

In the up–down staircase method, testing could start at any step in the concentration series. The first subject was started in the middle of the range, and each subsequent subject started where the former subject had ended. After one correct response, the same concentration was presented; if two correct responses were given, the next lower concentration was presented on the next trial. One incorrect response led to the presentation of a concentration one step higher. This decision rule is called the one-up–two-down rule. Although this decision rule is widely used, the actual threshold estimate depends more on the size of the stimulus step than on the particular up–down rule (Garcia-Perez, 1998). Testing stopped after five reversals, and the threshold was computed as the average of the stimulus concentrations at the last four reversals. This threshold corresponded to a detection probability of approximately .75.
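The two fixed-rule procedures described above can be written as small state updates. This is a sketch under stated assumptions: the function names are hypothetical, the step index counts upward toward stronger concentrations, and no bounds checking at the ends of the series is shown.

```python
def ascending_next(step, correct, run_of_correct):
    """Ascending method: repeat after a correct response, step up after an error.

    Returns (next step, run of correct responses, stop flag);
    testing stops after five correct responses in a row.
    """
    if not correct:
        return step + 1, 0, False
    run = run_of_correct + 1
    return step, run, run >= 5

def one_up_two_down_next(step, correct, run_of_correct):
    """One-up-two-down rule: up after one error, down after two correct."""
    if not correct:
        return step + 1, 0            # one error: one step stronger
    if run_of_correct + 1 == 2:
        return step - 1, 0            # two correct in a row: one step weaker
    return step, run_of_correct + 1   # first correct: same concentration again

def threshold_from_reversals(reversal_levels):
    """Average the log concentrations at the last four reversals."""
    last_four = reversal_levels[-4:]
    return sum(last_four) / len(last_four)
```

A driver loop would record a reversal each time the direction of the step change flips and would stop after the fifth one.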

In the maximum-likelihood adaptive staircase method, testing started in the middle of the stimulus range. After each response, the estimate of the threshold was calculated, and the next stimulus was a concentration close to that threshold. Testing stopped when the confidence interval of the estimated threshold reached a predefined criterion (0.8 log concentration units). This threshold corresponded to a detection probability of exactly .75. All these computations were carried out in real time on a Power Macintosh computer using a program, SensoryTester, written by author L.O.H.
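The adaptive loop described in this paragraph can be sketched as follows. This is not the SensoryTester program. It assumes a logistic psychometric function on a log10 concentration axis with steepness 3.5 and chance level 0.5, a plain likelihood with no prior, and a next stimulus chosen as the available concentration nearest the current estimate. The names `run_adaptive` and `respond` are hypothetical.

```python
import math

def p_correct(x, alpha, beta=3.5, gamma=0.5):
    """Assumed logistic psychometric function; equals .75 when x == alpha."""
    return gamma + (1.0 - gamma) / (1.0 + 10.0 ** (-beta * (x - alpha)))

def log_likelihood(alpha, trials):
    """Log likelihood of candidate alpha given (stimulus, correct) pairs."""
    total = 0.0
    for x, correct in trials:
        p = min(max(p_correct(x, alpha), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total += math.log(p if correct else 1.0 - p)
    return total

def estimate(trials, candidates, chi2_crit=5.0239):
    """Return (best alpha, lower bound, upper bound) over a candidate grid."""
    loglik = [log_likelihood(a, trials) for a in candidates]
    best = max(loglik)
    inside = [a for a, ll in zip(candidates, loglik)
              if ll >= best - chi2_crit / 2.0]
    return candidates[loglik.index(best)], min(inside), max(inside)

def run_adaptive(stimuli, respond, candidates, ci_target=0.8, max_trials=60):
    """Present stimuli until the interval width reaches ci_target.

    respond(x) must return True for a correct response at log concentration x.
    max_trials is a safety cap added for this sketch.
    """
    trials = []
    x = stimuli[len(stimuli) // 2]  # start in the middle of the range
    while True:
        trials.append((x, bool(respond(x))))
        alpha, lower, upper = estimate(trials, candidates)
        if upper - lower <= ci_target or len(trials) >= max_trials:
            return alpha, (lower, upper), len(trials)
        # Next stimulus: the available concentration nearest the estimate.
        x = min(stimuli, key=lambda s: abs(s - alpha))
```

Because the estimate comes from a grid of candidate values rather than from the stimulus series, it can fall between the concentrations that were actually presented.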

Data analysis. The mean threshold (and standard error) produced by each method is presented in Table 1, labeled "Native Threshold." As noted above, each method produces a threshold estimate that corresponds to a different performance level of detection. To obtain threshold and confidence interval estimates for the ascending method of limits and the up–down staircase method that could be directly compared with those obtained by the ML-PEST method, a logistic psychometric function was fitted to the actual sequence of stimulus presentations and responses from these two methods. The values of β and γ were held constant at the same values used in the ML-PEST method. The maximum-likelihood threshold and the confidence interval were then computed in exactly the same manner as with the ML-PEST method. This fitting procedure will give a lower threshold for the ascending method of limits data than that reported by the method of limits procedure itself, because the ascending method of limits looks for a threshold stimulus giving 100% correct, whereas the maximum-likelihood threshold corresponds to 75% correct. A repeated measures analysis of variance (ANOVA) was computed using StatView 5.0.1 (SAS Institute, 1998a, 1998b) on each of four dependent variables: native log threshold, number of trials to reach the native threshold, maximum-likelihood logistic threshold, and confidence interval of the maximum-likelihood threshold.

Figure 5. Stimulus presentation as a function of trial number in the two-alternative forced-choice Monte Carlo simulation. Trials on which the subject's response was correct are represented by the filled circles; trials on which the subject's response was incorrect are represented by the open circles. The resulting maximum-likelihood estimate of α is plotted by the solid line. The horizontal dotted line marks the true value of α. The vertical bars show the 95% confidence interval of each estimate of α.

Results

The native threshold values, the mean numbers of trials needed to reach the stopping criterion, the maximum-likelihood logistic thresholds, and the mean confidence intervals of the maximum-likelihood thresholds are summarized in Table 1. A repeated measures ANOVA on the native thresholds showed no significant main effect of method [F(3,31) = 1.61, p < .187]. Subsequent statistical contrast comparisons, using the standard Huynh–Feldt ε to correct the p value and the degrees of freedom, revealed that the native threshold of the ascending method of limits was just significantly higher than the other methods [F(1,31) = 3.92, p = .051]. These higher thresholds are to be expected, given that the ascending method of limits computes the threshold to be the concentration corresponding to a detection probability of 1.0, whereas the other methods use a performance criterion of about .75.

A repeated measures ANOVA on the number of trials needed to reach the stopping criterion showed a significant main effect of method [F(3,31) = 6.26, p < .0008]. Statistical contrast comparisons showed that the ascending method required significantly fewer trials to determine threshold [F(1,31) = 6.41, p < .014] and that the up–down method required significantly more trials [F(1,31) = 6.06, p < .017] than the average of the two measurements with the maximum-likelihood method.

A repeated measures ANOVA on the maximum-likelihood logistic thresholds showed no significant main effect of method, and the thresholds computed from the trial-by-trial data from the ascending method of limits were not significantly different from the thresholds computed from the data collected with the other methods [F(1,31) = 1.77, p = .19]. This result is to be expected, since presumably the performance of each observer is driven by the same sensory process when tested by each of the three psychometric methods.

The mean confidence intervals for the thresholds (and standard errors) are shown in the last column of Table 1. The mean confidence interval of the maximum-likelihood method is smaller than those of the other two. Note that the standard error of the confidence interval reported for the ML-PEST method is very small (0.02). This is because the confidence interval was used as the stopping criterion: Trials were stopped when the confidence interval of α reached 0.8. A repeated measures ANOVA indicated that there was a significant main effect of method [F(3,31) = 17.27, p < .0002]. Contrast tests revealed that the confidence intervals for the up–down method were significantly larger than those for both the maximum-likelihood method [F(1,31) = 50.93, p < .0001] and the ascending method [F(1,31) = 23.72, p < .0016].

Discussion

The native absolute threshold values obtained with the ascending method were higher than those obtained with the other two methods. One should remember that these values are not conventional thresholds, in the sense that they represent the concentration chosen correctly five out of five times. It is to be expected that this value will be a higher concentration than thresholds based on a lower percent correct. Furthermore, this method, at least as described by Cain et al. (1983), has considerably larger steps between adjacent concentrations, and only actual stimulus concentrations can be threshold values.

The ascending method needed the fewest trials (~13), relative to either the maximum-likelihood method (~15) or the up–down staircase method (~17). The clear superiority of the maximum-likelihood method, however, came with the high precision of the measured thresholds, as indicated by the small confidence intervals. The confidence intervals of the up–down staircase were almost three times larger than those given by the maximum-likelihood method.

Table 1
Mean Performance (and Standard Error) of Three Psychophysical Methods With Butyl Alcohol in Experiment 1

                     Native          Number of       ML Logistic     Confidence
                     Threshold       Trials          Threshold       Interval
Method               M       SE      M       SE      M       SE      M      SE
ML-PEST
  Left               -3.41   0.16    15.13   0.65    -3.41   0.16    0.80   0.02
  Right              -3.27   0.13    15.41   0.76    -3.27   0.13    0.84   0.02
Wetherill–Levitt     -3.30   0.10    17.44   0.72    -3.30   0.11    2.32   0.34
Ascending limits     -3.10   0.13    13.03   0.68    -3.48   0.14    1.14   0.02

MONTE CARLO SIMULATIONS

When measuring the threshold of an actual subject, it is not possible to know what the error is between the measured value and the "true" value, because the true value is not known. In order to evaluate the accuracy of the three psychophysical methods, we resorted to Monte Carlo evaluations of simulated observers having a predetermined threshold value. We used each of the three psychophysical methods to measure the threshold of each simulated observer and compared the result with the true value.

Monte Carlo simulations were run for each of the three psychophysical procedures, using the same rules and stimulus series as were used in the threshold measurements for butyl alcohol with the real subjects reported above in Experiment 1. A population of 10,000 simulated subjects was defined whose thresholds followed a Gaussian distribution with a mean threshold of -3.00 log percent concentration and a standard deviation of 0.4 log units. These values are representative of the distribution of real subjects' detection thresholds for butyl alcohol, as found in Experiment 2 below. The predetermined threshold for each of the 10,000 simulated observers was drawn from the above distribution. Threshold values higher than the highest stimulus concentration and lower than the lowest stimulus concentration were not used.
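As an illustration of how such a population might be generated, the following minimal sketch draws thresholds from a Gaussian and discards values outside the stimulus range. The function name, the range limits (`lowest`, `highest`), the seed, and the choice to redraw rejected values so that the population size stays fixed are all assumptions made for the example; the text states only the mean, the standard deviation, and that out-of-range values were not used.

```python
import random


def draw_thresholds(n, mean=-3.0, sd=0.4, lowest=-5.0, highest=-1.0, seed=0):
    """Draw n true thresholds (log percent concentration) from a Gaussian.

    Draws outside [lowest, highest] are discarded and redrawn. The limits
    here are hypothetical placeholders for the ends of the stimulus series.
    """
    rng = random.Random(seed)
    thresholds = []
    while len(thresholds) < n:
        value = rng.gauss(mean, sd)
        if lowest <= value <= highest:
            thresholds.append(value)
    return thresholds


population = draw_thresholds(10_000)
```

With a standard deviation of 0.4 log units, very few draws fall outside a range this wide, so the rejection step barely alters the distribution.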

Simulation 1

On each trial, the simulated subject was presented with a stimulus. Each subject's detection behavior was driven by a logistic psychometric function with a steepness of 3.5 and a chance-level performance of 0.5. The response on a trial was determined by the probability predicted by the logistic function for the stimulus and for that subject and by a uniform random number between 0 and 1. If the random number was less than the logistic function probability for that stimulus, the response was scored as correct; otherwise, it was scored as wrong. The starting stimulus for the up–down staircase and maximum-likelihood methods was chosen randomly between the two stimuli that bracketed the mean threshold of -3.0 log concentration. These stimuli were -3.113 and -2.937 log concentration. The ascending method started with the lowest stimulus in its series, -5.130 log percent concentration.
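The trial rule described above can be sketched as follows. The exact parameterization of the logistic function is not given in this section, so the form used here (chance level plus a scaled logistic of the distance between stimulus and threshold, both in log units) is an assumption, as are the function names.

```python
import math
import random


def p_correct(stimulus, threshold, steepness=3.5, chance=0.5):
    """Assumed 2AFC logistic psychometric function on a log-concentration axis."""
    core = 1.0 / (1.0 + math.exp(-steepness * (stimulus - threshold)))
    return chance + (1.0 - chance) * core


def simulate_trial(stimulus, threshold, rng):
    """Score a trial as correct when a uniform draw falls below p_correct."""
    return rng.random() < p_correct(stimulus, threshold)


rng = random.Random(0)
responses = [simulate_trial(-2.937, -3.0, rng) for _ in range(20)]
```

Under this assumed form, a stimulus exactly at threshold is answered correctly with probability .75, halfway between chance and perfect performance.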

An ascending limits run was considered a failure if the procedure called for a stimulus stronger than the highest stimulus concentration. A Wetherill–Levitt staircase run was considered a failure if, during the trials, a stimulus stronger than the highest stimulus concentration or weaker than the lowest stimulus concentration was called for. In the present simulation, the Wetherill–Levitt staircase failed 66 times. All of these failures were caused by the procedure's calling for a lower stimulus concentration than the lowest available. An ML-PEST run would have been considered a failure if the final estimate of the threshold were more than half the stopping confidence interval below the lowest candidate or above the highest candidate a. There were no such failures. The failed runs are excluded from the data analyzed below.

The distribution of errors for each method (estimated threshold minus true threshold) for the 10,000 subjects is shown in the left side of Figure 6. The distribution of the number of trials to reach stopping criterion for the 10,000 subjects is shown on the right side of Figure 6. The maximum-likelihood method was more accurate than the Wetherill–Levitt staircase, and both of these methods were superior to the ascending method of limits. The median error and the number of trials required for each method are shown in Table 2. The median error for the maximum-likelihood method was almost half as large as that for the up–down staircase, although both errors were quite small.

The error on each of the 10,000 subjects is plotted in Figure 7 as a function of the number of trials for that run. The range of errors became smaller as the number of trials increased for both the Wetherill–Levitt staircase method and the ML-PEST method, and the median error stayed close to zero. The ascending staircase method showed quite different behavior. The direction of the error depended on the number of trials before the stopping criterion of five correct in a row had been reached. The estimated thresholds were too low when the number of trials was less than about 15, whereas the estimates of threshold were too high when the number of trials was more than 15.
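The dependence of the ascending estimate on run length follows from the structure of the procedure, which a short simulation can make concrete. The sketch below assumes that an error moves the run to the next stronger concentration and that five consecutive correct responses at one concentration end it. The step size of 0.477 log units, the length of the series, and the logistic form of the observer are hypothetical choices for illustration, not values taken from the procedure as run.

```python
import math
import random


def p_correct(stimulus, threshold, steepness=3.5, chance=0.5):
    """Assumed 2AFC logistic psychometric function on a log-concentration axis."""
    core = 1.0 / (1.0 + math.exp(-steepness * (stimulus - threshold)))
    return chance + (1.0 - chance) * core


def ascending_limits(threshold, stimuli, rng, run_length=5):
    """Return (native_threshold, n_trials); native_threshold is None on failure.

    stimuli must be ordered from weakest to strongest. The concentration only
    ever increases: an error moves on to the next stronger stimulus.
    """
    n_trials = 0
    for stimulus in stimuli:
        correct_in_a_row = 0
        while correct_in_a_row < run_length:
            n_trials += 1
            if rng.random() < p_correct(stimulus, threshold):
                correct_in_a_row += 1
            else:
                break  # error: abandon this concentration
        if correct_in_a_row == run_length:
            return stimulus, n_trials
    return None, n_trials  # the series was exhausted


STEP = 0.477  # hypothetical spacing between adjacent concentrations
stimuli = [-5.130 + STEP * k for k in range(11)]
estimate, n_trials = ascending_limits(-3.0, stimuli, random.Random(0))
```

Because chance performance is .5, a run of five correct responses can occur well below threshold (probability of about .03 per attempt), which yields a short run and a low estimate; long runs, by contrast, have climbed past the threshold before stopping.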

Simulation 2

Because the Wetherill–Levitt staircase method and the ML-PEST method generally required different numbers of trials before the stopping criterion was reached on each run, comparison of the accuracy is not straightforward. We repeated the Monte Carlo simulations so that the staircase method and the ML-PEST method used the same number of trials on each run. First, the Wetherill–Levitt staircase was run until the stopping criterion was reached. Then the ML-PEST method was run with the same number of trials on that observer, using the same starting stimulus.

The results are presented in Table 2. When the ML-PEST method had the same number of trials as the up–down staircase, the ML-PEST method was slightly more accurate. As in Simulation 1, both of these methods were superior in accuracy to the ascending staircase. There were 60 failures out of the 10,000 cases, where the up–down staircase called for a stimulus that was lower than the lowest available concentration. Because the ML-PEST method was not permitted to run as many trials as it would normally require, this method failed on 30 out of the 10,000 cases.

Table 2
Results of the Four Monte Carlo Simulations

Method                  Median Error    Median Number of Trials
Simulation 1
  Maximum-likelihood    0.027           16
  Wetherill–Levitt      0.045           17
  Ascending limits      0.331           14
Simulation 2
  Maximum-likelihood    0.094           17
  Wetherill–Levitt      0.101           17
  Ascending limits      0.419           14
Simulation 3
  Maximum-likelihood    0.03            16
  Wetherill–Levitt      3.60            16
  Ascending limits      3.54            16
Simulation 4
  Maximum-likelihood    2.75            115
  Wetherill–Levitt      0.93            21
  Ascending limits      3.36            22


Figure 6. Distribution of errors (left) and number of trials (right) for three psychophysical methods: maximum-likelihood adaptive staircase (upper panel), Wetherill and Levitt up–down staircase (middle panel), and ascending method of limits (lower panel). The vertical line in the left panels marks the zero-error bin of each histogram. See text for details. The abscissa scale for the upper two left panels covers a range of 1.5 log units and is 5 log units for the lower left histogram.

Figure 7. Error as a function of number of trials for three psychophysical methods: ML-PEST (upper panel), up–down staircase (middle panel), and ascending method of limits (lower panel).

Simulation 3

An important ability of sensory testing is to detect people who deliberately try to appear to have diminished sensitivity. We therefore repeated the Monte Carlo simulations of 10,000 subjects, using the same starting conditions and the same test conditions as those in Simulation 1, but each simulated subject generated "untruthful" responses (i.e., responses just the opposite of that called for by the subject's psychometric function). In the ML-PEST computer program, two likelihood functions are maintained: one based on the hypothesis that the subject is responding truthfully, and the other based on the hypothesis that the subject is deliberately choosing the wrong response. The program explicitly tests the hypothesis that a subject is deliberately choosing the wrong response by comparing the maxima of the two likelihood distributions and therefore should be able to measure a sensory threshold as effectively as when a subject is telling the truth. The other two methods do not have an explicit way of dealing with this situation. It is of interest to learn how all three methods behave under these circumstances.
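The comparison of the two likelihood functions can be illustrated with a small sketch. It is not the ML-PEST program itself: the logistic form, the grid of candidate thresholds, and all function names are assumptions. Under the lying hypothesis, the probability of a correct response is taken to be the complement of the truthful probability.

```python
import math


def p_correct(stimulus, threshold, steepness=3.5, chance=0.5):
    """Assumed 2AFC logistic psychometric function on a log-concentration axis."""
    core = 1.0 / (1.0 + math.exp(-steepness * (stimulus - threshold)))
    return chance + (1.0 - chance) * core


def log_likelihood(trials, candidate, lying=False):
    """Log likelihood of (stimulus, correct) trials for one candidate threshold."""
    total = 0.0
    for stimulus, correct in trials:
        p = p_correct(stimulus, candidate)
        if lying:
            p = 1.0 - p  # a liar is correct only when a truthful observer would err
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep the logarithm finite
        total += math.log(p if correct else 1.0 - p)
    return total


def best_hypothesis(trials, candidates):
    """Compare the maxima of the truthful and lying likelihood functions."""
    truthful = max(log_likelihood(trials, c) for c in candidates)
    lying = max(log_likelihood(trials, c, lying=True) for c in candidates)
    return "lying" if lying > truthful else "truthful"


candidates = [-5.0 + 0.25 * i for i in range(17)]  # hypothetical grid, -5 to -1
always_wrong = [(-2.0, False)] * 10  # a strong stimulus, missed every time
verdict = best_hypothesis(always_wrong, candidates)
```

A truthful observer can do no worse than chance, so ten errors in a row on a strong stimulus are far better explained by the lying hypothesis, and the candidate that maximizes that likelihood still provides an estimate of the underlying threshold.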

The results of this simulation are presented in Table 2. The ML-PEST method successfully measured the threshold and explicitly detected the lying condition in all 10,000 cases and was therefore able to measure the threshold of the underlying psychometric function with a small median error. The up–down staircase method and the ascending method of limits reached the highest concentration and tried to go higher, resulting in 9,824 failures for the staircase method and 8,648 failures for the method of limits. If we consider a failure as a successful detection of the lying behavior, the up–down staircase method detected 98% of the cases, and the ascending method of limits detected 86% of the cases. These two methods required about the same number of trials (see Table 2).

Simulation 4

It is important in clinical testing that people with elevated thresholds be easily detected. One of the subjects in Experiment 1 first had smell tested by the up–down method and exhibited a normal threshold. Later, the same nostril was tested using the maximum-likelihood method, and no threshold could be measured, because, as it turned out, he was anosmic on that side. The fact that the up–down method did not detect the anosmia is of concern.

We therefore repeated the Monte Carlo simulations of 10,000 subjects, using the same starting conditions and the same test conditions as those in Simulation 1, with one change: Each simulated subject generated a random response, as would be the case if a person had no sensory sensitivity. The ideal behavior of a testing method would be to indicate that the threshold was at the highest value that is possible with that method or to have some other explicit way of indicating that the subject is responding randomly.

None of the three methods performed very well in detecting the random responding. The median errors and number of trials are given in Table 2. The up–down staircase method did the poorest, because it gave a median error of .93. This rather low error occurs because, on 87% of the cases, the stopping criterion of 5 reversals was satisfied before the method reached the upper limit of the stimulus concentrations. The ML-PEST method failed on 11% of the cases, but overall the thresholds were estimated to be very high. Unfortunately, it required a great number of trials to achieve this performance. The ascending method of limits reached the upper limit of stimulus concentration on 66% of the cases and also gave the highest values for the threshold.

Since, in Monte Carlo simulations, the distribution of the actual thresholds is known, we can apply an SDT analysis (Green & Swets, 1966/1974; Macmillan & Creelman, 1991) to compare the distribution of estimated thresholds made by each method under random responding with the true distribution. Using the means and standard deviations of the above distributions, we computed the signal detection measures of sensitivity, d_a, and accuracy, the area under the ROC curve, A_z (Harvey, 1992; Simpson & Fitter, 1973). The ML-PEST method had the highest sensitivity and accuracy (d_a = 2.27, A_z = .95). The ascending method was next best (d_a = 1.99, A_z = .92), and the up–down staircase method was the least sensitive (d_a = 1.50, A_z = .86).
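For readers who wish to check the relation between the two indices, the sketch below uses the standard Gaussian unequal-variance definitions: d_a is the difference between the two means divided by the root mean square of the two standard deviations, and A_z is the standard normal cumulative probability at d_a divided by the square root of 2. The function names are illustrative, and the assumption is that these standard definitions are the ones that were applied.

```python
import math


def d_a(mean_signal, sd_signal, mean_noise, sd_noise):
    """Sensitivity index for two Gaussian distributions with unequal variances."""
    rms_sd = math.sqrt((sd_signal ** 2 + sd_noise ** 2) / 2.0)
    return (mean_signal - mean_noise) / rms_sd


def a_z(da):
    """Area under the Gaussian ROC curve: Phi(da / sqrt(2)).

    Phi(z) = 0.5 * (1 + erf(z / sqrt(2))), so Phi(da / sqrt(2)) reduces to
    0.5 * (1 + erf(da / 2)).
    """
    return 0.5 * (1.0 + math.erf(da / 2.0))


areas = [round(a_z(value), 2) for value in (2.27, 1.99, 1.50)]
```

Taking the reported d_a values as 2.27, 1.99, and 1.50, this relation gives areas of about .95, .92, and .86, in agreement with the reported A_z values.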

Discussion

The results of the simulations lead to the conclusion that both the up–down staircase method and the maximum-likelihood method have considerable strengths for use in measuring taste and smell thresholds. The ascending method of limits should not be used if one is interested in obtaining an accurate estimate of the threshold. These threshold estimates covary with the number of trials taken to stop, as is illustrated in the lower panel of Figure 7. The simulation results offer an explanation for the published finding that ascending limit measurements of olfactory thresholds are less reliable than staircase-measured thresholds (Doty, McKeown, Lee, & Shaman, 1995).

The up–down staircase method gives an average error twice as large as that of the ML-PEST method, although both values are small and therefore similar to each other. The up–down staircase method failed by calling for a stimulus too low on 66 out of 10,000 cases, whereas the ML-PEST method had no such failures. Although the median number of trials gives the edge to the ML-PEST method, on some occasions the ML-PEST method required over 50 trials to reach its stopping criterion.

A strength of the ML-PEST method is its ability to consider alternate hypotheses about the response strategy of the observer. The ML-PEST method was able to correctly detect all 10,000 dishonest responders. We are currently preparing a paper that will go into considerable detail on how to distinguish malingerers from true anosmics, using a variety of methods. Another strength is that there is virtually no bias introduced into threshold measurements by having the ML-PEST steepness different from the "true" value set in the Monte Carlo observer (Treutwein & Strasburger, 1999). Data generated from a Monte Carlo Weibull function observer can be well fit by virtually any S-shaped psychometric function, although the Weibull will usually fit slightly better. We therefore recommend the ML-PEST method for testing taste and smell.

EXPERIMENT 2
Test–Retest Reliability

In order to assess the stability of the measures obtained with the adaptive maximum-likelihood procedure, we tested the subjects on four occasions.

Method

Subjects. Twenty-seven nonsmoking, healthy subjects participated in the experiment (mean age = 31.6 ± 8.7 years; range = 22–57 years). The 14 women and 13 men were divided into two groups. Group 1 ("consecutive"; 7 females and 7 males) was tested on 4 consecutive days, and Group 2 ("distributed"; 7 females and 6 males) was tested on Days 1, 2, 8, and 22.

Stimuli. Smell thresholds for butyl alcohol were assessed using the same range of concentrations as were used with the maximum-likelihood method in Experiment 1. Taste thresholds were measured using 20 concentrations of NaCl in quarter-log steps. The highest concentration was 17 M. The same maximum-likelihood adaptive method as described in Experiment 1 was used in this experiment.

Procedure. Thresholds were determined with a temporal 2AFC procedure. Taste sensitivity was measured with a whole-mouth sip-and-spit procedure. The stimuli and the blanks were presented in 30-ml plastic medicine cups containing 5 ml of liquid. The subjects were requested to take the first sample in their mouths, swish it around, and spit it out, then rinse with distilled water and do the same with the second sample. They were asked to choose the sample that had a taste different from water. In each of the four sessions, taste thresholds were determined first and smell thresholds last. All four smell thresholds were determined in the preferred nostril. The subjects were tested at the same time of each day.

Results

Because the measurement and specification of test–retest reliability is a rather complex topic (Linn, 1989), we have chosen four different ways to demonstrate the reliability of the maximum-likelihood method. For the first method, the four threshold measurements for each subject are plotted in Figure 8. The thick line in each panel is the mean threshold. It is clear from examination of Figure 8 that most individual thresholds were stable over 4 days (left column of panels) and over 22 days (right column of panels), with a few subjects showing shifts of more than 1 log unit. The mean thresholds for the two groups are shown in Table 3. A repeated measures ANOVA confirmed the stability of the thresholds: The effect of testing day was not significant.

A second approach is to compute the correlations among the four threshold measurements (see, e.g., Mattes, 1988). Pearson correlation coefficients between thresholds obtained on different days are shown in Table 4. The correlations were generally higher for pairs of thresholds that were measured close together in time. As Mattes pointed out, correlations are highly influenced by small changes in ordinal ranking of the subjects. The fact that the pairwise correlations became smaller as time separation between them increased is shown in Figure 9. The solid line is the least squares regression line. It indicates that the correlation between two thresholds diminished as the time between them increased. The slope of the regression line becomes even steeper if the data for 1-day separation are removed from the regression, indicating that the decrease in correlation started with the 2-day separation condition.

Figure 8. Log threshold concentration as a function of test day for NaCl (upper row) and butanol (lower row), measured on 4 successive days (left column) and over 22 days (right column). The thin lines connect the four thresholds of individual subjects. The thick line is the mean threshold concentration of all subjects.

The third method to assess test–retest reliability is based on the differences between successive threshold measurements. Each threshold was subtracted from the previous threshold. The four threshold measurements resulted in three difference measurements. The distributions of these differences for NaCl and for butyl alcohol are plotted on the left side of Figure 10. If the measured thresholds do not systematically increase or decrease over time, we expect that the mean difference would be zero. The mean of the distribution for NaCl (upper panel of Figure 10) was 0.018 and for butyl alcohol (lower panel) was 0.002. Neither of these was significantly different from zero. The standard deviations of these distributions were 0.413 log units for NaCl and 0.422 log units for butyl alcohol.

A fourth way to examine test–retest reliability is by means of the spread of the four threshold measures for each subject. A standard deviation of 0.0 would mean that all four thresholds were exactly the same value. The standard deviation of the four measurements was computed for each subject. The distributions of these standard deviations for NaCl (upper panel) and butyl alcohol (lower panel) are plotted on the right side of Figure 10. The median standard deviation was about 0.15 log units for both NaCl and butyl alcohol. This good test–retest reliability is consistent with the other three methods of assessing reliability reported above.
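The third and fourth indices are simple to compute, as the following sketch shows. The threshold values are invented for illustration and are not data from the experiment, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which form was used.

```python
import statistics


def successive_differences(thresholds):
    """Subtract each threshold from the previous one (earlier minus later)."""
    return [earlier - later for earlier, later in zip(thresholds, thresholds[1:])]


# Hypothetical log thresholds for two subjects over four sessions.
subjects = {
    "S1": [-3.0, -3.2, -3.1, -3.1],
    "S2": [-2.6, -2.5, -2.9, -2.7],
}

all_differences = [
    difference
    for thresholds in subjects.values()
    for difference in successive_differences(thresholds)
]
mean_difference = statistics.mean(all_differences)  # near zero if there is no drift
sd_difference = statistics.stdev(all_differences)

# Spread of the four measurements within each subject, summarized by the median.
within_subject_sds = [statistics.stdev(t) for t in subjects.values()]
median_within_sd = statistics.median(within_subject_sds)
```

Four sessions give three differences per subject, so the difference distribution for 27 subjects would contain 81 values.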

Discussion

The maximum-likelihood method gives estimates of thresholds that are stable within individuals over time and that are responsive to individual differences. By all four indicators, the test–retest reliability of both NaCl and butyl alcohol thresholds measured by the maximum-likelihood method was very good.

GENERAL DISCUSSION

The two experiments and the Monte Carlo simulations reported here demonstrate that the maximum-likelihood method is a viable alternative to other methods currently in use to measure taste and smell detection thresholds. Because the method attempts to use stimuli that are optimal (close to the predicted threshold), the confidence intervals of the measured thresholds are considerably smaller than those achieved by the other two methods (Experiment 1).

Table 3
Mean NaCl and Butyl Alcohol Thresholds (and Standard Errors) From Experiment 2

                          Day 1          Day 2          Day 3          Day 4
                          M      SE      M      SE      M      SE      M      SE
NaCl
  Consecutive (Group 1)   -2.62  0.13    -2.67  0.11    -2.82  0.13    -2.63  0.11
  Distributed (Group 2)   -2.90  0.19    -2.83  0.09    -2.83  0.11    -2.77  0.10
Butyl alcohol
  Consecutive (Group 1)   -3.19  0.14    -3.17  0.15    -3.25  0.12    -3.10  0.13
  Distributed (Group 2)   -3.00  0.16    -3.10  0.14    -3.09  0.10    -3.09  0.11

Table 4
Pearson Correlation Coefficients Between Thresholds

                          Paired Sessions
                          1–2    1–3    1–4    2–3    2–4    3–4
NaCl
  Consecutive (Group 1)   .46    .66    .66    .44    .75    .66
  Distributed (Group 2)   .58    .72    .40    .55    .20    .51
Butyl alcohol
  Consecutive (Group 1)   .86    .83    .86    .67    .71    .76
  Distributed (Group 2)   .26    .68    .56    .51    .21    .65

*p < .05. **p < .01. ***p < .001.

One of the strengths of the maximum-likelihood method is that it provides an explicit way of estimating the confidence interval of the threshold, a facility not provided by the other two methods. The experimenter is free to establish a stopping confidence interval to suit his/her needs. If greater precision is desired, it can be achieved at the expense of running more trials. This explicit tradeoff between precision and number of trials is not possible with the other two psychophysical methods. One could, of course, use a hybrid combination of methods: an up–down staircase method for generating the next stimulus to use, combined with a maximum-likelihood method for estimating the threshold and its confidence interval. In effect, we used this approach after the data from Experiment 1 were collected, by reanalyzing them with the maximum-likelihood technique in order to assess the confidence interval of the resulting threshold.
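A reanalysis of this kind can be sketched as a grid search over candidate thresholds applied to trial-by-trial data from any procedure. The sketch below is illustrative only. It assumes a logistic observer, treats the normalized likelihood over the candidates as a probability distribution, and takes the width of its central 95% region as the confidence interval, which may differ from the computation in the ML-PEST routines. The 0.8 stopping width is the criterion used in Experiment 1; the candidate grid and the trial data are hypothetical.

```python
import math


def p_correct(stimulus, threshold, steepness=3.5, chance=0.5):
    """Assumed 2AFC logistic psychometric function on a log-concentration axis."""
    core = 1.0 / (1.0 + math.exp(-steepness * (stimulus - threshold)))
    return chance + (1.0 - chance) * core


def log_likelihood(trials, candidate):
    """Log likelihood of (stimulus, correct) trials for one candidate threshold."""
    total = 0.0
    for stimulus, correct in trials:
        p = min(max(p_correct(stimulus, candidate), 1e-12), 1.0 - 1e-12)
        total += math.log(p if correct else 1.0 - p)
    return total


def estimate_threshold(trials, candidates, coverage=0.95):
    """Return (maximum-likelihood candidate, width of the central interval)."""
    logs = [log_likelihood(trials, c) for c in candidates]
    peak = max(logs)
    weights = [math.exp(value - peak) for value in logs]
    total = sum(weights)
    best = candidates[logs.index(peak)]
    tail = (1.0 - coverage) / 2.0
    cumulative, lower, upper = 0.0, None, candidates[-1]
    for candidate, weight in zip(candidates, weights):
        cumulative += weight / total
        if lower is None and cumulative >= tail:
            lower = candidate
        if cumulative >= 1.0 - tail:
            upper = candidate
            break
    return best, upper - lower


candidates = [-5.0 + 0.05 * i for i in range(81)]  # hypothetical grid, -5 to -1
trials = [(-2.5, True)] * 20 + [(-3.5, False)] * 10 + [(-3.5, True)] * 10
best, width = estimate_threshold(trials, candidates)
keep_testing = width > 0.8  # continue until the interval is narrow enough
```

The same function can serve as a stopping rule during testing or be applied afterward to data collected with a staircase, which is the hybrid use described above.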

The Monte Carlo simulations allow the analysis of the ideal behavior of the three psychophysical methods. One major conclusion is that the ascending staircase method gives threshold measurements that are severely biased. This bias is a function of the number of trials carried out before the stopping criterion of 5 correct in a row with one stimulus is reached. When the number of trials was small (less than approximately 15), the threshold estimate was too low; when the number of trials was greater than 15, the threshold estimate was too high (see Figure 7). This characteristic is due to the fact that the stimulus concentration used can only increase as trials progress. The maximum-likelihood method has the least amount of bias, although it is always slightly negative (i.e., the threshold estimate is, on the average, slightly lower, by 0.02 log units, than the true value).

In clinical testing, especially for cases that involve the legal system, it is important to be able to detect malingerers. One of the strengths of the maximum-likelihood method is that it explicitly tests hypotheses. Normally, the hypothesis is that the subject is responding truthfully and that the threshold has a certain value. In the present implementation of the maximum-likelihood method, the computer routines maintain three hypotheses about the response strategy of the subject: that the subject is responding truthfully, that the subject is lying on each trial (i.e., always choosing what he/she believes is the wrong alternative in the 2AFC procedure), and that the subject is responding randomly (i.e., not making use of any sensory information). The ability to detect a malingerer (one who has sensory sensitivity but tries to respond randomly) is based on the fact that it is difficult for humans to generate truly random sequences (Bar-Hillel & Wagenaar, 1993). A subject who can taste or smell the stimulus, and therefore knows in which interval it occurs, tends to respond incorrectly more often than would be expected by chance alone. In this case, the hypothesis that the subject is lying will have a higher likelihood than the hypothesis that the subject is telling the truth. We are currently working to refine this ability of the maximum-likelihood method, because it holds great promise. The other two psychophysical procedures do not easily distinguish between a malingerer and a person with no sensory sensitivity and have no explicit way to detect malingerers.

Figure 9. Test–retest reliability, as indicated by correlation coefficients between all pairs of measurements, plotted as a function of the number of days separating them. The solid line is the least squares linear regression through the data.

Finally, in Experiment 2, we demonstrated that the maximum-likelihood method gives estimates of thresholds that are stable over time. This stability is actually based on two aspects of our testing procedure. The maximum-likelihood method gives the smallest confidence intervals of the three methods tested and thus contributes the least amount of random or systematic error variance to the measured thresholds. The second source of reliability is the experimental paradigm. The 2AFC procedure minimizes the effects of response bias (e.g., Green & Swets, 1966/1974) and thus minimizes error variance in the threshold measurements that would be introduced by variance in response bias. Actually, it would be desirable to use more than just two forced-choice alternatives, because the ML-PEST method converges more rapidly on the threshold value with more alternatives (Manny & Klein, 1985; Schlauch & Rose, 1990). The time required to present chemosensory stimuli, however, usually precludes using more than two alternatives.

Test–retest reliability is not achieved at the expense of precision. Our measured thresholds show an approximately normal distribution with a standard deviation of about 0.4 log units. The maximum-likelihood method is able to estimate thresholds that lie on a continuum, even though a small number of discrete stimulus concentrations are available for testing. Although a computer is required to run ML-PEST software, the widespread availability of inexpensive desktop or laptop computers makes this requirement much less of a barrier than in the past.

Figure 10. Distribution of differences between successive thresholds (left) and of the standard deviation of the four threshold measurements for the 27 subjects (right), for NaCl (upper panel) and butyl alcohol (lower panel).

REFERENCES

Anliker, J. A., Bartoshuk, L., Ferris, A. N., & Hooks, L. D. (1991). Children's food preferences and genetic sensitivity to the bitter tastes of 6-n-propylthiouracil (PROP). American Journal of Clinical Nutrition, 54, 316-320.

Apter, A. J., Gent, J. F., & Frank, M. E. (1999). Fluctuating olfactory sensitivity and distorted odor perception in allergic rhinitis. Archives of Otolaryngology-Head & Neck Surgery, 125, 1005-1010.

Bar-Hillel, M., & Wagenaar, W. A. (1993). The perception of randomness. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 369-393). Hillsdale, NJ: Erlbaum.

Berkson, J. (1951). Why I prefer logits to probits. Biometrics, 7, 327-339.

Berkson, J. (1953). A statistically precise and relatively simple method of estimating the bio-assay with quantal response, based on the logistic function. Journal of the American Statistical Association, 48, 565-600.

Berkson, J. (1955). Maximum-likelihood and minimum chi-square estimates of the logistic function. Journal of the American Statistical Association, 50, 130-162.

Cain, W. S., Gent, J. F., Catalanotto, F. A., & Goodspeed, R. B. (1983). Clinical evaluation of olfaction. American Journal of Otolaryngology, 4, 252-256.

Cometto-Muñiz, J. E., & Cain, W. S. (1995). Relative sensitivity of the ocular trigeminal, nasal trigeminal, and olfactory systems to airborne chemicals. Chemical Senses, 20, 191-198.

Cornsweet, T. N. (1962). The staircase method in psychophysics. American Journal of Psychology, 75, 485-491.

Cowart, B. J. (1989). Relationships between taste and smell across the adult life span. In C. Murphy, W. S. Cain, & D. M. Hegsted (Eds.), Nutrition and the chemical senses in aging: Recent advances and current research needs (Annals of the New York Academy of Sciences, Vol. 561, pp. 39-55). New York: New York Academy of Sciences.

Cowart, B. J., Yokomukai, Y., & Beauchamp, G. K. (1994). Bitter taste in aging: Compound-specific decline in sensitivity. Physiology & Behavior, 56, 1237-1241.

Deems, D. A., Doty, R. L., Settle, R. G., Moore-Gillon, V., Shaman, P., Mester, A. F., Kimmelman, C. P., Brightman, V. J., & Snow, J. B., Jr. (1991). Smell and taste disorders: A study of 750 patients from the University of Pennsylvania Smell and Taste Center. Archives of Otolaryngology-Head & Neck Surgery, 117, 519-528.

Doty, R. L., McKeown, D. A., Lee, W. W., & Shaman, P. (1995). A study of the test–retest reliability of ten olfactory tests. Chemical Senses, 20, 645-656.

Doty, R. L., Snyder, P. J., Huggins, G. R., & Lowry, L. D. (1981). Endocrine, cardiovascular, and psychological correlates of olfactory sensitivity changes during the human menstrual cycle. Journal of Comparative & Physiological Psychology, 95, 45-60.

Drewnowski, A., Henderson, S. A., & Shore, A. B. (1997). Genetic sensitivity to 6-n-propylthiouracil (PROP) and hedonic responses to bitter and sweet tastes. Chemical Senses, 22, 27-37.

Duffy, V. B., Cain, W. S., & Ferris, A. M. (1999). Measurement of sensitivity to olfactory flavor: Application in a study of aging and dentures. Chemical Senses, 24, 671-677.

Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf und Härtel.

Gagnon, P., Mergler, D., & Lapare, S. (1994). Olfactory adaptation, threshold shift and recovery at low levels of exposure to methyl isobutyl ketone (MIBK). Neurotoxicology, 15, 637-642.

Garcia-Perez, M. A. (1998). Forced-choice staircases with fixed step sizes: Asymptotic and small-sample properties. Vision Research, 38, 1861-1881.

Green, D. M., & Swets, J. A. (1974). Signal detection theory and psychophysics. Huntington, NY: Robert E. Krieger. (Original work published 1966)

Grushka, M., Sessle, B. J., & Howley, T. P. (1986). Psychophysical evidence of taste dysfunction in burning mouth syndrome. Chemical Senses, 11, 485-498.

Guilford, J. P. (1936). Psychometric methods. New York: McGraw-Hill.

Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.

Hall, J. L. (1968). Maximum-likelihood sequential procedures for estimation of psychometric functions. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1981). Hybrid adaptive procedure for estimation of psychometric functions. Journal of the Acoustical Society of America, 69, 1763-1769.

Harvey, L. O., Jr. (1986). Efficient estimation of sensory thresholds. Behavior Research Methods, Instruments, & Computers, 18, 623-632.

Harvey, L. O., Jr. (1992). The critical operating characteristic and the evaluation of expert judgment. Organizational Behavior & Human Decision Processes, 53, 229-251.

Harvey, L. O., Jr. (1997). Efficient estimation of sensory thresholds with ML-PEST. Spatial Vision, 11, 121-128.

Hays, W. L. (1963). Statistics for psychologists (1st ed.). New York: Holt, Rinehart & Winston.

Krantz, D. H. (1969). Threshold theories of signal detection. Psychological Review, 76, 308-324.

Lehrner, J. P., Brücke, T., Dal-Bianco, P., Gatterer, G., & Kryspin-Exner, I. (1997). Olfactory functions in Parkinson's disease and Alzheimer's disease. Chemical Senses, 22, 105-110.

Lehrner, J. P., Kryspin-Exner, I., & Vetter, N. (1995). Higher olfactory threshold and decreased odor identification ability in HIV-infected persons. Chemical Senses, 20, 325-328.

Linn, R. L. (Ed.) (1989). Educational measurement (3rd ed.). New York: Macmillan.

Linschoten, M. R. I., & Kroeze, J. H. A. (1991). Spatial summation in taste: NaCl thresholds and stimulated area on the anterior human tongue. Chemical Senses, 16, 219-224.

Linschoten, M. R. I., & Kroeze, J. H. A. (1992). Bilateral taste stimulation: Spatial summation with weak NaCl stimuli. Chemical Senses, 17, 53-59.

Linschoten, M. R. I., & Kroeze, J. H. A. (1994). Ipsi- and bilateral interactions in taste. Perception & Psychophysics, 55, 387-393.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure vernier acuity of older infants. Vision Research, 25, 1245-1252.

Mattes, R. D. (1988). Reliability of psychophysical measures of gustatory function. Perception & Psychophysics, 43, 107-114.

Moll, B., Klimek, L., Eggers, G., & Mann, W. (1998). Comparison of olfactory function in patients with seasonal and perennial allergic rhinitis. Allergy, 53, 297-301.

Murphy, C., & Cain, W. S. (1986). Odor identification: The blind are better. Physiology & Behavior, 37, 177-180.

Murphy, C., Nordin, S., de Wijk, R. A., Cain, W. S., & Polich, J. (1994). Olfactory-evoked potentials: Assessment of young and elderly, and comparison to psychophysical threshold. Chemical Senses, 19, 47-56.

Nordin, S., Murphy, C., Davidson, T. M., Quinonez, C., Jalowayski, A. A., & Ellison, D. W. (1996). Prevalence and assessment of qualitative olfactory dysfunction in different age groups. Laryngoscope, 106, 739-744.

O'Mahony, M., Gardner, L., Long, D., Heintz, C., Thompson, B., & Davies, M. (1979). Salt taste detection: An R-index approach to signal-detection measurements. Perception, 8, 497-506.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Pierce, J. D., Jr., Doty, R. L., & Amoore, J. E. (1996). Analysis of position of trial sequence and type of diluent on the detection threshold for phenyl ethyl alcohol using a single staircase method. Perceptual & Motor Skills, 82, 451-458.

Rosenblatt, M. R., Olmstead, R. E., Iwamoto-Schaap, P. N., & Jarvik, M. E. (1998). Olfactory thresholds for nicotine and menthol in smokers (abstinent and nonabstinent) and nonsmokers. Physiology & Behavior, 65, 575-579.


SAS Institute (1998a). StatView: StatView reference. Cary, NC: SAS Institute.

SAS Institute (1998b). StatView: Using StatView. Cary, NC: SAS Institute.

Schlauch, R. S., & Rose, R. M. (1990). Two-, three-, and four-interval forced-choice staircase procedures: Estimator bias and efficiency. Journal of the Acoustical Society of America, 88, 732-740.

Simpson, A. J., & Fitter, M. J. (1973). What is the best index of detectability? Psychological Bulletin, 80, 481-488.

Stevens, J. C. (1995). Detection of heteroquality taste mixtures. Perception & Psychophysics, 57, 18-26.

Swets, J. A. (1961). Is there a sensory threshold? Science, 134, 168-177.

Swets, J. A. (1986a). Form of empirical ROC’s in discrimination and diagnostic tasks: Implications for theory and measurement of performance. Psychological Bulletin, 99, 181-198.

Swets, J. A. (1986b). Indices of discrimination or diagnostic accuracy: Their ROC’s and implied models. Psychological Bulletin, 99, 100-117.

Swets, J. A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Mahwah, NJ: Erlbaum.

Swets, J. A., Tanner, W. P., Jr., & Birdsall, T. G. (1961). Decision processes in perception. Psychological Review, 68, 301-340.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Urban, F. M. (1908). The application of statistical methods to problems of psychophysics. Philadelphia: Psychological Clinic Press.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Weiffenbach, J. M., Baum, B. J., & Burghauser, R. (1982). Taste thresholds: Quality specific variation with human aging. Journal of Gerontology, 37, 372-377.

Weiffenbach, J. M., Schwartz, L. K., Atkinson, J. C., & Fox, P. C. (1995). Taste performance in Sjögren’s syndrome. Physiology & Behavior, 57, 89-96.

Wetherill, G. B., & Levitt, H. (1965). Sequential estimation of points on a psychometric function. British Journal of Mathematical & Statistical Psychology, 18, 1-10.

(Manuscript received November 11, 1999; revision accepted for publication March 4, 2001.)


increases. The confidence interval of the estimated threshold is directly related to the width of the likelihood distribution and is calculated using the likelihood ratio test and the χ² statistical distribution with 1 degree of freedom (Hays, 1963):

$$\chi^2 = -2.0 \times \log_e(\text{likelihood ratio}) = -2.0 \times \log_e\!\left(\frac{L_x}{L_{\max}}\right) \qquad (2)$$

where L_x is the likelihood of a model whose α parameter is x, and L_max is the likelihood of the best-fitting model. We search for the stimulus concentrations (x) above and below the threshold stimulus (the maximum-likelihood α) at which the log likelihood drops to χ²/2 below the maximum log likelihood value. To find the 95% confidence interval, the criterion value of χ² is 5.0239. (This value is obtained from standard tables of the χ² distribution found in all statistics textbooks. The one-tailed value of 5.0239 corresponds to a probability of .025; since the confidence interval is two-tailed, 5.0239 corresponds to a probability of .05.)

In Figure 5, the same data shown in Figure 4 are plotted with an expanded vertical scale to show the estimate of the threshold along with the estimated 95% confidence interval. Notice that as the number of trials increases, the confidence interval of the threshold decreases. After 25 trials, the 95% confidence interval of the threshold extends 0.27 log units below the estimate of the threshold (indicated by the solid horizontal line, not by the filled circle, which represents the stimulus presented on that trial) and 0.37 log units above it, for a total confidence interval of 0.64 log units.
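
The likelihood-ratio search for the confidence limits can be sketched as follows. This is a minimal illustration, not the authors' SensoryTester code: the form of the logistic function, the default steepness and chance values, the grid of candidate α values, and all function names are assumptions introduced here.

```python
import math

def p_correct(x, alpha, beta=3.5, gamma=0.5):
    """Assumed logistic psychometric function for 2AFC (x and alpha in log units)."""
    p = gamma + (1.0 - gamma) / (1.0 + math.exp(-beta * (x - alpha)))
    return min(max(p, 1e-9), 1.0 - 1e-9)   # keep log() finite

def log_likelihood(alpha, trials):
    """Log likelihood of the observed (stimulus, correct) pairs for one candidate alpha."""
    total = 0.0
    for x, correct in trials:
        p = p_correct(x, alpha)
        total += math.log(p if correct else 1.0 - p)
    return total

def ml_threshold_with_ci(trials, candidates, chi2_crit=5.0239):
    """Return (alpha_hat, lower, upper) over an ascending grid of candidate alphas."""
    lls = [log_likelihood(a, trials) for a in candidates]
    best = max(range(len(candidates)), key=lls.__getitem__)
    cutoff = lls[best] - chi2_crit / 2.0    # log likelihood may drop by chi2/2
    lower = best
    while lower > 0 and lls[lower - 1] >= cutoff:
        lower -= 1
    upper = best
    while upper < len(candidates) - 1 and lls[upper + 1] >= cutoff:
        upper += 1
    return candidates[best], candidates[lower], candidates[upper]
```

Walking outward from the maximum assumes that the log likelihood falls off smoothly on both sides; a limit that lands on the edge of the candidate grid should be read as unbounded rather than as a true confidence limit.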

EXPERIMENT 1
Comparison of Three Psychophysical Methods

The goal of Experiment 1 was to compare the ML-PEST method with two commonly used methods with respect to their efficiency and their precision. Detection thresholds for butyl alcohol were measured using three methods: the ascending method of limits as described by Cain et al. (1983), the up–down staircase method (Wetherill & Levitt, 1965), and the ML-PEST maximum-likelihood staircase method as described above (Harvey, 1986, 1997).

Method
Subjects. Thirty-two (16 males and 16 females) healthy nonsmoking subjects participated in the experiment. Their mean age was 30.4 years (± 8.1), ranging from 17 to 60 years. None reported having conditions that are normally associated with smell dysfunction.

Stimuli for ascending method of limits. For the ascending method of limits, the procedure described by Cain et al. (1983) was used. Thirteen concentrations of butyl alcohol were prepared, with distilled water as diluent. The strongest concentration was 4%. Adjacent steps differed by a factor of 3.

Stimuli for staircase methods. The Wetherill and Levitt and ML-PEST methods used a series of 24 concentrations of butyl alcohol. The lowest 20 concentrations were a factor of 1.5 apart, with the strongest 0.1%. This series was augmented by the four strongest of the series for the ascending method, which were separated by a factor of 3.

Procedure. Thresholds were determined with a temporal 2AFC procedure. In each trial, a stimulus was paired with a blank, and the subjects were asked to select the bottle containing the stimulus. The subjects were asked to give their best guess, even when they were certain that the two stimuli were identical. The stimulus and the blank were presented in 250-ml squeezable polyethylene bottles containing 60 ml of liquid. The subjects were instructed to keep one nostril closed while the other nostril was tested with three puffs out of each bottle. For each subject, four thresholds were obtained, two for each nostril. In randomized order and alternating nostrils, one threshold was determined with the ascending method, one with the up–down method, and two with the adaptive maximum-likelihood method. Testing took about 45 min.

Figure 4. Stimulus presentation as a function of trial number in the two-alternative forced-choice Monte Carlo simulation. Trials on which the subject’s response was correct are represented by the filled circles; trials on which the subject’s response was incorrect are represented by the open circles. The resulting maximum-likelihood estimate of α is plotted by the solid line. The horizontal dotted line marks the true value of α.

The definition of detection threshold differs in several aspects among the three methods. In the ascending method, testing started with the lowest concentration. A correct response led to the presentation of the same concentration on the next trial. When an incorrect response was given, the next higher concentration was used. Testing stopped when five correct answers had been given in a row. The threshold was defined as the last-used concentration (Cain et al., 1983) and thus corresponded to a detection probability of 1.0.
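
The ascending rule just described can be written as a short loop. This is an illustrative sketch only; the function name, the `respond` callback, and the use of integer level indices are assumptions made here, not part of the published procedure.

```python
def ascending_limits(respond, n_levels, run_length=5):
    """Ascending method of limits.

    respond(level) returns True for a correct choice; levels run from
    0 (weakest) to n_levels - 1 (strongest).
    Returns (threshold_level, n_trials); threshold_level is None when the
    rule calls for a stimulus stronger than the strongest one available.
    """
    level = 0
    streak = 0
    n_trials = 0
    while True:
        n_trials += 1
        if respond(level):
            streak += 1                 # correct: repeat the same concentration
            if streak == run_length:
                return level, n_trials  # five correct in a row: stop
        else:
            streak = 0                  # incorrect: move one step stronger
            level += 1
            if level >= n_levels:
                return None, n_trials
```

With 13 levels and an observer who is correct only from level 7 upward, the run takes seven incorrect trials followed by five correct ones and stops at level 7.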

In the up–down staircase method, testing could start at any step in the concentration series. The first subject was started in the middle of the range, and each subsequent subject started where the former subject had ended. After one correct response, the same concentration was presented; if two correct responses were given, the next lower concentration was presented on the next trial. One incorrect response led to the presentation of a concentration one step higher. This decision rule is called the one-up–two-down rule. Although this decision rule is widely used, the actual threshold estimate depends more on the size of the stimulus step than on the particular up–down rule (Garcia-Perez, 1998). Testing stopped after five reversals, and the threshold was computed as the average of the stimulus concentrations at the last four reversals. This threshold corresponded to a detection probability of approximately .75.
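
A minimal sketch of the one-up–two-down rule with the five-reversal stopping criterion is given below. The bookkeeping is an assumption on our part: a reversal is recorded at the concentration where the direction of stepping changes, and a run is abandoned if the rule steps outside the stimulus series. The function and argument names are illustrative.

```python
def up_down_staircase(respond, levels, start, n_reversals=5):
    """One-up-two-down staircase.

    levels: ascending list of log concentrations; start: starting index;
    respond(x) returns True for a correct choice at log concentration x.
    Returns (threshold, n_trials), or (None, n_trials) if the rule calls
    for a stimulus outside the series.
    """
    idx = start
    correct_in_row = 0
    last_direction = 0            # +1 = last step up, -1 = last step down
    reversal_levels = []
    n_trials = 0
    while True:
        n_trials += 1
        if respond(levels[idx]):
            correct_in_row += 1
            if correct_in_row < 2:
                continue          # one correct: repeat the same concentration
            step = -1             # two correct: one step weaker
        else:
            step = +1             # one incorrect: one step stronger
        correct_in_row = 0
        if last_direction != 0 and step != last_direction:
            reversal_levels.append(levels[idx])
            if len(reversal_levels) == n_reversals:
                break
        last_direction = step
        idx += step
        if not 0 <= idx < len(levels):
            return None, n_trials
    # Threshold: mean of the concentrations at the last four reversals.
    return sum(reversal_levels[-4:]) / 4.0, n_trials
```

For a noise-free observer who is correct at and above −3.0 on a quarter-log grid, the staircase oscillates between −3.0 and −3.25 and returns −3.125.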

In the maximum-likelihood adaptive staircase method, testing started in the middle of the stimulus range. After each response, the estimate of the threshold was calculated, and the next stimulus was a concentration close to that threshold. Testing stopped when the confidence interval of the estimated threshold reached a predefined criterion (0.8 log concentration units). This threshold corresponded to a detection probability of exactly .75. All these computations were carried out in real time on a Power Macintosh computer, using a program, SensoryTester, written by author L.O.H.
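
The trial loop of such a maximum-likelihood staircase might look like the sketch below. It is not the SensoryTester program: the logistic form, the grid of candidate thresholds, the rule of presenting the available stimulus nearest the current estimate, and the cap on the number of trials are all assumptions made for illustration.

```python
import math

def p_correct(x, alpha, beta=3.5, gamma=0.5):
    """Assumed logistic psychometric function (x and alpha in log units)."""
    p = gamma + (1.0 - gamma) / (1.0 + math.exp(-beta * (x - alpha)))
    return min(max(p, 1e-9), 1.0 - 1e-9)

def ml_staircase(respond, stimuli, candidates, ci_stop=0.8,
                 chi2_crit=5.0239, max_trials=200):
    """Return (alpha_hat, ci_width, n_trials).

    stimuli and candidates are ascending lists of log concentrations;
    respond(x) returns True for a correct choice.
    """
    loglik = [0.0] * len(candidates)
    x = stimuli[len(stimuli) // 2]            # start in the middle of the range
    for trial in range(1, max_trials + 1):
        correct = respond(x)
        for i, a in enumerate(candidates):    # update every candidate threshold
            p = p_correct(x, a)
            loglik[i] += math.log(p if correct else 1.0 - p)
        best = max(loglik)
        alpha_hat = candidates[loglik.index(best)]
        inside = [a for a, ll in zip(candidates, loglik)
                  if ll >= best - chi2_crit / 2.0]
        ci_width = inside[-1] - inside[0]
        # An interval touching the edge of the grid is not yet bounded.
        bounded = inside[0] > candidates[0] and inside[-1] < candidates[-1]
        if bounded and ci_width <= ci_stop:
            return alpha_hat, ci_width, trial
        x = min(stimuli, key=lambda s: abs(s - alpha_hat))
    return alpha_hat, ci_width, max_trials
```

Keeping a running log likelihood for every candidate makes each update cheap, which is what allows the next stimulus to be chosen in real time between trials.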

Data analysis. The mean threshold (and standard error) produced by each method is presented in Table 1, labeled “Native Threshold.” As noted above, each method produces a threshold estimate that corresponds to a different performance level of detection. To obtain threshold and confidence interval estimates for the ascending method of limits and the up–down staircase method that could be directly compared with those obtained by the ML-PEST method, a logistic psychometric function was fitted to the actual sequence of stimulus presentations and responses from these two methods. The values of β and γ were held constant at the same values used in the ML-PEST method. The maximum-likelihood threshold and the confidence interval were then computed in exactly the same manner as with the ML-PEST method. This fitting procedure will give a lower threshold for the ascending method of limits data than that reported by the method of limits procedure itself, because the ascending method of limits looks for a threshold stimulus giving 100% correct, whereas the maximum-likelihood threshold corresponds to 75% correct. A repeated measures analysis of variance (ANOVA) was computed using StatView 5.0.1 (SAS Institute, 1998a, 1998b) on each of four dependent variables: native log threshold, number of trials to reach the native threshold, maximum-likelihood logistic threshold, and confidence interval of the maximum-likelihood threshold.

Figure 5. Stimulus presentation as a function of trial number in the two-alternative forced-choice Monte Carlo simulation. Trials on which the subject’s response was correct are represented by the filled circles; trials on which the subject’s response was incorrect are represented by the open circles. The resulting maximum-likelihood estimate of α is plotted by the solid line. The horizontal dotted line marks the true value of α. The vertical bars show the 95% confidence interval of each estimate of α.

Results
The native threshold values, the mean numbers of trials needed to reach the stopping criterion, the maximum-likelihood logistic thresholds, and the mean confidence intervals of the maximum-likelihood thresholds are summarized in Table 1. A repeated measures ANOVA on the native thresholds showed no significant main effect of method [F(3,31) = 1.61, p < .187]. Subsequent statistical contrast comparisons, using the standard Huynh–Feldt ε to correct the p value and the degrees of freedom, revealed that the native threshold of the ascending method of limits was just significantly higher than those of the other methods [F(1,31) = 3.92, p = .051]. These higher thresholds are to be expected, given that the ascending method of limits computes the threshold to be the concentration corresponding to a detection probability of 1.0, whereas the other methods use a performance criterion of about .75.

A repeated measures ANOVA on the number of trials needed to reach the stopping criterion showed a significant main effect of method [F(3,31) = 6.26, p < .0008]. Statistical contrast comparisons showed that the ascending method required significantly fewer trials to determine threshold [F(1,31) = 6.41, p < .014] and that the up–down method required significantly more trials [F(1,31) = 6.06, p < .017] than the average of the two measurements with the maximum-likelihood method.

A repeated measures ANOVA on the maximum-likelihood logistic thresholds showed no significant main effect of method, and the thresholds computed from the trial-by-trial data from the ascending method of limits were not significantly different from the thresholds computed from the data collected with the other methods [F(1,31) = 1.77, p = .19]. This result is to be expected, since presumably the performance of each observer is driven by the same sensory process when tested by each of the three psychometric methods.

The mean confidence intervals for the thresholds (and standard errors) are shown in the last column of Table 1. The mean confidence interval of the maximum-likelihood method is smaller than those of the other two. Note that the standard error of the confidence interval reported for the ML-PEST method is very small (0.02). This is because the confidence interval was used as the stopping criterion: Trials were stopped when the confidence interval of α reached 0.8. A repeated measures ANOVA indicated that there was a significant main effect of method [F(3,31) = 17.27, p < .0002]. Contrast tests revealed that the confidence intervals for the up–down method were significantly larger than those for both the maximum-likelihood method [F(1,31) = 50.93, p < .0001] and the ascending method [F(1,31) = 23.72, p < .0016].

Discussion
The native absolute threshold values obtained with the ascending method were higher than those obtained with the other two methods. One should remember that these values are not conventional thresholds, in the sense that they represent the concentration chosen correctly five out of five times. It is to be expected that this value will be a higher concentration than thresholds based on a lower percent correct. Furthermore, this method, at least as described by Cain et al. (1983), has considerably larger steps between adjacent concentrations, and only actual stimulus concentrations can be threshold values.

The ascending method needed the fewest trials (~13), relative to either the maximum-likelihood method (~15) or the up–down staircase method (~17). The clear superiority of the maximum-likelihood method, however, came with the high precision of the measured thresholds, as indicated by the small confidence intervals. The confidence intervals of the up–down staircase were almost three times larger than those given by the maximum-likelihood method.

MONTE CARLO SIMULATIONS

When measuring the threshold of an actual subject, it is not possible to know what the error is between the measured value and the “true” value, because the true value is not known. In order to evaluate the accuracy of the three psychophysical methods, we resorted to Monte Carlo evaluations of simulated observers having a predetermined threshold value. We used each of the three psychophysical methods to measure the threshold of each simulated observer and compared the result with the true value.

Table 1
Mean Performance (and Standard Error) of Three Psychophysical Methods With Butyl Alcohol in Experiment 1

                      Native          Number of       ML Logistic     Confidence
                      Threshold       Trials          Threshold       Interval
Method                M       SE      M       SE      M       SE      M      SE
ML-PEST
  Left                −3.41   0.16    15.13   0.65    −3.41   0.16    0.80   0.02
  Right               −3.27   0.13    15.41   0.76    −3.27   0.13    0.84   0.02
Wetherill–Levitt      −3.30   0.10    17.44   0.72    −3.30   0.11    2.32   0.34
Ascending limits      −3.10   0.13    13.03   0.68    −3.48   0.14    1.14   0.02

Monte Carlo simulations were run for each of the three psychophysical procedures, using the same rules and stimulus series as were used in the threshold measurements for butyl alcohol with the real subjects reported above in Experiment 1. A population of 10,000 simulated subjects was defined whose thresholds followed a Gaussian distribution with a mean threshold of −3.00 log percent concentration and a standard deviation of 0.4 log units. These values are representative of the distribution of real subjects’ detection thresholds for butyl alcohol, as found in Experiment 2 below. The predetermined threshold for each of the 10,000 simulated observers was drawn from the above distribution. Threshold values higher than the highest stimulus concentration and lower than the lowest stimulus concentration were not used.

Simulation 1
On each trial, the simulated subject was presented with a stimulus. Each subject’s detection behavior was driven by a logistic psychometric function with a steepness of 3.5 and a chance level performance of .5. The probability of making the correct response on a trial was determined by the probability predicted by the logistic function for the stimulus and for that subject and a uniform random number between 0 and 1. If the random number was less than the logistic function probability for that stimulus, the response was scored as correct; otherwise, it was scored as wrong. The starting stimulus for the up–down staircase and maximum-likelihood methods was chosen randomly between the two stimuli that bracketed the mean threshold of −3.0 log concentration. These stimuli were −3.113 and −2.937 log concentration. The ascending method started with the lowest stimulus in its series, −5.130 log percent concentration.
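
The simulated observer can be sketched in a few lines. The logistic form below is an assumption (chosen so that performance at threshold is .75 when chance is .5), as are the function names and the redraw rule for out-of-range thresholds; the limits passed to `draw_threshold` are left to the caller because the text does not state which series defined them.

```python
import math
import random

def p_correct(x, alpha, beta=3.5, gamma=0.5):
    """Assumed logistic psychometric function (x and alpha in log units)."""
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-beta * (x - alpha)))

def simulated_response(x, alpha, rng):
    """Correct if a uniform random number falls below the predicted probability."""
    return rng.random() < p_correct(x, alpha)

def draw_threshold(rng, lowest, highest, mean=-3.0, sd=0.4):
    """Draw a true threshold from the Gaussian population, redrawing values
    that fall outside the stimulus range (an assumed reading of 'not used')."""
    while True:
        alpha = rng.gauss(mean, sd)
        if lowest <= alpha <= highest:
            return alpha
```

Passing an explicit `random.Random` instance rather than using the module-level generator makes a simulation run reproducible from its seed.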

An ascending limits run was considered a failure if the procedure called for a stimulus stronger than the highest stimulus concentration. A Wetherill–Levitt staircase run was considered a failure if, during the trials, a stimulus stronger than the highest stimulus concentration or weaker than the lowest stimulus concentration was called for. In the present simulation, the Wetherill–Levitt staircase failed 66 times. All of these failures were caused by the procedure’s calling for a lower stimulus concentration than the lowest available. An ML-PEST run would have been considered a failure if the final estimate of the threshold were more than half the stopping confidence interval below the lowest candidate or above the highest candidate α. There were no such failures. The failed runs are excluded from the data analyzed below.

The distribution of errors for each method (estimated threshold minus true threshold) for the 10,000 subjects is shown in the left side of Figure 6. The distribution of the number of trials to reach stopping criterion for the 10,000 subjects is shown on the right side of Figure 6. The maximum-likelihood method was more accurate than the Wetherill–Levitt staircase, and both of these methods were superior to the ascending method of limits. The median error and the number of trials required for each method are shown in Table 2. The median error for the maximum-likelihood method was almost half as large as that for the up–down staircase, although both errors were quite small.

The error on each of the 10,000 subjects is plotted in Figure 7 as a function of the number of trials for that run. The range of errors became smaller as the number of trials increased for both the Wetherill–Levitt staircase method and the ML-PEST method, and the median error stayed close to zero. The ascending staircase method showed quite different behavior. The direction of the error depended on the number of trials before the stopping criterion of five correct in a row had been reached. The estimated thresholds were too low when the number of trials was less than about 15, whereas the estimates of threshold were too high when the number of trials was more than 15.

Simulation 2
Because the Wetherill–Levitt staircase method and the ML-PEST method generally required different numbers of trials before the stopping criterion was reached on each run, comparison of the accuracy is not straightforward. We repeated the Monte Carlo simulations so that the staircase method and the ML-PEST method used the same number of trials on each run. First, the Wetherill–Levitt staircase was run until the stopping criterion was reached. Then the ML-PEST method was run with the same number of trials on that observer, using the same starting stimulus.

The results are presented in Table 2. When the ML-PEST method had the same number of trials as the up–down staircase, the ML-PEST method was slightly more accurate. As in Simulation 1, both of these methods were superior in accuracy to the ascending staircase. There were 60 failures out of the 10,000 cases where the up–down staircase called for a stimulus that was lower than the lowest available concentration. Because the ML-PEST method was not permitted to run as many trials as it would normally require, this method failed on 30 out of the 10,000 cases.

Table 2
Results of the Four Monte Carlo Simulations

Method                  Median Error    Median Number of Trials
Simulation 1
  Maximum-likelihood    0.027           16
  Wetherill–Levitt      0.045           17
  Ascending limits      0.331           14
Simulation 2
  Maximum-likelihood    0.094           17
  Wetherill–Levitt      0.101           17
  Ascending limits      0.419           14
Simulation 3
  Maximum-likelihood    0.03            16
  Wetherill–Levitt      3.60            16
  Ascending limits      3.54            16
Simulation 4
  Maximum-likelihood    2.75            115
  Wetherill–Levitt      0.93            21
  Ascending limits      3.36            22

Simulation 3
An important ability of sensory testing is to detect people who deliberately try to appear to have diminished sensitivity. We therefore repeated the Monte Carlo simulations of 10,000 subjects, using the same starting conditions and the same test conditions as those in Simulation 1, but each simulated subject generated “untruthful” responses (i.e., responses just the opposite of that called for by the subject’s psychometric function). In the ML-PEST computer program, two likelihood functions are maintained, one based on the hypothesis that the subject is responding truthfully and the other based on the hypothesis that the subject is deliberately choosing the wrong response. The program explicitly tests the hypothesis that a subject is deliberately choosing the wrong response by comparing the maxima of the two likelihood distributions and therefore should be able to measure a sensory threshold as effectively as when a subject is telling the truth. The other two methods do not have an explicit way of dealing with this situation. It is of interest to learn how all three methods behave under these circumstances.

Figure 6. Distribution of errors (left) and number of trials (right) for three psychophysical methods: maximum-likelihood adaptive staircase (upper panel), Wetherill and Levitt up–down staircase (middle panel), and ascending method of limits (lower panel). The vertical line in the left panels marks the zero-error bin of each histogram. See text for details. The abscissa scale for the upper two left panels covers a range of 1.5 log units and is 5 log units for the lower left histogram.

Figure 7. Error as a function of number of trials for three psychophysical methods: ML-PEST (upper panel), up–down staircase (middle panel), and ascending method of limits (lower panel).

The results of this simulation are presented in Table 2. The ML-PEST method successfully measured the threshold and explicitly detected the lying condition in all 10,000 cases and was therefore able to measure the threshold of the underlying psychometric function with a small median error. The up–down staircase method and the ascending method of limits reached the highest concentration and tried to go higher, resulting in 9,824 failures for the staircase method and 8,648 failures for the method of limits. If we consider a failure as a successful detection of the lying behavior, the up–down staircase method detected 98% of the cases, and the ascending method of limits detected 86% of the cases. These two methods required about the same number of trials (see Table 2).
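
The comparison of the two likelihood maxima can be illustrated with the sketch below. It assumes that a lying observer's probability of a correct choice is simply one minus the truthful probability, and that the decision is made by comparing the two maxima directly; the logistic form, the candidate grid, and all names are illustrative assumptions rather than the actual ML-PEST implementation.

```python
import math

def p_correct(x, alpha, beta=3.5, gamma=0.5):
    """Assumed logistic psychometric function (x and alpha in log units)."""
    p = gamma + (1.0 - gamma) / (1.0 + math.exp(-beta * (x - alpha)))
    return min(max(p, 1e-9), 1.0 - 1e-9)

def max_log_likelihood(trials, candidates, lying):
    """Maximum log likelihood over candidate alphas under one hypothesis."""
    best_ll, best_alpha = -math.inf, None
    for alpha in candidates:
        ll = 0.0
        for x, correct in trials:
            p = p_correct(x, alpha)
            if lying:
                p = 1.0 - p        # a liar is correct only when a truthful observer would err
            ll += math.log(p if correct else 1.0 - p)
        if ll > best_ll:
            best_ll, best_alpha = ll, alpha
    return best_ll, best_alpha

def classify_responder(trials, candidates):
    """Return ('truthful' or 'lying', alpha_hat) from the larger of the two maxima."""
    ll_true, alpha_true = max_log_likelihood(trials, candidates, lying=False)
    ll_lie, alpha_lie = max_log_likelihood(trials, candidates, lying=True)
    if ll_lie > ll_true:
        return "lying", alpha_lie
    return "truthful", alpha_true
```

Because performance consistently below chance cannot be produced by a truthful 2AFC observer with any threshold, a run of wrong answers at clearly detectable concentrations drives the "lying" maximum above the "truthful" one.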

Simulation 4
It is important in clinical testing that people with elevated thresholds be easily detected. One of the subjects in Experiment 1 first had his smell tested by the up–down method and exhibited a normal threshold. Later, the same nostril was tested using the maximum-likelihood method, and no threshold could be measured, because, as it turned out, he was anosmic on that side. The fact that the up–down method did not detect the anosmia is of concern.

We therefore repeated the Monte Carlo simulations of 10,000 subjects, using the same starting conditions and the same test conditions as those in Simulation 1, with one change: Each simulated subject generated a random response, as would be the case if a person had no sensory sensitivity. The ideal behavior of a testing method would be to indicate that the threshold was at the highest value that is possible with that method or to have some other explicit way of indicating that the subject is responding randomly.

None of the three methods performed very well in detecting the random responding. The median errors and number of trials are given in Table 2. The up–down staircase method did the poorest, because it gives a median error of 0.93. This rather low error occurs because on 87% of the cases, the stopping criterion of 5 reversals was satisfied before the method reached the upper limit of the stimulus concentrations. The ML-PEST method failed on 11% of the cases, but overall, the thresholds were estimated to be very high. Unfortunately, it required a great number of trials to achieve this performance. The ascending method of limits reached the upper limit of stimulus concentration on 66% of the cases and also gave the highest values for the threshold.

Since in Monte Carlo simulations the distribution of the actual thresholds is known, we can apply an SDT analysis (Green & Swets, 1966/1974; Macmillan & Creelman, 1991) to compare the distribution of estimated thresholds made by each method under random responding with the true distribution. Using the means and standard deviations of the above distributions, we computed the signal detection measures of sensitivity, d_a, and accuracy, the area under the ROC curve, A_z (Harvey, 1992; Simpson & Fitter, 1973). The ML-PEST method had the highest sensitivity and accuracy (d_a = 2.27, A_z = .95). The ascending method was next best (d_a = 1.99, A_z = .92), and the up–down staircase method was the least sensitive (d_a = 1.50, A_z = .86).
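
The relation between the two indices can be checked with a short calculation. The sketch uses the standard unequal-variance Gaussian definitions, d_a as the mean difference divided by the root-mean-square standard deviation and A_z as the normal integral of d_a/√2; the function names are ours, and whether the authors' routines compute the indices in exactly this way is an assumption.

```python
import math

def d_a(mean_signal, sd_signal, mean_noise, sd_noise):
    """Sensitivity index for two Gaussian distributions with unequal variances."""
    rms_sd = math.sqrt((sd_signal ** 2 + sd_noise ** 2) / 2.0)
    return (mean_signal - mean_noise) / rms_sd

def a_z(da):
    """Area under the binormal ROC curve: Phi(d_a / sqrt(2))."""
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))), so Phi(d_a / sqrt(2)) = 0.5 * (1 + erf(d_a / 2)).
    return 0.5 * (1.0 + math.erf(da / 2.0))

for value in (2.27, 1.99, 1.50):
    print(value, round(a_z(value), 2))
```

Under these definitions, the three d_a values reported above reproduce the reported A_z values of .95, .92, and .86 to two decimal places.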

Discussion
The results of the simulations lead to the conclusion that both the up–down staircase method and the maximum-likelihood method have considerable strengths for use in measuring taste and smell thresholds. The ascending method of limits should not be used if one is interested in obtaining an accurate estimate of the threshold. These threshold estimates covary with the number of trials taken to stop, as is illustrated in the lower panel of Figure 7. The simulation results offer an explanation for the published finding that ascending limit measurements of olfactory thresholds are less reliable than staircase-measured thresholds (Doty, McKeown, Lee, & Shaman, 1995).

The up–down staircase method gives an average error twice as large as that of the ML-PEST method, although both values are small and therefore similar to each other. The up–down staircase method failed by calling for a stimulus too low on 66 out of 10,000 cases, whereas the ML-PEST method had no such failures. Although the median number of trials gives the edge to the ML-PEST method, on some occasions the ML-PEST method required over 50 trials to reach its stopping criterion.

A strength of the ML-PEST method is its ability to consider alternate hypotheses about the response strategy of the observer. The ML-PEST method was able to correctly detect all 10,000 dishonest responders. We are currently preparing a paper that will go into considerable detail on how to distinguish malingerers from true anosmics using a variety of methods. Another strength is that there is virtually no bias introduced into threshold measurements by having the ML-PEST steepness different from the “true” value set in the Monte Carlo observer (Treutwein & Strasburger, 1999). Data generated from a Monte Carlo Weibull function observer can be well fit by virtually any S-shaped psychometric function, although the Weibull will usually fit slightly better. We therefore recommend the ML-PEST method for testing taste and smell.

EXPERIMENT 2
Test–Retest Reliability

In order to assess the stability of the measures obtained with the adaptive maximum-likelihood procedure, we tested the subjects on four occasions.

Method
Subjects. Twenty-seven nonsmoking healthy subjects participated in the experiment (mean age = 31.6 ± 8.7 years, range = 22–57 years). The 14 women and 13 men were divided into two groups. Group 1 (“consecutive,” 7 females and 7 males) was tested on 4 consecutive days, and Group 2 (“distributed,” 7 females and 6 males) was tested on Days 1, 2, 8, and 22.

Stimuli. Smell thresholds for butyl alcohol were assessed using the same range of concentrations as were used with the maximum-likelihood method in Experiment 1. Taste thresholds were measured using 20 concentrations of NaCl in quarter-log steps. The highest concentration was 1.7 M. The same maximum-likelihood adaptive method as described in Experiment 1 was used in this experiment.

Procedure. Thresholds were determined with a temporal 2AFC procedure. Taste sensitivity was measured with a whole-mouth sip-and-spit procedure. The stimuli and the blanks were presented in 30-ml plastic medicine cups containing 5 ml of liquid. The subjects were requested to take the first sample in their mouths, swish it around, and spit it out, then rinse with distilled water and do the same with the second sample. They were asked to choose the sample that had a taste different from water. In each of the four sessions, taste thresholds were determined first and smell thresholds last. All four smell thresholds were determined in the preferred nostril. The subjects were tested at the same time of each day.

Results
Because the measurement and specification of test–retest reliability is a rather complex topic (Linn, 1989), we have chosen four different ways to demonstrate the reliability of the maximum-likelihood method. For the first method, the four threshold measurements for each subject are plotted in Figure 8. The thick line in each panel is the mean threshold. It is clear from examination of Figure 8 that most individual thresholds were stable over 4 days (left column of panels) and over 22 days (right column of panels), with a few subjects showing shifts of more than 1 log unit. The mean thresholds for the two groups are shown in Table 3. A repeated measures ANOVA confirmed the stability of the thresholds: The effect of testing day was not significant.

A second approach is to compute the correlations among the four threshold measurements (see, e.g., Mattes, 1988). Pearson correlation coefficients between thresholds obtained on different days are shown in Table 4. The correlations were generally higher for pairs of thresholds that were measured close together in time. As Mattes pointed out, correlations are highly influenced by small changes in ordinal ranking of the subjects. The fact that the pairwise correlations became smaller as time separation between them increased is shown in Figure 9. The solid line is the least squares regression line. It indicates that the correlation between two thresholds diminished as the time between them increased. The slope of the regression line becomes even steeper if the data for 1-day separation are removed from the regression, indicating that the decrease in correlation started with the 2-day separation condition.

Figure 8. Log threshold concentration as a function of test day for NaCl (upper row) and butanol (lower row), measured on 4 successive days (left column) and over 22 days (right column). The thin lines connect the four thresholds of individual subjects. The thick line is the mean threshold concentration of all subjects.

The third method to assess test–retest reliability is based on the differences between successive threshold measurements. Each threshold was subtracted from the previous threshold. The four threshold measurements resulted in three difference measurements. The distributions of these differences for NaCl and for butyl alcohol are plotted on the left side of Figure 10. If the measured thresholds do not systematically increase or decrease over time, we expect that the mean difference would be zero. The mean of the distribution for NaCl (upper panel of Figure 10) was 0.018, and for butyl alcohol (lower panel) it was 0.002. Neither of these was significantly different from zero. The standard deviations of these distributions were 0.413 log units for NaCl and 0.422 log units for butyl alcohol.

A fourth way to examine test–retest reliability is by means of the spread of the four threshold measures for each subject. A standard deviation of 0.0 would mean that all four thresholds were exactly the same value. The standard deviation of the four measurements was computed for each subject. The distributions of these standard deviations for NaCl (upper panel) and butyl alcohol (lower panel) are plotted on the right side of Figure 10. The median standard deviation was about 0.15 log units for both NaCl and butyl alcohol. This good test–retest reliability is consistent with the other three methods of assessing reliability reported above.
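The third and fourth indicators reduce to a few lines of arithmetic, sketched here in Python. The function names and the data are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not say which form was used.

```python
from statistics import mean, median, stdev


def successive_differences(thresholds):
    """Previous-minus-next differences, pooled over all subjects.

    Four thresholds per subject yield three differences per subject.
    """
    return [row[k] - row[k + 1]
            for row in thresholds
            for k in range(len(row) - 1)]


def per_subject_sd(thresholds):
    """Sample standard deviation of each subject's thresholds."""
    return [stdev(row) for row in thresholds]


# Invented log thresholds: three subjects, four sessions each.
example = [[-3.0, -3.1, -2.9, -3.2],
           [-2.5, -2.4, -2.6, -2.7],
           [-3.5, -3.4, -3.6, -3.3]]

diffs = successive_differences(example)
mean_difference = mean(diffs)        # near zero if there is no drift
sd_of_differences = stdev(diffs)
median_subject_sd = median(per_subject_sd(example))
```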

Discussion

The maximum-likelihood method gives estimates of thresholds that are stable within individuals over time and that are responsive to individual differences. By all four indicators, the test–retest reliability of both NaCl and butyl alcohol thresholds measured by the maximum-likelihood method was very good.

GENERAL DISCUSSION

The two experiments and the Monte Carlo simulations reported here demonstrate that the maximum-likelihood method is a viable alternative to other methods currently in use to measure taste and smell detection thresholds. Because the method attempts to use stimuli that are optimal (close to the predicted threshold), the confidence intervals of the measured thresholds are considerably smaller than those achieved by the other two methods (Experiment 1).

Table 3
Mean NaCl and Butyl Alcohol Thresholds (and Standard Errors) From Experiment 2

                            Day 1           Day 2           Day 3           Day 4
                            M       SE      M       SE      M       SE      M       SE
NaCl
  Consecutive (Group 1)    −2.62   0.13    −2.67   0.11    −2.82   0.13    −2.63   0.11
  Distributed (Group 2)    −2.90   0.19    −2.83   0.09    −2.83   0.11    −2.77   0.10
Butyl Alcohol
  Consecutive (Group 1)    −3.19   0.14    −3.17   0.15    −3.25   0.12    −3.10   0.13
  Distributed (Group 2)    −3.00   0.16    −3.10   0.14    −3.09   0.10    −3.09   0.11

Table 4
Pearson Correlation Coefficients Between Thresholds

                            Paired Sessions
                            1–2     1–3     1–4     2–3     2–4     3–4
NaCl
  Consecutive (Group 1)     .46     .66     .66     .44     .75     .66
  Distributed (Group 2)     .58     .72     .40     .55     .20     .51
Butyl alcohol
  Consecutive (Group 1)     .86     .83     .86     .67     .71     .76
  Distributed (Group 2)     .26     .68     .56     .51     .21     .65

p < .05   p < .01   p < .001

One of the strengths of the maximum-likelihood method is that it provides an explicit way of estimating the confidence interval of the threshold, a facility not provided by the other two methods. The experimenter is free to establish a stopping confidence interval to suit his/her needs. If greater precision is desired, it can be achieved at the expense of running more trials. This explicit tradeoff between precision and number of trials is not possible with the other two psychophysical methods. One could, of course, use a hybrid combination of methods: an up–down staircase method for generating the next stimulus to use, combined with a maximum-likelihood method for estimating the threshold and its confidence interval. In effect, we used this approach after the data from Experiment 1 were collected, by reanalyzing them with the maximum-likelihood technique in order to assess the confidence interval of the resulting threshold.

The Monte Carlo simulations allow the analysis of the ideal behavior of the three psychophysical methods. One major conclusion is that the ascending staircase method gives threshold measurements that are severely biased. This bias is a function of the number of trials carried out before the stopping criterion of five correct in a row with one stimulus is reached. When the number of trials was small (less than approximately 15), the threshold estimate was too low; when the number of trials was greater than 15, the threshold estimate was too high (see Figure 7). This characteristic is due to the fact that the stimulus concentration used can only increase as trials progress. The maximum-likelihood method has the least amount of bias, although it is always slightly negative (i.e., the threshold estimate is, on the average, slightly lower, by 0.02 log units, than the true value).

In clinical testing, especially for cases that involve the legal system, it is important to be able to detect malingerers. One of the strengths of the maximum-likelihood method is that it explicitly tests hypotheses. Normally, the hypothesis is that the subject is responding truthfully and that the threshold has a certain value. In the present implementation of the maximum-likelihood method, the computer routines maintain three hypotheses about the response strategy of the subject: that the subject is responding truthfully, that the subject is lying on each trial (i.e., always choosing what he/she believes is the wrong alternative in the 2AFC procedure), and that the subject is responding randomly (i.e., not making use of any sensory information). The ability to detect a malingerer (one who has sensory sensitivity but tries to respond randomly) is based on the fact that it is difficult for humans to generate truly random sequences (Bar-Hillel & Wagenaar, 1993). A subject who can taste or smell the stimulus, and therefore knows in which interval it occurs, tends to choose the wrong interval more often than would be expected by chance alone. In this case, the hypothesis that the subject is lying will have a higher likelihood than the hypothesis that the subject is telling the truth. We are currently working to refine this ability of the maximum-likelihood method, because it holds great promise. The other two psychophysical procedures do not easily distinguish between a malingerer and a person with no sensory sensitivity and have no explicit way to detect malingerers.

Figure 9. Test–retest reliability as indicated by correlation coefficients between all pairs of measurements, plotted as a function of the number of days separating them. The solid line is the least squares linear regression through the data.
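The comparison of the three response-strategy hypotheses can be sketched as follows. This is an illustration under stated assumptions, not the SensoryTester implementation: the logistic form of `p_truthful`, the parameter defaults (β = 3.5, γ = 0.5), the candidate grid, and all function names are hypothetical. A lying subject is modeled as being correct with probability 1 − p, and a random responder as being correct with probability .5 on every trial.

```python
import math


def p_truthful(x, alpha, beta=3.5, gamma=0.5):
    """Assumed logistic psychometric function: P(correct) at log concentration x."""
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-beta * (x - alpha)))


def log_likelihood(trials, alpha, lying):
    """Log likelihood of (stimulus, correct) trials for a candidate threshold."""
    total = 0.0
    for x, correct in trials:
        p = p_truthful(x, alpha)
        if lying:
            p = 1.0 - p  # a liar picks the interval believed to be wrong
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p if correct else 1.0 - p)
    return total


def most_likely_strategy(trials, candidates):
    """Compare the maxima of the likelihoods under the three hypotheses."""
    scores = {
        "truthful": max(log_likelihood(trials, a, False) for a in candidates),
        "lying": max(log_likelihood(trials, a, True) for a in candidates),
        "random": len(trials) * math.log(0.5),
    }
    return max(scores, key=scores.get)


# Hypothetical grid of candidate thresholds from -5.0 to -1.0 log units.
grid = [-5.0 + 0.1 * i for i in range(41)]
always_wrong = [(-2.0, False)] * 10  # a clearly detectable stimulus, never chosen
verdict = most_likely_strategy(always_wrong, grid)
```

In this toy case a subject who is wrong on every trial with a strong stimulus is better explained by the lying hypothesis than by either of the other two.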

Finally, in Experiment 2 we demonstrated that the maximum-likelihood method gives estimates of thresholds that are stable over time. This stability is actually based on two aspects of our testing procedure. First, the maximum-likelihood method gives the smallest confidence intervals of the three methods tested and thus contributes the least amount of random or systematic error variance to the measured thresholds. The second source of reliability is the experimental paradigm. The 2AFC procedure minimizes the effects of response bias (e.g., Green & Swets, 1966/1974) and thus minimizes error variance in the threshold measurements that would be introduced by variance in response bias. Actually, it would be desirable to use more than just two forced-choice alternatives, because the ML-PEST method converges more rapidly on the threshold value with more alternatives (Manny & Klein, 1985; Schlauch & Rose, 1990). The time required to present chemosensory stimuli, however, usually precludes using more than two alternatives.

Test–retest reliability is not achieved at the expense of precision. Our measured thresholds show an approximately normal distribution with a standard deviation of about 0.4 log units. The maximum-likelihood method is able to estimate thresholds that lie on a continuum, even though a small number of discrete stimulus concentrations are available for testing. Although a computer is required to run ML-PEST software, the widespread availability of inexpensive desktop or laptop computers makes this requirement much less of a barrier than in the past.

Figure 10. Distribution of differences between successive thresholds (left) and of the standard deviation of the four threshold measurements for the 27 subjects (right), for NaCl (upper panel) and butyl alcohol (lower panel).

REFERENCES

Anliker, J. A., Bartoshuk, L., Ferris, A. N., & Hooks, L. D. (1991). Children's food preferences and genetic sensitivity to the bitter tastes of 6-n-propylthiouracil (PROP). American Journal of Clinical Nutrition, 54, 316-320.

Apter, A. J., Gent, J. F., & Frank, M. E. (1999). Fluctuating olfactory sensitivity and distorted odor perception in allergic rhinitis. Archives of Otolaryngology–Head & Neck Surgery, 125, 1005-1010.

Bar-Hillel, M., & Wagenaar, W. A. (1993). The perception of randomness. In C. L. Gideon Keren (Ed.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 369-393). Hillsdale, NJ: Erlbaum.

Berkson, J. (1951). Why I prefer logits to probits. Biometrics, 7, 327-339.

Berkson, J. (1953). A statistically precise and relatively simple method of estimating the bio-assay with quantal response, based on the logistic function. Journal of the American Statistical Association, 48, 565-600.

Berkson, J. (1955). Maximum-likelihood and minimum chi-square estimates of the logistic function. Journal of the American Statistical Association, 50, 130-162.

Cain, W. S., Gent, J. F., Catalanotto, F. A., & Goodspeed, R. B. (1983). Clinical evaluation of olfaction. American Journal of Otolaryngology, 4, 252-256.

Cometto-Muñiz, J. E., & Cain, W. S. (1995). Relative sensitivity of the ocular trigeminal, nasal trigeminal, and olfactory systems to airborne chemicals. Chemical Senses, 20, 191-198.

Cornsweet, T. N. (1962). The staircase method in psychophysics. American Journal of Psychology, 75, 485-491.

Cowart, B. J. (1989). Relationships between taste and smell across the adult life span. In C. Murphy, W. S. Cain, & D. M. Hegsted (Eds.), Nutrition and the chemical senses in aging: Recent advances and current research needs (Annals of the New York Academy of Sciences, Vol. 561, pp. 39-55). New York: New York Academy of Sciences.

Cowart, B. J., Yokomukai, Y., & Beauchamp, G. K. (1994). Bitter taste in aging: Compound-specific decline in sensitivity. Physiology & Behavior, 56, 1237-1241.

Deems, D. A., Doty, R. L., Settle, R. G., Moore-Gillon, V., Shaman, P., Mester, A. F., Kimmelman, C. P., Brightman, V. J., & Snow, J. B., Jr. (1991). Smell and taste disorders: A study of 750 patients from the University of Pennsylvania Smell and Taste Center. Archives of Otolaryngology–Head & Neck Surgery, 117, 519-528.

Doty, R. L., McKeown, D. A., Lee, W. W., & Shaman, P. (1995). A study of the test–retest reliability of ten olfactory tests. Chemical Senses, 20, 645-656.

Doty, R. L., Snyder, P. J., Huggins, G. R., & Lowry, L. D. (1981). Endocrine, cardiovascular, and psychological correlates of olfactory sensitivity changes during the human menstrual cycle. Journal of Comparative & Physiological Psychology, 95, 45-60.

Drewnowski, A., Henderson, S. A., & Shore, A. B. (1997). Genetic sensitivity to 6-n-propylthiouracil (PROP) and hedonic responses to bitter and sweet tastes. Chemical Senses, 22, 27-37.

Duffy, V. B., Cain, W. S., & Ferris, A. M. (1999). Measurement of sensitivity to olfactory flavor: Application in a study of aging and dentures. Chemical Senses, 24, 671-677.

Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf und Härtel.

Gagnon, P., Mergler, D., & Lapare, S. (1994). Olfactory adaptation, threshold shift and recovery at low levels of exposure to methyl isobutyl ketone (MIBK). Neurotoxicology, 15, 637-642.

Garcia-Perez, M. A. (1998). Forced-choice staircases with fixed step sizes: Asymptotic and small-sample properties. Vision Research, 38, 1861-1881.

Green, D. M., & Swets, J. A. (1974). Signal detection theory and psychophysics. Huntington, NY: Robert E. Krieger. (Original work published 1966)

Grushka, M., Sessle, B. J., & Howley, T. P. (1986). Psychophysical evidence of taste dysfunction in burning mouth syndrome. Chemical Senses, 11, 485-498.

Guilford, J. P. (1936). Psychometric methods. New York: McGraw-Hill.

Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.

Hall, J. L. (1968). Maximum-likelihood sequential procedures for estimation of psychometric functions. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1981). Hybrid adaptive procedure for estimation of psychometric functions. Journal of the Acoustical Society of America, 69, 1763-1769.

Harvey, L. O., Jr. (1986). Efficient estimation of sensory thresholds. Behavior Research Methods, Instruments, & Computers, 18, 623-632.

Harvey, L. O., Jr. (1992). The critical operating characteristic and the evaluation of expert judgment. Organizational Behavior & Human Decision Processes, 53, 229-251.

Harvey, L. O., Jr. (1997). Efficient estimation of sensory thresholds with ML-PEST. Spatial Vision, 11, 121-128.

Hays, W. L. (1963). Statistics for psychologists (1st ed.). New York: Holt, Rinehart & Winston.

Krantz, D. H. (1969). Threshold theories of signal detection. Psychological Review, 76, 308-324.

Lehrner, J. P., Brücke, T., Dal-Bianco, P., Gatterer, G., & Kryspin-Exner, I. (1997). Olfactory functions in Parkinson's disease and Alzheimer's disease. Chemical Senses, 22, 105-110.

Lehrner, J. P., Kryspin-Exner, I., & Vetter, N. (1995). Higher olfactory threshold and decreased odor identification ability in HIV-infected persons. Chemical Senses, 20, 325-328.

Linn, R. L. (Ed.) (1989). Educational measurement (3rd ed.). New York: Macmillan.

Linschoten, M. R. I., & Kroeze, J. H. A. (1991). Spatial summation in taste: NaCl thresholds and stimulated area on the anterior human tongue. Chemical Senses, 16, 219-224.

Linschoten, M. R. I., & Kroeze, J. H. A. (1992). Bilateral taste stimulation: Spatial summation with weak NaCl stimuli. Chemical Senses, 17, 53-59.

Linschoten, M. R. I., & Kroeze, J. H. A. (1994). Ipsi- and bilateral interactions in taste. Perception & Psychophysics, 55, 387-393.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure vernier acuity of older infants. Vision Research, 25, 1245-1252.

Mattes, R. D. (1988). Reliability of psychophysical measures of gustatory function. Perception & Psychophysics, 43, 107-114.

Moll, B., Klimek, L., Eggers, G., & Mann, W. (1998). Comparison of olfactory function in patients with seasonal and perennial allergic rhinitis. Allergy, 53, 297-301.

Murphy, C., & Cain, W. S. (1986). Odor identification: The blind are better. Physiology & Behavior, 37, 177-180.

Murphy, C., Nordin, S., de Wijk, R. A., Cain, W. S., & Polich, J. (1994). Olfactory-evoked potentials: Assessment of young and elderly, and comparison to psychophysical threshold. Chemical Senses, 19, 47-56.

Nordin, S., Murphy, C., Davidson, T. M., Quinonez, C., Jalowayski, A. A., & Ellison, D. W. (1996). Prevalence and assessment of qualitative olfactory dysfunction in different age groups. Laryngoscope, 106, 739-744.

O'Mahony, M., Gardner, L., Long, D., Heintz, C., Thompson, B., & Davies, M. (1979). Salt taste detection: An R-index approach to signal-detection measurements. Perception, 8, 497-506.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Pierce, J. D., Jr., Doty, R. L., & Amoore, J. E. (1996). Analysis of position of trial sequence and type of diluent on the detection threshold for phenyl ethyl alcohol using a single staircase method. Perceptual & Motor Skills, 82, 451-458.

Rosenblatt, M. R., Olmstead, R. E., Iwamoto-Schaapp, P. N., & Jarvik, M. E. (1998). Olfactory thresholds for nicotine and menthol in smokers (abstinent and nonabstinent) and nonsmokers. Physiology & Behavior, 65, 575-579.


SAS Institute (1998a). StatView: StatView reference. Cary, NC: SAS Institute.

SAS Institute (1998b). StatView: Using StatView. Cary, NC: SAS Institute.

Schlauch, R. S., & Rose, R. M. (1990). Two-, three-, and four-interval forced-choice staircase procedures: Estimator bias and efficiency. Journal of the Acoustical Society of America, 88, 732-740.

Simpson, A. J., & Fitter, M. J. (1973). What is the best index of detectability? Psychological Bulletin, 80, 481-488.

Stevens, J. C. (1995). Detection of heteroquality taste mixtures. Perception & Psychophysics, 57, 18-26.

Swets, J. A. (1961). Is there a sensory threshold? Science, 134, 168-177.

Swets, J. A. (1986a). Form of empirical ROC's in discrimination and diagnostic tasks: Implications for theory and measurement of performance. Psychological Bulletin, 99, 181-198.

Swets, J. A. (1986b). Indices of discrimination or diagnostic accuracy: Their ROC's and implied models. Psychological Bulletin, 99, 100-117.

Swets, J. A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Mahwah, NJ: Erlbaum.

Swets, J. A., Tanner, W. P., Jr., & Birdsall, T. G. (1961). Decision processes in perception. Psychological Review, 68, 301-340.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Urban, F. M. (1908). The application of statistical methods to problems of psychophysics. Philadelphia: Psychological Clinic Press.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Weiffenbach, J. M., Baum, B. J., & Burghauser, R. (1982). Taste thresholds: Quality specific variation with human aging. Journal of Gerontology, 37, 372-377.

Weiffenbach, J. M., Schwartz, L. K., Atkinson, J. C., & Fox, P. C. (1995). Taste performance in Sjögren's syndrome. Physiology & Behavior, 57, 89-96.

Wetherill, G. B., & Levitt, H. (1965). Sequential estimation of points on a psychometric function. British Journal of Mathematical & Statistical Psychology, 18, 1-10.

(Manuscript received November 11, 1999; revision accepted for publication March 4, 2001.)


subjects were asked to select the bottle containing the stimulus. The subjects were asked to give their best guess, even when they were certain that the two stimuli were identical. The stimulus and the blank were presented in 250-ml squeezable polyethylene bottles containing 60 ml of liquid. The subjects were instructed to keep one nostril closed while the other nostril was tested with three puffs out of each bottle. For each subject, four thresholds were obtained, two for each nostril. In randomized order and alternating nostrils, one threshold was determined with the ascending method, one with the up–down method, and two with the adaptive maximum-likelihood method. Testing took about 45 min.

The definition of detection threshold differs in several aspects among the three methods. In the ascending method, testing started with the lowest concentration. A correct response led to the presentation of the same concentration on the next trial. When an incorrect response was given, the next higher concentration was used. Testing stopped when five correct answers had been given in a row. The threshold was defined as the last-used concentration (Cain et al., 1983) and thus corresponded to a detection probability of 1.0.
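The ascending rule just described can be sketched in Python as follows. This is an illustration only, not the software used in the experiment: `respond` is a hypothetical callback that returns True when the subject picks the correct bottle, and the concentration series in the usage lines is invented.

```python
from typing import Callable, Optional, Sequence


def ascending_threshold(levels: Sequence[float],
                        respond: Callable[[float], bool],
                        run_length: int = 5) -> Optional[float]:
    """Return the first level answered correctly run_length times in a row.

    levels must be ordered from weakest to strongest. None is returned when
    the rule calls for a stimulus stronger than the strongest level.
    """
    index = 0
    streak = 0
    while True:
        if respond(levels[index]):
            streak += 1  # correct: present the same concentration again
            if streak == run_length:
                return levels[index]
        else:
            streak = 0   # incorrect: move to the next higher concentration
            index += 1
            if index == len(levels):
                return None


# Idealized observer who is correct only at or above -3.0 log units.
series = [-5.0, -4.0, -3.0, -2.0]
estimate = ascending_threshold(series, lambda c: c >= -3.0)
```

Because the concentration can only stay the same or increase, lucky guesses early in a run push the estimate down and long runs push it up.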

In the up–down staircase method, testing could start at any step in the concentration series. The first subject was started in the middle of the range, and each subsequent subject started where the former subject had ended. After one correct response, the same concentration was presented; if two correct responses were given, the next lower concentration was presented on the next trial. One incorrect response led to the presentation of a concentration one step higher. This decision rule is called the one-up–two-down rule. Although this decision rule is widely used, the actual threshold estimate depends more on the size of the stimulus step than on the particular up–down rule (Garcia-Perez, 1998). Testing stopped after five reversals, and the threshold was computed as the average of the stimulus concentrations at the last four reversals. This threshold corresponded to a detection probability of approximately .75.
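A minimal Python sketch of the one-up–two-down rule described above is given here. It rests on several assumptions that the text does not settle: a reversal is recorded at the concentration where the direction of stepping changes, a run that steps outside the series is treated as a failure, and `respond` is a hypothetical callback returning True for a correct response.

```python
from statistics import mean


def up_down_threshold(levels, respond, start_index, n_reversals=5):
    """One-up-two-down staircase; mean of the last four reversal levels.

    Returns None if the rule calls for a level outside the series.
    """
    index = start_index
    correct_in_row = 0
    last_direction = 0  # +1 = last step was up, -1 = down, 0 = no step yet
    reversals = []
    while len(reversals) < n_reversals:
        if respond(levels[index]):
            correct_in_row += 1
            if correct_in_row < 2:
                continue        # first correct response: repeat the level
            direction = -1      # two correct in a row: step down
        else:
            direction = +1      # one incorrect response: step up
        correct_in_row = 0
        if last_direction and direction != last_direction:
            reversals.append(levels[index])
        last_direction = direction
        index += direction
        if not 0 <= index < len(levels):
            return None
    return mean(reversals[-4:])


# Idealized observer who is correct only at or above -3.0 log units.
series = [-5.0, -4.0, -3.0, -2.0, -1.0]
estimate = up_down_threshold(series, lambda c: c >= -3.0, start_index=2)
```

With this idealized observer the staircase oscillates between the two levels that bracket the threshold, so the estimate falls midway between them.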

In the maximum-likelihood adaptive staircase method, testing started in the middle of the stimulus range. After each response, the estimate of the threshold was calculated, and the next stimulus was a concentration close to that threshold. Testing stopped when the confidence interval of the estimated threshold reached a predefined criterion (0.8 log concentration units). This threshold corresponded to a detection probability of exactly .75. All these computations were carried out in real time on a Power Macintosh computer, using a program, SensoryTester, written by author L.O.H.
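The core of the adaptive loop (re-estimate the threshold after each response, then test near the estimate) can be sketched as below. This is not SensoryTester: the logistic form of `p_correct`, the parameter defaults (β = 3.5, γ = 0.5), the grid of candidate thresholds, and the function names are assumptions for illustration, and the confidence-interval stopping rule is omitted. Under the assumed function, the threshold α is the concentration giving a probability correct of .75.

```python
import math


def p_correct(x, alpha, beta=3.5, gamma=0.5):
    """Assumed logistic psychometric function on a log-concentration axis."""
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-beta * (x - alpha)))


def log_likelihood(alpha, trials, beta=3.5, gamma=0.5):
    """Log likelihood of (stimulus, correct) trials for a candidate threshold."""
    total = 0.0
    for x, correct in trials:
        p = p_correct(x, alpha, beta, gamma)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p if correct else 1.0 - p)
    return total


def ml_threshold(trials, candidates):
    """Candidate threshold with the highest likelihood."""
    return max(candidates, key=lambda a: log_likelihood(a, trials))


def next_stimulus(estimate, levels):
    """Available concentration closest to the current estimate."""
    return min(levels, key=lambda x: abs(x - estimate))


# Invented data: always correct at -2.0, at chance at -4.0.
trials = [(-2.0, True)] * 10 + [(-4.0, True)] * 5 + [(-4.0, False)] * 5
grid = [-5.0 + 0.1 * i for i in range(41)]
estimate = ml_threshold(trials, grid)
following = next_stimulus(estimate, [-4.0, -3.0, -2.0])
```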

Data analysis. The mean threshold (and standard error) produced by each method is presented in Table 1, labeled "Native Threshold." As noted above, each method produces a threshold estimate that corresponds to a different performance level of detection. To obtain threshold and confidence interval estimates for the ascending method of limits and the up–down staircase method that could be directly compared with those obtained by the ML-PEST method, a logistic psychometric function was fitted to the actual sequence of stimulus presentations and responses from these two methods. The values of β and γ were held constant at the same values used in the ML-PEST method. The maximum-likelihood threshold and the confidence interval were then computed in exactly the same manner as with the ML-PEST method. This fitting procedure will give a lower threshold for the ascending method of limits data than that reported by the method of limits procedure itself, because the ascending method of limits looks for a threshold stimulus giving 100% correct, whereas the maximum-likelihood threshold corresponds to 75% correct. A repeated measures analysis of variance (ANOVA) was computed, using StatView 5.0.1 (SAS Institute, 1998a, 1998b), on each of four dependent variables: native log threshold, number of trials to reach the native threshold, maximum-likelihood logistic threshold, and confidence interval of the maximum-likelihood threshold.

Figure 5. Stimulus presentation as a function of trial number in the two-alternative forced-choice Monte Carlo simulation. Trials on which the subject's response was correct are represented by the filled circles; trials on which the subject's response was incorrect are represented by the open circles. The resulting maximum-likelihood estimate of α is plotted by the solid line. The horizontal dotted line marks the true value of α. The vertical bars show the 95% confidence interval of each estimate of α.

Results

The native threshold values, the mean numbers of trials needed to reach the stopping criterion, the maximum-likelihood logistic thresholds, and the mean confidence intervals of the maximum-likelihood thresholds are summarized in Table 1. A repeated measures ANOVA on the native thresholds showed no significant main effect of method [F(3,31) = 1.61, p < .187]. Subsequent statistical contrast comparisons, using the standard Huynh–Feldt ε to correct the p value and the degrees of freedom, revealed that the native threshold of the ascending method of limits was just significantly higher than those of the other methods [F(1,31) = 3.92, p = .051]. These higher thresholds are to be expected, given that the ascending method of limits computes the threshold to be the concentration corresponding to a detection probability of 1.0, whereas the other methods use a performance criterion of about .75.

A repeated measures ANOVA on the number of trials needed to reach the stopping criterion showed a significant main effect of method [F(3,31) = 6.26, p < .0008]. Statistical contrast comparisons showed that the ascending method required significantly fewer trials to determine threshold [F(1,31) = 6.41, p < .014] and that the up–down method required significantly more trials [F(1,31) = 6.06, p < .017] than the average of the two measurements with the maximum-likelihood method.

A repeated measures ANOVA on the maximum-likelihood logistic thresholds showed no significant main effect of method, and the thresholds computed from the trial-by-trial data from the ascending method of limits were not significantly different from the thresholds computed from the data collected with the other methods [F(1,31) = 1.77, p = .19]. This result is to be expected, since presumably the performance of each observer is driven by the same sensory process when tested by each of the three psychometric methods.

The mean confidence intervals for the thresholds (and standard errors) are shown in the last column of Table 1. The mean confidence interval of the maximum-likelihood method is smaller than those of the other two. Note that the standard error of the confidence interval reported for the ML-PEST method is very small (0.02). This is because the confidence interval was used as the stopping criterion: Trials were stopped when the confidence interval of α reached 0.8. A repeated measures ANOVA indicated that there was a significant main effect of method [F(3,31) = 17.27, p < .0002]. Contrast tests revealed that the confidence intervals for the up–down method were significantly larger than those for both the maximum-likelihood method [F(1,31) = 50.93, p < .0001] and the ascending method [F(1,31) = 23.72, p < .0016].

Discussion

The native absolute threshold values obtained with the ascending method were higher than those obtained with the other two methods. One should remember that these values are not conventional thresholds, in the sense that they represent the concentration chosen correctly five out of five times. It is to be expected that this value will be a higher concentration than thresholds based on a lower percent correct. Furthermore, this method, at least as described by Cain et al. (1983), has considerably larger steps between adjacent concentrations, and only actual stimulus concentrations can be threshold values.

The ascending method needed the fewest trials (~13), relative to either the maximum-likelihood method (~15) or the up–down staircase method (~17). The clear superiority of the maximum-likelihood method, however, came with the high precision of the measured thresholds, as indicated by the small confidence intervals. The confidence intervals of the up–down staircase were almost three times larger than those given by the maximum-likelihood method.

MONTE CARLO SIMULATIONS

When measuring the threshold of an actual subject, it is not possible to know what the error is between the measured value and the "true" value, because the true value is not known. In order to evaluate the accuracy of the three psychophysical methods, we resorted to Monte Carlo evaluations of simulated observers having a predetermined threshold value. We used each of the three psychophysical methods to measure the threshold of each simulated observer and compared the result with the true value.

Table 1
Mean Performance (and Standard Error) of Three Psychophysical Methods With Butyl Alcohol in Experiment 1

                      Native            Number of         ML Logistic       Confidence
                      Threshold         Trials            Threshold         Interval
Method                M       SE        M       SE        M       SE        M       SE
ML-PEST
  Left               −3.41   0.16      15.13   0.65      −3.41   0.16      0.80    0.02
  Right              −3.27   0.13      15.41   0.76      −3.27   0.13      0.84    0.02
Wetherill–Levitt     −3.30   0.10      17.44   0.72      −3.30   0.11      2.32    0.34
Ascending limits     −3.10   0.13      13.03   0.68      −3.48   0.14      1.14    0.02

Monte Carlo simulations were run for each of the three psychophysical procedures, using the same rules and stimulus series as were used in the threshold measurements for butyl alcohol with the real subjects reported above in Experiment 1. A population of 10,000 simulated subjects was defined whose thresholds followed a Gaussian distribution with a mean threshold of −3.00 log percent concentration and a standard deviation of 0.4 log units. These values are representative of the distribution of real subjects' detection thresholds for butyl alcohol, as found in Experiment 2 below. The predetermined threshold for each of the 10,000 simulated observers was drawn from the above distribution. Threshold values higher than the highest stimulus concentration and lower than the lowest stimulus concentration were not used.
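The simulated population described above could be generated along the following lines. This is a sketch under assumptions: out-of-range draws are replaced by new draws so that the population keeps its full size (the text says only that such values "were not used"), and the upper bound of −1.0 in the usage lines is a hypothetical stand-in, because the highest concentration of the series is not stated in this section.

```python
import random


def draw_thresholds(n, lowest, highest, rng, mean=-3.0, sd=0.4):
    """Draw n Gaussian thresholds, redrawing any value outside the series."""
    thresholds = []
    while len(thresholds) < n:
        value = rng.gauss(mean, sd)
        if lowest <= value <= highest:
            thresholds.append(value)
    return thresholds


rng = random.Random(1)  # fixed seed so that the example is repeatable
# -5.13 is the lowest stimulus named in the text; -1.0 is a hypothetical upper bound.
population = draw_thresholds(1000, lowest=-5.13, highest=-1.0, rng=rng)
```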

Simulation 1

On each trial, the simulated subject was presented with a stimulus. Each subject's detection behavior was driven by a logistic psychometric function with a steepness of 3.5 and a chance level performance of 0.5. The response on each trial was determined by comparing the probability predicted by the logistic function for that stimulus and that subject with a uniform random number between 0 and 1. If the random number was less than the logistic function probability for that stimulus, the response was scored as correct; otherwise, it was scored as wrong. The starting stimulus for the up–down staircase and maximum-likelihood methods was chosen randomly between the two stimuli that bracketed the mean threshold of −3.0 log concentration. These stimuli were −3.113 and −2.937 log concentration. The ascending method started with the lowest stimulus in its series, −5.130 log percent concentration.
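The response rule of the simulated observer can be written in a few lines. The sketch below assumes a logistic function of log concentration with threshold α at the midpoint between chance and perfect performance; the exact functional form used by the authors is defined elsewhere in the paper, so this form, like the function names, is an assumption.

```python
import math
import random


def logistic_p(x, alpha, beta=3.5, gamma=0.5):
    """Assumed logistic psychometric function: P(correct) for stimulus x."""
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-beta * (x - alpha)))


def simulated_response(x, alpha, rng):
    """Score a trial as correct when a uniform draw falls below P(correct)."""
    return rng.random() < logistic_p(x, alpha)


rng = random.Random(2001)  # fixed seed so that the example is repeatable
n_trials = 20000
# Proportion correct when the stimulus sits exactly at the true threshold.
at_threshold = sum(simulated_response(-3.0, -3.0, rng)
                   for _ in range(n_trials)) / n_trials
# Proportion correct for a stimulus 2 log units below threshold.
far_below = sum(simulated_response(-5.0, -3.0, rng)
                for _ in range(n_trials)) / n_trials
```

Under these assumptions the proportion correct should settle near .75 at threshold and near the chance level of .5 for stimuli far below it.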

An ascending limits run was considered a failure if the procedure called for a stimulus stronger than the highest stimulus concentration. A Wetherill–Levitt staircase run was considered a failure if, during the trials, a stimulus stronger than the highest stimulus concentration or weaker than the lowest stimulus concentration was called for. In the present simulation, the Wetherill–Levitt staircase failed 66 times. All of these failures were caused by the procedure's calling for a lower stimulus concentration than the lowest available. An ML-PEST run would have been considered a failure if the final estimate of the threshold were more than half the stopping confidence interval below the lowest candidate or above the highest candidate α. There were no such failures. The failed runs are excluded from the data analyzed below.

The distribution of errors for each method (estimated threshold minus true threshold) for the 10,000 subjects is shown on the left side of Figure 6. The distribution of the number of trials to reach stopping criterion for the 10,000 subjects is shown on the right side of Figure 6. The maximum-likelihood method was more accurate than the Wetherill–Levitt staircase, and both of these methods were superior to the ascending method of limits. The median error and the number of trials required for each method are shown in Table 2. The median error for the maximum-likelihood method was almost half as large as that for the up–down staircase, although both errors were quite small.

The error on each of the 10,000 subjects is plotted in Figure 7 as a function of the number of trials for that run. The range of errors became smaller as the number of trials increased for both the Wetherill–Levitt staircase method and the ML-PEST method, and the median error stayed close to zero. The ascending staircase method showed quite different behavior. The direction of the error depended on the number of trials before the stopping criterion of five correct in a row had been reached. The estimated thresholds were too low when the number of trials was less than about 15, whereas the estimates of threshold were too high when the number of trials was more than 15.

Simulation 2

Because the Wetherill–Levitt staircase method and the ML-PEST method generally required different numbers of trials before the stopping criterion was reached on each run, comparison of the accuracy is not straightforward. We repeated the Monte Carlo simulations so that the staircase method and the ML-PEST method used the same number of trials on each run. First, the Wetherill–Levitt staircase was run until the stopping criterion was reached. Then the ML-PEST method was run with the same number of trials on that observer, using the same starting stimulus.

The results are presented in Table 2. When the ML-PEST method had the same number of trials as the up–down staircase, the ML-PEST method was slightly more accurate. As in Simulation 1, both of these methods were superior in accuracy to the ascending staircase. There were 60 failures out of the 10,000 cases, where the up–down staircase called for a stimulus that was lower than the lowest available concentration. Because the ML-PEST method was not permitted to run as many trials as it would normally require, this method failed on 30 out of the 10,000 cases.

Table 2
Results of the Four Monte Carlo Simulations

Method                    Median Error    Median Number of Trials

Simulation 1
  Maximum-likelihood         0.027              16
  Wetherill–Levitt           0.045              17
  Ascending limits           0.331              14

Simulation 2
  Maximum-likelihood         0.094              17
  Wetherill–Levitt           0.101              17
  Ascending limits           0.419              14

Simulation 3
  Maximum-likelihood         0.03               16
  Wetherill–Levitt           3.60               16
  Ascending limits           3.54               16

Simulation 4
  Maximum-likelihood         2.75              115
  Wetherill–Levitt           0.93               21
  Ascending limits           3.36               22


Figure 6. Distribution of errors (left) and number of trials (right) for three psychophysical methods: maximum-likelihood adaptive staircase (upper panel), Wetherill and Levitt up–down staircase (middle panel), and ascending method of limits (lower panel). The vertical line in the left panels marks the zero-error bin of each histogram. See text for details. The abscissa scale for the upper two left panels covers a range of 1.5 log units and is 5 log units for the lower left histogram.

Figure 7. Error as a function of number of trials for three psychophysical methods: ML-PEST (upper panel), up–down staircase (middle panel), and ascending method of limits (lower panel).

Simulation 3

An important ability of sensory testing is to detect people who deliberately try to appear to have diminished sensitivity. We therefore repeated the Monte Carlo simulations of 10,000 subjects, using the same starting conditions and the same test conditions as those in Simulation 1, but each simulated subject generated "untruthful" responses (i.e., responses just the opposite of that called for by the subject's psychometric function). In the ML-PEST computer program, two likelihood functions are maintained: one based on the hypothesis that the subject is responding truthfully and the other based on the hypothesis that the subject is deliberately choosing the wrong response. The program explicitly tests the hypothesis that a subject is deliberately choosing the wrong response by comparing the maxima of the two likelihood distributions and therefore should be able to measure a sensory threshold as effectively as when a subject is telling the truth. The other two methods do not have an explicit way of dealing with this situation. It is of interest to learn how all three methods behave under these circumstances.
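The comparison of two likelihood functions can be illustrated with a short sketch. It is not the authors' ML-PEST program: the Weibull form of the psychometric function, the fixed slope `beta`, the chance rate of 0.5, the probability clipping, and the names `p_correct`, `log_likelihoods`, and `classify` are all assumptions introduced here for illustration.

```python
import numpy as np

def p_correct(log_c, alpha, beta=1.0, gamma=0.5):
    """Assumed 2AFC Weibull function on a log10 concentration axis."""
    return gamma + (1.0 - gamma) * (1.0 - np.exp(-10.0 ** (beta * (log_c - alpha))))

def log_likelihoods(log_c, correct, alphas, beta=1.0):
    """Log-likelihood of each candidate threshold under two response strategies."""
    log_c = np.asarray(log_c, dtype=float)[:, None]   # trials x 1
    correct = np.asarray(correct, dtype=bool)[:, None]
    alphas = np.asarray(alphas, dtype=float)[None, :]  # 1 x candidates
    # Clip so that log(0) never occurs at very high or very low intensities.
    p = np.clip(p_correct(log_c, alphas, beta), 1e-6, 1.0 - 1e-6)
    # Truthful: a correct response has probability p.
    truthful = np.where(correct, np.log(p), np.log1p(-p)).sum(axis=0)
    # Lying: the subject picks the wrong interval, so correct has probability 1 - p.
    lying = np.where(correct, np.log1p(-p), np.log(p)).sum(axis=0)
    return truthful, lying

def classify(log_c, correct, alphas, beta=1.0):
    """Compare the maxima of the two likelihood functions."""
    alphas = np.asarray(alphas, dtype=float)
    truthful, lying = log_likelihoods(log_c, correct, alphas, beta)
    if lying.max() > truthful.max():
        return "lying", alphas[lying.argmax()]
    return "truthful", alphas[truthful.argmax()]
```

In this sketch the threshold estimate is taken from whichever hypothesis has the larger maximum, which is why an observer who always chooses the wrong interval can still yield a usable threshold.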

The results of this simulation are presented in Table 2. The ML-PEST method successfully measured the threshold and explicitly detected the lying condition in all 10,000 cases and was therefore able to measure the threshold of the underlying psychometric function with a small median error. The up–down staircase method and the ascending method of limits reached the highest concentration and tried to go higher, resulting in 9,824 failures for the staircase method and 8,648 failures for the method of limits. If we consider a failure as a successful detection of the lying behavior, the up–down staircase method detected 98% of the cases, and the ascending method of limits detected 86% of the cases. These two methods required about the same number of trials (see Table 2).

Simulation 4

It is important in clinical testing that people with elevated thresholds be easily detected. One of the subjects in Experiment 1 first had smell tested by the up–down method and exhibited a normal threshold. Later, the same nostril was tested using the maximum-likelihood method, and no threshold could be measured because, as it turned out, he was anosmic on that side. The fact that the up–down method did not detect the anosmia is of concern.

We therefore repeated the Monte Carlo simulations of 10,000 subjects, using the same starting conditions and the same test conditions as those in Simulation 1, with one change: Each simulated subject generated a random response, as would be the case if a person had no sensory sensitivity. The ideal behavior of a testing method would be to indicate that the threshold was at the highest value that is possible with that method or to have some other explicit way of indicating that the subject is responding randomly.

None of the three methods performed very well in detecting the random responding. The median errors and number of trials are given in Table 2. The up–down staircase method did the poorest, because it gives a median error of 0.93. This rather low error occurs because, on 87% of the cases, the stopping criterion of 5 reversals was satisfied before the method reached the upper limit of the stimulus concentrations. The ML-PEST method failed on 11% of the cases, but overall the thresholds were estimated to be very high. Unfortunately, it required a great number of trials to achieve this performance. The ascending method of limits reached the upper limit of stimulus concentration on 66% of the cases and also gave the highest values for the threshold.

Since, in Monte Carlo simulations, the distribution of the actual thresholds is known, we can apply an SDT analysis (Green & Swets, 1966/1974; Macmillan & Creelman, 1991) to compare the distribution of estimated thresholds made by each method under random responding with the true distribution. Using the means and standard deviations of the above distributions, we computed the signal detection measures of sensitivity, d_a, and accuracy, the area under the ROC curve, A_z (Harvey, 1992; Simpson & Fitter, 1973). The ML-PEST method had the highest sensitivity and accuracy (d_a = 2.27, A_z = .95). The ascending method was next best (d_a = 1.99, A_z = .92), and the up–down staircase method was the least sensitive (d_a = 1.50, A_z = .86).
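The two indices can be computed from the means and standard deviations of two distributions as sketched below. The formulas are the standard Gaussian definitions, d_a = (difference of means) / (root mean square of the two standard deviations) and A_z = Φ(d_a / √2); the text does not print them, so their use here is an assumption, and the function names `d_a` and `a_z` are introduced only for this example.

```python
import math

def d_a(mean_signal, sd_signal, mean_noise, sd_noise):
    """Sensitivity index for two Gaussian distributions with unequal variances."""
    rms_sd = math.sqrt((sd_signal ** 2 + sd_noise ** 2) / 2.0)
    return (mean_signal - mean_noise) / rms_sd

def a_z(da):
    """Area under the Gaussian ROC curve: Phi(d_a / sqrt(2))."""
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))), so Phi(da / sqrt(2)) uses erf(da / 2).
    return 0.5 * (1.0 + math.erf(da / 2.0))

# The d_a values reported in the text map onto the reported A_z values.
for value in (2.27, 1.99, 1.50):
    print(value, round(a_z(value), 2))
```

Under these definitions the reported pairs are mutually consistent: d_a values of 2.27, 1.99, and 1.50 correspond to A_z values of .95, .92, and .86.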

Discussion

The results of the simulations lead to the conclusion that both the up–down staircase method and the maximum-likelihood method have considerable strengths for use in measuring taste and smell thresholds. The ascending method of limits should not be used if one is interested in obtaining an accurate estimate of the threshold. These threshold estimates covary with the number of trials taken to stop, as is illustrated in the lower panel of Figure 7. The simulation results offer an explanation for the published finding that ascending limit measurements of olfactory thresholds are less reliable than staircase-measured thresholds (Doty, McKeown, Lee, & Shaman, 1995).

The up–down staircase method gives an average error twice as large as that of the ML-PEST method, although both values are small and therefore similar to each other. The up–down staircase method failed by calling for a stimulus too low on 66 out of 10,000 cases, whereas the ML-PEST method had no such failures. Although the median number of trials gives the edge to the ML-PEST method, on some occasions the ML-PEST method required over 50 trials to reach its stopping criterion.

A strength of the ML-PEST method is its ability to consider alternate hypotheses about the response strategy of the observer. The ML-PEST method was able to correctly detect all 10,000 dishonest responders. We are currently preparing a paper that will go into considerable detail on how to distinguish malingerers from true anosmics, using a variety of methods. Another strength is that there is virtually no bias introduced into threshold measurements by having the ML-PEST steepness different from the "true" value set in the Monte Carlo observer (Treutwein & Strasburger, 1999). Data generated from a Monte Carlo Weibull function observer can be well fit by virtually any S-shaped psychometric function, although the Weibull will usually fit slightly better. We therefore recommend the ML-PEST method for testing taste and smell.

EXPERIMENT 2
Test–Retest Reliability

In order to assess the stability of the measures obtained with the adaptive maximum-likelihood procedure, we tested the subjects on four occasions.

Method

Subjects. Twenty-seven nonsmoking, healthy subjects participated in the experiment (mean age = 31.6 ± 8.7 years, range = 22–57 years). The 14 women and 13 men were divided into two groups. Group 1 ("consecutive," 7 females and 7 males) was tested on 4 consecutive days, and Group 2 ("distributed," 7 females and 6 males) was tested on Days 1, 2, 8, and 22.

Stimuli. Smell thresholds for butyl alcohol were assessed using the same range of concentrations as were used with the maximum-likelihood method in Experiment 1. Taste thresholds were measured using 20 concentrations of NaCl in quarter-log steps. The highest concentration was 1.7 M. The same maximum-likelihood adaptive method as described in Experiment 1 was used in this experiment.

Procedure. Thresholds were determined with a temporal 2AFC procedure. Taste sensitivity was measured with a whole-mouth sip-and-spit procedure. The stimuli and the blanks were presented in 30-ml plastic medicine cups containing 5 ml of liquid. The subjects were requested to take the first sample in their mouths, swish it around, and spit it out, then rinse with distilled water and do the same with the second sample. They were asked to choose the sample that had a taste different from water. In each of the four sessions, taste thresholds were determined first and smell thresholds last. All four smell thresholds were determined in the preferred nostril. The subjects were tested at the same time of each day.

Results

Because the measurement and specification of test–retest reliability is a rather complex topic (Linn, 1989), we have chosen four different ways to demonstrate the reliability of the maximum-likelihood method. For the first method, the four threshold measurements for each subject are plotted in Figure 8. The thick line in each panel is the mean threshold. It is clear from examination of Figure 8 that most individual thresholds were stable over 4 days (left column of panels) and over 22 days (right column of panels), with a few subjects showing shifts of more than 1 log unit. The mean thresholds for the two groups are shown in Table 3. A repeated measures ANOVA confirmed the stability of the thresholds: The effect of testing day was not significant.

A second approach is to compute the correlations among the four threshold measurements (see, e.g., Mattes, 1988). Pearson correlation coefficients between thresholds obtained on different days are shown in Table 4. The correlations were generally higher for pairs of thresholds that were measured close together in time. As Mattes pointed out, correlations are highly influenced by small changes in ordinal ranking of the subjects. The fact that the pairwise correlations became smaller as time separation between them increased is shown in Figure 9. The solid line is the least squares regression line. It indicates that the correlation between two thresholds diminished as the time between them increased. The slope of the regression line becomes even steeper if the data for 1-day separation are removed from the regression, indicating that the decrease in correlation started with the 2-day separation condition.

Figure 8. Log threshold concentration as a function of test day for NaCl (upper row) and butanol (lower row) measured on 4 successive days (left column) and over 22 days (right column). The thin lines connect the four thresholds of individual subjects. The thick line is the mean threshold concentration of all subjects.

The third method to assess test–retest reliability is based on the differences between successive threshold measurements. Each threshold was subtracted from the previous threshold. The four threshold measurements resulted in three difference measurements. The distributions of these differences for NaCl and for butyl alcohol are plotted on the left side of Figure 10. If the measured thresholds do not systematically increase or decrease over time, we expect that the mean difference would be zero. The mean of the distribution for NaCl (upper panel of Figure 10) was 0.018 and for butyl alcohol (lower panel) was 0.002. Neither of these was significantly different from zero. The standard deviations of these distributions were 0.413 log units for NaCl and 0.422 log units for butyl alcohol.

A fourth way to examine test–retest reliability is by means of the spread of the four threshold measures for each subject. A standard deviation of 0.0 would mean that all four thresholds were exactly the same value. The standard deviation of the four measurements was computed for each subject. The distributions of these standard deviations for NaCl (upper panel) and butyl alcohol (lower panel) are plotted on the right side of Figure 10. The median standard deviation was about 0.15 log units for both NaCl and butyl alcohol. This good test–retest reliability is consistent with the other three methods of assessing reliability reported above.
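The second, third, and fourth reliability indices can be computed from a subjects × sessions array of log thresholds, as in the sketch below. The function name `retest_summary`, the use of the sample standard deviation (`ddof=1`), and the demonstration values are assumptions made for illustration; the demonstration values are not data from the experiment.

```python
import numpy as np

def retest_summary(thresholds):
    """Summarize test-retest reliability for a subjects x sessions array."""
    t = np.asarray(thresholds, dtype=float)
    # Successive differences: each threshold subtracted from the previous one.
    diffs = t[:, :-1] - t[:, 1:]
    # Spread of each subject's thresholds across sessions.
    subject_sd = t.std(axis=1, ddof=1)
    return {
        "mean_diff": diffs.mean(),
        "sd_diff": diffs.std(ddof=1),
        "median_subject_sd": np.median(subject_sd),
        # Pearson correlations between every pair of sessions.
        "session_corr": np.corrcoef(t, rowvar=False),
    }

# Hypothetical log thresholds for three subjects tested in four sessions.
demo = [[1.0, 1.5, 1.0, 1.5],
        [2.0, 2.5, 2.0, 2.5],
        [3.0, 3.5, 3.0, 3.5]]
summary = retest_summary(demo)
```

A mean difference near zero indicates no systematic drift across sessions, whereas the standard deviation of the differences and the per-subject standard deviations describe how much individual thresholds move between sessions.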

Discussion

The maximum-likelihood method gives estimates of thresholds that are stable within individuals over time and that are responsive to individual differences. By all four indicators, the test–retest reliability of both NaCl and butyl alcohol thresholds measured by the maximum-likelihood method was very good.

GENERAL DISCUSSION

The two experiments and the Monte Carlo simulations reported here demonstrate that the maximum-likelihood method is a viable alternative to other methods currently in use to measure taste and smell detection thresholds. Because the method attempts to use stimuli that are optimal (close to the predicted threshold), the confidence intervals of the measured thresholds are considerably smaller than those achieved by the other two methods (Experiment 1).

One of the strengths of the maximum-likelihood method is that it provides an explicit way of estimating the confidence interval of the threshold, a facility not provided by the other two methods. The experimenter is free to establish a stopping confidence interval to suit his/her needs. If greater precision is desired, it can be achieved at the expense of running more trials. This explicit tradeoff between precision and number of trials is not possible with the other two psychophysical methods. One could, of course, use a hybrid combination of methods: an up–down staircase method for generating the next stimulus to use, combined with a maximum-likelihood method for estimating the threshold and its confidence interval. In effect, we used this approach after the data from Experiment 1 were collected, by reanalyzing them with the maximum-likelihood technique in order to assess the confidence interval of the resulting threshold.

Table 3
Mean NaCl and Butyl Alcohol Thresholds (and Standard Errors) From Experiment 2

                              Day 1          Day 2          Day 3          Day 4
                              M     SE       M     SE       M     SE       M     SE

NaCl
  Consecutive (Group 1)       2.62  0.13     2.67  0.11     2.82  0.13     2.63  0.11
  Distributed (Group 2)       2.90  0.19     2.83  0.09     2.83  0.11     2.77  0.10
Butyl Alcohol
  Consecutive (Group 1)       3.19  0.14     3.17  0.15     3.25  0.12     3.10  0.13
  Distributed (Group 2)       3.00  0.16     3.10  0.14     3.09  0.10     3.09  0.11

Table 4
Pearson Correlation Coefficients Between Thresholds

                              Paired Sessions
                              1–2    1–3    1–4    2–3    2–4    3–4

NaCl
  Consecutive (Group 1)       .46    .66    .66    .44    .75    .66
  Distributed (Group 2)       .58    .72    .40    .55    .20    .51
Butyl alcohol
  Consecutive (Group 1)       .86    .83    .86    .67    .71    .76
  Distributed (Group 2)       .26    .68    .56    .51    .21    .65

p < .05   p < .01   p < .001

The Monte Carlo simulations allow the analysis of the ideal behavior of the three psychophysical methods. One major conclusion is that the ascending staircase method gives threshold measurements that are severely biased. This bias is a function of the number of trials carried out before the stopping criterion of 5 correct in a row with one stimulus is reached. When the number of trials was small (less than approximately 15), the threshold estimate was too low; when the number of trials was greater than 15, the threshold estimate was too high (see Figure 7). This characteristic is due to the fact that the stimulus concentration used can only increase as trials progress. The maximum-likelihood method has the least amount of bias, although it is always slightly negative (i.e., the threshold estimate is, on the average, slightly lower, by 0.02 log units, than the true value).

In clinical testing, especially for cases that involve the legal system, it is important to be able to detect malingerers. One of the strengths of the maximum-likelihood method is that it explicitly tests hypotheses. Normally, the hypothesis is that the subject is responding truthfully and that the threshold has a certain value. In the present implementation of the maximum-likelihood method, the computer routines maintain three hypotheses about the response strategy of the subject: that the subject is responding truthfully, that the subject is lying on each trial (i.e., always choosing what he/she believes is the wrong alternative in the 2AFC procedure), and that the subject is responding randomly (i.e., not making use of any sensory information). The ability to detect a malingerer (one who has sensory sensitivity but tries to respond randomly) is based on the fact that it is difficult for humans to generate truly random sequences (Bar-Hillel & Wagenaar, 1993). A subject who can taste or smell the stimulus, and therefore knows in which interval it occurs, tends to respond incorrectly more often than would be expected by chance alone. In this case, the hypothesis that the subject is lying will have a higher likelihood than the hypothesis that the subject is telling the truth. We are currently working to refine this ability of the maximum-likelihood method, because it holds great promise. The other two psychophysical procedures do not easily distinguish between a malingerer and a person with no sensory sensitivity and have no explicit way to detect malingerers.

Figure 9. Test–retest reliability as indicated by correlation coefficients between all pairs of measurements, plotted as a function of the number of days separating them. The solid line is the least squares linear regression through the data.

Finally, in Experiment 2, we demonstrated that the maximum-likelihood method gives estimates of thresholds that are stable over time. This stability is actually based on two aspects of our testing procedure. The maximum-likelihood method gives the smallest confidence intervals of the three methods tested and thus contributes the least amount of random or systematic error variance to the measured thresholds. The second source of reliability is the experimental paradigm. The 2AFC procedure minimizes the effects of response bias (e.g., Green & Swets, 1966/1974) and thus minimizes error variance in the threshold measurements that would be introduced by variance in response bias. Actually, it would be desirable to use more than just two forced-choice alternatives, because the ML-PEST method converges more rapidly on the threshold value with more alternatives (Manny & Klein, 1985; Schlauch & Rose, 1990). The time required to present chemosensory stimuli, however, usually precludes using more than two alternatives.

Test–retest reliability is not achieved at the expense of precision. Our measured thresholds show an approximately normal distribution with a standard deviation of about 0.4 log units. The maximum-likelihood method is able to estimate thresholds that lie on a continuum even though a small number of discrete stimulus concentrations are available for testing. Although a computer is required to run ML-PEST software, the widespread availability of inexpensive desktop or laptop computers makes this requirement much less of a barrier than in the past.

Figure 10. Distribution of differences between successive thresholds (left) and of the standard deviation of the four threshold measurements for the 27 subjects (right) for NaCl (upper panel) and butyl alcohol (lower panel).

REFERENCES

Anliker, J. A., Bartoshuk, L., Ferris, A. M., & Hooks, L. D. (1991). Children's food preferences and genetic sensitivity to the bitter tastes of 6-n-propylthiouracil (PROP). American Journal of Clinical Nutrition, 54, 316-320.

Apter, A. J., Gent, J. F., & Frank, M. E. (1999). Fluctuating olfactory sensitivity and distorted odor perception in allergic rhinitis. Archives of Otolaryngology: Head & Neck Surgery, 125, 1005-1010.

Bar-Hillel, M., & Wagenaar, W. A. (1993). The perception of randomness. In C. L. Gideon Keren (Ed.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 369-393). Hillsdale, NJ: Erlbaum.

Berkson, J. (1951). Why I prefer logits to probits. Biometrics, 7, 327-339.

Berkson, J. (1953). A statistically precise and relatively simple method of estimating the bio-assay with quantal response, based on the logistic function. Journal of the American Statistical Association, 48, 565-600.

Berkson, J. (1955). Maximum-likelihood and minimum chi-square estimates of the logistic function. Journal of the American Statistical Association, 50, 130-162.

Cain, W. S., Gent, J. F., Catalanotto, F. A., & Goodspeed, R. B. (1983). Clinical evaluation of olfaction. American Journal of Otolaryngology, 4, 252-256.

Cometto-Muñiz, J. E., & Cain, W. S. (1995). Relative sensitivity of the ocular trigeminal, nasal trigeminal, and olfactory systems to airborne chemicals. Chemical Senses, 20, 191-198.

Cornsweet, T. N. (1962). The staircase method in psychophysics. American Journal of Psychology, 75, 485-491.

Cowart, B. J. (1989). Relationships between taste and smell across the adult life span. In C. Murphy, W. S. Cain, & D. M. Hegsted (Eds.), Nutrition and the chemical senses in aging: Recent advances and current research needs (Annals of the New York Academy of Sciences, Vol. 561, pp. 39-55). New York: New York Academy of Sciences.

Cowart, B. J., Yokomukai, Y., & Beauchamp, G. K. (1994). Bitter taste in aging: Compound-specific decline in sensitivity. Physiology & Behavior, 56, 1237-1241.

Deems, D. A., Doty, R. L., Settle, R. G., Moore-Gillon, V., Shaman, P., Mester, A. F., Kimmelman, C. P., Brightman, V. J., & Snow, J. B., Jr. (1991). Smell and taste disorders: A study of 750 patients from the University of Pennsylvania Smell and Taste Center. Archives of Otolaryngology: Head & Neck Surgery, 117, 519-528.

Doty, R. L., McKeown, D. A., Lee, W. W., & Shaman, P. (1995). A study of the test–retest reliability of ten olfactory tests. Chemical Senses, 20, 645-656.

Doty, R. L., Snyder, P. J., Huggins, G. R., & Lowry, L. D. (1981). Endocrine, cardiovascular, and psychological correlates of olfactory sensitivity changes during the human menstrual cycle. Journal of Comparative & Physiological Psychology, 95, 45-60.

Drewnowski, A., Henderson, S. A., & Shore, A. B. (1997). Genetic sensitivity to 6-n-propylthiouracil (PROP) and hedonic responses to bitter and sweet tastes. Chemical Senses, 22, 27-37.

Duffy, V. B., Cain, W. S., & Ferris, A. M. (1999). Measurement of sensitivity to olfactory flavor: Application in a study of aging and dentures. Chemical Senses, 24, 671-677.

Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf und Härtel.

Gagnon, P., Mergler, D., & Lapare, S. (1994). Olfactory adaptation, threshold shift and recovery at low levels of exposure to methyl isobutyl ketone (MIBK). Neurotoxicology, 15, 637-642.

Garcia-Perez, M. A. (1998). Forced-choice staircases with fixed step sizes: Asymptotic and small-sample properties. Vision Research, 38, 1861-1881.

Green, D. M., & Swets, J. A. (1974). Signal detection theory and psychophysics. Huntington, NY: Robert E. Krieger. (Original work published 1966)

Grushka, M., Sessle, B. J., & Howley, T. P. (1986). Psychophysical evidence of taste dysfunction in burning mouth syndrome. Chemical Senses, 11, 485-498.

Guilford, J. P. (1936). Psychometric methods. New York: McGraw-Hill.

Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.

Hall, J. L. (1968). Maximum-likelihood sequential procedures for estimation of psychometric functions. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1981). Hybrid adaptive procedure for estimation of psychometric functions. Journal of the Acoustical Society of America, 69, 1763-1769.

Harvey, L. O., Jr. (1986). Efficient estimation of sensory thresholds. Behavior Research Methods, Instruments, & Computers, 18, 623-632.

Harvey, L. O., Jr. (1992). The critical operating characteristic and the evaluation of expert judgment. Organizational Behavior & Human Decision Processes, 53, 229-251.

Harvey, L. O., Jr. (1997). Efficient estimation of sensory thresholds with ML-PEST. Spatial Vision, 11, 121-128.

Hays, W. L. (1963). Statistics for psychologists (1st ed.). New York: Holt, Rinehart & Winston.

Krantz, D. H. (1969). Threshold theories of signal detection. Psychological Review, 76, 308-324.

Lehrner, J. P., Brücke, T., Dal-Bianco, P., Gatterer, G., & Kryspin-Exner, I. (1997). Olfactory functions in Parkinson's disease and Alzheimer's disease. Chemical Senses, 22, 105-110.

Lehrner, J. P., Kryspin-Exner, I., & Vetter, N. (1995). Higher olfactory threshold and decreased odor identification ability in HIV-infected persons. Chemical Senses, 20, 325-328.

Linn, R. L. (Ed.) (1989). Educational measurement (3rd ed.). New York: Macmillan.

Linschoten, M. R. I., & Kroeze, J. H. A. (1991). Spatial summation in taste: NaCl thresholds and stimulated area on the anterior human tongue. Chemical Senses, 16, 219-224.

Linschoten, M. R. I., & Kroeze, J. H. A. (1992). Bilateral taste stimulation: Spatial summation with weak NaCl stimuli. Chemical Senses, 17, 53-59.

Linschoten, M. R. I., & Kroeze, J. H. A. (1994). Ipsi- and bilateral interactions in taste. Perception & Psychophysics, 55, 387-393.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure vernier acuity of older infants. Vision Research, 25, 1245-1252.

Mattes, R. D. (1988). Reliability of psychophysical measures of gustatory function. Perception & Psychophysics, 43, 107-114.

Moll, B., Klimek, L., Eggers, G., & Mann, W. (1998). Comparison of olfactory function in patients with seasonal and perennial allergic rhinitis. Allergy, 53, 297-301.

Murphy, C., & Cain, W. S. (1986). Odor identification: The blind are better. Physiology & Behavior, 37, 177-180.

Murphy, C., Nordin, S., de Wijk, R. A., Cain, W. S., & Polich, J. (1994). Olfactory-evoked potentials: Assessment of young and elderly, and comparison to psychophysical threshold. Chemical Senses, 19, 47-56.

Nordin, S., Murphy, C., Davidson, T. M., Quinonez, C., Jalowayski, A. A., & Ellison, D. W. (1996). Prevalence and assessment of qualitative olfactory dysfunction in different age groups. Laryngoscope, 106, 739-744.

O'Mahony, M., Gardner, L., Long, D., Heintz, C., Thompson, B., & Davies, M. (1979). Salt taste detection: An R-index approach to signal-detection measurements. Perception, 8, 497-506.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Pierce, J. D., Jr., Doty, R. L., & Amoore, J. E. (1996). Analysis of position of trial sequence and type of diluent on the detection threshold for phenyl ethyl alcohol using a single staircase method. Perceptual & Motor Skills, 82, 451-458.

Rosenblatt, M. R., Olmstead, R. E., Iwamoto-Schaap, P. N., & Jarvik, M. E. (1998). Olfactory thresholds for nicotine and menthol in smokers (abstinent and nonabstinent) and nonsmokers. Physiology & Behavior, 65, 575-579.


SAS Institute (1998a). StatView: StatView reference. Cary, NC: SAS Institute.

SAS Institute (1998b). StatView: Using StatView. Cary, NC: SAS Institute.

Schlauch, R. S., & Rose, R. M. (1990). Two-, three-, and four-interval forced-choice staircase procedures: Estimator bias and efficiency. Journal of the Acoustical Society of America, 88, 732-740.

Simpson, A. J., & Fitter, M. J. (1973). What is the best index of detectability? Psychological Bulletin, 80, 481-488.

Stevens, J. C. (1995). Detection of heteroquality taste mixtures. Perception & Psychophysics, 57, 18-26.

Swets, J. A. (1961). Is there a sensory threshold? Science, 134, 168-177.

Swets, J. A. (1986a). Form of empirical ROC's in discrimination and diagnostic tasks: Implications for theory and measurement of performance. Psychological Bulletin, 99, 181-198.

Swets, J. A. (1986b). Indices of discrimination or diagnostic accuracy: Their ROC's and implied models. Psychological Bulletin, 99, 100-117.

Swets, J. A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Mahwah, NJ: Erlbaum.

Swets, J. A., Tanner, W. P., Jr., & Birdsall, T. G. (1961). Decision processes in perception. Psychological Review, 68, 301-340.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Urban, F. M. (1908). The application of statistical methods to problems of psychophysics. Philadelphia: Psychological Clinic Press.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Weiffenbach, J. M., Baum, B. J., & Burghauser, R. (1982). Taste thresholds: Quality specific variation with human aging. Journal of Gerontology, 37, 372-377.

Weiffenbach, J. M., Schwartz, L. K., Atkinson, J. C., & Fox, P. C. (1995). Taste performance in Sjögren's syndrome. Physiology & Behavior, 57, 89-96.

Wetherill, G. B., & Levitt, H. (1965). Sequential estimation of points on a psychometric function. British Journal of Mathematical & Statistical Psychology, 18, 1-10.

(Manuscript received November 11, 1999; revision accepted for publication March 4, 2001.)


number of trials to reach the native threshold, maximum-likelihood logistic threshold, and confidence interval of the maximum-likelihood threshold.

Results

The native threshold values, the mean numbers of trials needed to reach the stopping criterion, the maximum-likelihood logistic thresholds, and the mean confidence intervals of the maximum-likelihood thresholds are summarized in Table 1. A repeated measures ANOVA on the native thresholds showed no significant main effect of method [F(3,31) = 1.61, p < .187]. Subsequent statistical contrast comparisons, using the standard Huynh–Feldt ε to correct the p value and the degrees of freedom, revealed that the native threshold of the ascending method of limits was just significantly higher than the other methods [F(1,31) = 3.92, p = .051]. These higher thresholds are to be expected, given that the ascending method of limits computes the threshold to be the concentration corresponding to a detection probability of 1.0, whereas the other methods use a performance criterion of about .75.

A repeated measures ANOVA on the number of trials needed to reach the stopping criterion showed a significant main effect of method [F(3,31) = 6.26, p < .0008]. Statistical contrast comparisons showed that the ascending method required significantly fewer trials to determine threshold [F(1,31) = 6.41, p < .014] and that the up–down method required significantly more trials [F(1,31) = 6.06, p < .017] than the average of the two measurements with the maximum-likelihood method.

A repeated measures ANOVA on the maximum-likelihood logistic thresholds showed no significant main effect of method, and the thresholds computed from the trial-by-trial data from the ascending method of limits were not significantly different from the thresholds computed from the data collected with the other methods [F(1,31) = 1.77, p = .19]. This result is to be expected, since, presumably, the performance of each observer is driven by the same sensory process when tested by each of the three psychometric methods.

The mean confidence intervals for the thresholds (and standard errors) are shown in the last column of Table 1. The mean confidence interval of the maximum-likelihood method is smaller than those of the other two. Note that the standard error of the confidence interval reported for the ML-PEST method is very small (0.02). This is because the confidence interval was used as the stopping criterion: Trials were stopped when the confidence interval of α reached 0.8. A repeated measures ANOVA indicated that there was a significant main effect of method [F(3,31) = 17.27, p < .0002]. Contrast tests revealed that the confidence intervals for the up–down method were significantly larger than those for both the maximum-likelihood method [F(1,31) = 50.93, p < .0001] and the ascending method [F(1,31) = 23.72, p < .0016].

Discussion

The native absolute threshold values obtained with the ascending method were higher than those obtained with the other two methods. One should remember that these values are not conventional thresholds, in the sense that they represent the concentration chosen correctly five out of five times. It is to be expected that this value will be a higher concentration than thresholds based on a lower percent correct. Furthermore, this method, at least as described by Cain et al. (1983), has considerably larger steps between adjacent concentrations, and only actual stimulus concentrations can be threshold values.

The ascending method needed the fewest trials (~13) relative to either the maximum-likelihood method (~15) or the up–down staircase method (~17). The clear superiority of the maximum-likelihood method, however, came with the high precision of the measured thresholds, as indicated by the small confidence intervals. The confidence intervals of the up–down staircase were almost three times larger than those given by the maximum-likelihood method.

MONTE CARLO SIMULATIONS

When measuring the threshold of an actual subject, it is not possible to know what the error is between the measured value and the "true" value, because the true value is not known. In order to evaluate the accuracy of the three psychophysical methods, we resorted to Monte Carlo evaluations of simulated observers having a predetermined threshold value. We used each of the three psychophysical methods to measure the threshold of each simulated observer and compared the result with the true value.

Table 1
Mean Performance (and Standard Error) of Three Psychophysical Methods With Butyl Alcohol in Experiment 1

                     Native           Number of        ML Logistic      Confidence
                     Threshold        Trials           Threshold        Interval
Method               M       SE       M       SE       M       SE       M      SE
ML-PEST
  Left               -3.41   0.16     15.13   0.65     -3.41   0.16     0.80   0.02
  Right              -3.27   0.13     15.41   0.76     -3.27   0.13     0.84   0.02
Wetherill–Levitt     -3.30   0.10     17.44   0.72     -3.30   0.11     2.32   0.34
Ascending limits     -3.10   0.13     13.03   0.68     -3.48   0.14     1.14   0.02

Monte Carlo simulations were run for each of the three psychophysical procedures, using the same rules and stimulus series as were used in the threshold measurements for butyl alcohol with the real subjects reported above in Experiment 1. A population of 10,000 simulated subjects was defined whose thresholds followed a Gaussian distribution with a mean threshold of -3.00 log percent concentration and a standard deviation of 0.4 log units. These values are representative of the distribution of real subjects' detection thresholds for butyl alcohol, as found in Experiment 2 below. The predetermined threshold for each of the 10,000 simulated observers was drawn from the above distribution. Threshold values higher than the highest stimulus concentration and lower than the lowest stimulus concentration were not used.
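The construction of the simulated population can be sketched as follows. This is a minimal illustration, not the authors' routine: the function name, the rejection-sampling loop, and the stimulus limits passed in the example call are assumptions, and the mean is written as a negative log percent concentration.

```python
import random

def draw_true_thresholds(n, lowest, highest, mean=-3.00, sd=0.4, seed=None):
    """Draw n simulated thresholds from a Gaussian, rejecting out-of-range values."""
    rng = random.Random(seed)
    thresholds = []
    while len(thresholds) < n:
        value = rng.gauss(mean, sd)
        # Thresholds outside the available stimulus series are not used.
        if lowest <= value <= highest:
            thresholds.append(value)
    return thresholds

# Hypothetical stimulus limits, chosen only for illustration.
population = draw_true_thresholds(10_000, lowest=-5.0, highest=-1.0, seed=1)
```

Rejection sampling is only one way to exclude out-of-range thresholds; the text does not say how the exclusion was implemented.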

Simulation 1

On each trial, the simulated subject was presented with a stimulus. Each subject's detection behavior was driven by a logistic psychometric function with a steepness of 3.5 and a chance level performance of 0.5. Whether the response on a trial was correct was determined by comparing the probability predicted by the logistic function for the stimulus and for that subject with a uniform random number between 0 and 1. If the random number was less than the logistic function probability for that stimulus, the response was scored as correct; otherwise, it was scored as wrong. The starting stimulus for the up–down staircase and maximum-likelihood methods was chosen randomly between the two stimuli that bracketed the mean threshold of -3.0 log concentration. These stimuli were -3.113 and -2.937 log concentration. The ascending method started with the lowest stimulus in its series, -5.130 log percent concentration.
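A simulated observer of this kind can be sketched as below. The exact parameterization of the logistic function is not given in this section, so the form used here (chance level plus a scaled logistic in log concentration, giving .75 correct at threshold) is an assumption, as are the function names.

```python
import math
import random

def logistic_2afc(x, threshold, steepness=3.5, chance=0.5):
    """Assumed logistic psychometric function: probability correct at log concentration x."""
    return chance + (1.0 - chance) / (1.0 + math.exp(-steepness * (x - threshold)))

def simulate_trial(x, threshold, rng, steepness=3.5, chance=0.5):
    """Score a trial as correct when a uniform random number falls below p(correct)."""
    return rng.random() < logistic_2afc(x, threshold, steepness, chance)

rng = random.Random(0)
# One trial at a starting stimulus of -3.113 for an observer whose true threshold is -3.0.
correct = simulate_trial(-3.113, -3.0, rng)
```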

An ascending limits run was considered a failure if the procedure called for a stimulus stronger than the highest stimulus concentration. A Wetherill–Levitt staircase run was considered a failure if, during the trials, a stimulus stronger than the highest stimulus concentration or weaker than the lowest stimulus concentration was called for. In the present simulation, the Wetherill–Levitt staircase failed 66 times. All of these failures were caused by the procedure's calling for a lower stimulus concentration than the lowest available. An ML-PEST run would have been considered a failure if the final estimate of the threshold were more than half the stopping confidence interval below the lowest candidate or above the highest candidate α. There were no such failures. The failed runs are excluded from the data analyzed below.

The distribution of errors for each method (estimated threshold minus true threshold) for the 10,000 subjects is shown in the left side of Figure 6. The distribution of the number of trials to reach stopping criterion for the 10,000 subjects is shown on the right side of Figure 6. The maximum-likelihood method was more accurate than the Wetherill–Levitt staircase, and both of these methods were superior to the ascending method of limits. The median error and the number of trials required for each method are shown in Table 2. The median error for the maximum-likelihood method was almost half as large as that for the up–down staircase, although both errors were quite small.

The error on each of the 10,000 subjects is plotted in Figure 7 as a function of the number of trials for that run. The range of errors became smaller as the number of trials increased for both the Wetherill–Levitt staircase method and the ML-PEST method, and the median error stayed close to zero. The ascending staircase method showed quite different behavior. The direction of the error depended on the number of trials before the stopping criterion of five correct in a row had been reached. The estimated thresholds were too low when the number of trials was less than about 15, whereas the estimates of threshold were too high when the number of trials was more than 15.
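The dependence of the ascending estimate on run length follows from the structure of the procedure, which can be sketched as below. The rule shown (stay at a concentration while responses are correct, move up one step after any error, stop at five correct in a row, fail when a stimulus above the strongest one is called for) is an assumption reconstructed from the description in this paper; the procedure of Cain et al. (1983) may differ in detail, and the function name is hypothetical.

```python
def ascending_limits(series, respond, needed=5):
    """Run an ascending series (weakest first); respond(x) returns True if correct.

    Returns (threshold, n_trials); threshold is None when the run fails.
    """
    level = 0
    streak = 0
    n_trials = 0
    while level < len(series):
        n_trials += 1
        if respond(series[level]):
            streak += 1
            if streak == needed:
                # Threshold is the concentration chosen correctly five times in a row.
                return series[level], n_trials
        else:
            # Any error resets the count and moves up; the series never descends.
            streak = 0
            level += 1
    return None, n_trials
```

Because the concentration can only increase, a lucky early streak of guesses ends the run at a concentration that is too low, whereas a late stop means the run has already climbed past the true threshold.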

Simulation 2

Because the Wetherill–Levitt staircase method and the ML-PEST method generally required different numbers of trials before the stopping criterion was reached on each run, comparison of the accuracy is not straightforward. We repeated the Monte Carlo simulations so that the staircase method and the ML-PEST method used the same number of trials on each run. First, the Wetherill–Levitt staircase was run until the stopping criterion was reached. Then the ML-PEST method was run with the same number of trials on that observer, using the same starting stimulus.

The results are presented in Table 2. When the ML-PEST method had the same number of trials as the up–down staircase, the ML-PEST method was slightly more accurate. As in Simulation 1, both of these methods were superior in accuracy to the ascending staircase. There were 60 failures out of the 10,000 cases where the up–down staircase called for a stimulus that was lower than the lowest available concentration. Because the ML-PEST method was not permitted to run as many trials as it would normally require, this method failed on 30 out of the 10,000 cases.

Table 2
Results of the Four Monte Carlo Simulations

Method                  Median Error   Median Number of Trials
Simulation 1
  Maximum-likelihood    0.027          16
  Wetherill–Levitt      0.045          17
  Ascending limits      0.331          14
Simulation 2
  Maximum-likelihood    0.094          17
  Wetherill–Levitt      0.101          17
  Ascending limits      0.419          14
Simulation 3
  Maximum-likelihood    0.03           16
  Wetherill–Levitt      3.60           16
  Ascending limits      3.54           16
Simulation 4
  Maximum-likelihood    2.75           115
  Wetherill–Levitt      0.93           21
  Ascending limits      3.36           22


Simulation 3

An important ability of sensory testing is to detect people who deliberately try to appear to have diminished sensitivity. We therefore repeated the Monte Carlo simulations of 10,000 subjects, using the same starting conditions and the same test conditions as those in Simulation 1, but each simulated subject generated "untruthful" responses (i.e., responses just the opposite of that called for by the subject's psychometric function). In the ML-PEST computer program, two likelihood functions are maintained: one based on the hypothesis that the subject is responding truthfully and the other based on the hypothesis that the subject is deliberately choosing the wrong response. The program explicitly tests the hypothesis that a subject is deliberately choosing the wrong response by comparing the maxima of the two likelihood distributions and therefore should be able to measure a sensory threshold as effectively as when a subject is

Figure 6. Distribution of errors (left) and number of trials (right) for three psychophysical methods: maximum-likelihood adaptive staircase (upper panel), Wetherill and Levitt up–down staircase (middle panel), and ascending method of limits (lower panel). The vertical line in the left panels marks the zero-error bin of each histogram. See text for details. The abscissa scale for the upper two left panels covers a range of 1.5 log units and is 5 log units for the lower left histogram.


Figure 7. Error as a function of number of trials for three psychophysical methods: ML-PEST (upper panel), up–down staircase (middle panel), and ascending method of limits (lower panel).


telling the truth. The other two methods do not have an explicit way of dealing with this situation. It is of interest to learn how all three methods behave under these circumstances.
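The comparison of the two likelihood maxima can be illustrated with the sketch below. It is not the ML-PEST program itself: the logistic form, the grid of candidate thresholds, the clamping constant, and all function names are assumptions made for the example. Under the lying hypothesis, the probability of a correct response is taken to be one minus the probability predicted by the psychometric function.

```python
import math

def p_correct(x, threshold, steepness=3.5, chance=0.5):
    """Assumed logistic psychometric function for a truthful 2AFC observer."""
    return chance + (1.0 - chance) / (1.0 + math.exp(-steepness * (x - threshold)))

def max_log_likelihood(trials, candidates, lying=False):
    """Return (maximum log likelihood, best candidate threshold) under one hypothesis."""
    best_ll, best_threshold = -math.inf, None
    for threshold in candidates:
        ll = 0.0
        for x, correct in trials:
            p = p_correct(x, threshold)
            if lying:
                p = 1.0 - p  # a liar is correct only when a truthful observer would err
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            ll += math.log(p if correct else 1.0 - p)
        if ll > best_ll:
            best_ll, best_threshold = ll, threshold
    return best_ll, best_threshold

def classify(trials, candidates):
    """Pick the hypothesis whose likelihood maximum is larger."""
    ll_truth, th_truth = max_log_likelihood(trials, candidates, lying=False)
    ll_lie, th_lie = max_log_likelihood(trials, candidates, lying=True)
    if ll_lie > ll_truth:
        return "lying", th_lie
    return "truthful", th_truth

# Hypothetical candidate grid and data: ten wrong responses to a strong stimulus.
candidates = [-5.0 + 0.25 * i for i in range(21)]
verdict, estimate = classify([(-1.0, False)] * 10, candidates)
```

A run of errors on a clearly detectable stimulus is very unlikely for a truthful observer, whose error rate cannot exceed chance, so the lying hypothesis attains the higher maximum.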

The results of this simulation are presented in Table 2. The ML-PEST method successfully measured the threshold and explicitly detected the lying condition in all 10,000 cases and was therefore able to measure the threshold of the underlying psychometric function with a small median error. The up–down staircase method and the ascending method of limits reached the highest concentration and tried to go higher, resulting in 9,824 failures for the staircase method and 8,648 failures for the method of limits. If we consider a failure as a successful detection of the lying behavior, the up–down staircase method detected 98% of the cases, and the ascending method of limits detected 86% of the cases. These two methods required about the same number of trials (see Table 2).

Simulation 4

It is important in clinical testing that people with elevated thresholds be easily detected. One of the subjects in Experiment 1 first had smell tested by the up–down method and exhibited a normal threshold. Later, the same nostril was tested using the maximum-likelihood method, and no threshold could be measured because, as it turned out, he was anosmic on that side. The fact that the up–down method did not detect the anosmia is of concern.

We therefore repeated the Monte Carlo simulations of 10,000 subjects, using the same starting conditions and the same test conditions as those in Simulation 1, with one change: Each simulated subject generated a random response, as would be the case if a person had no sensory sensitivity. The ideal behavior of a testing method would be to indicate that the threshold was at the highest value that is possible with that method or to have some other explicit way of indicating that the subject is responding randomly.

None of the three methods performed very well in detecting the random responding. The median errors and number of trials are given in Table 2. The up–down staircase method performed the poorest, because it gives a median error of 0.93. This rather low error occurs because, on 87% of the cases, the stopping criterion of 5 reversals was satisfied before the method reached the upper limit of the stimulus concentrations. The ML-PEST method failed on 11% of the cases, but overall, the thresholds were estimated to be very high. Unfortunately, it required a great number of trials to achieve this performance. The ascending method of limits reached the upper limit of stimulus concentration on 66% of the cases and also gave the highest values for the threshold.

Since, in Monte Carlo simulations, the distribution of the actual thresholds is known, we can apply an SDT analysis (Green & Swets, 1966/1974; Macmillan & Creelman, 1991) to compare the distribution of estimated thresholds made by each method under random responding with the true distribution. Using the means and standard deviations of the above distributions, we computed the signal detection measures of sensitivity, d_a, and accuracy, the area under the ROC curve, A_z (Harvey, 1992; Simpson & Fitter, 1973). The ML-PEST method had the highest sensitivity and accuracy (d_a = 2.27, A_z = .95). The ascending method was next best (d_a = 1.99, A_z = .92), and the up–down staircase method was the least sensitive (d_a = 1.50, A_z = .86).
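For readers who wish to reproduce these indices, the sketch below uses the standard unequal-variance Gaussian definitions: d_a is the difference between the two means divided by the root mean square of the two standard deviations, and A_z is the cumulative normal evaluated at d_a/√2. The function names are hypothetical, and the paper's own routines are not shown here.

```python
import math

def d_a(mean_1, sd_1, mean_2, sd_2):
    """Separation of two Gaussian distributions in root-mean-square SD units."""
    rms_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2.0)
    return (mean_1 - mean_2) / rms_sd

def area_from_d_a(da):
    """Area under the Gaussian ROC curve: Phi(d_a / sqrt(2)), written with erf."""
    return 0.5 * (1.0 + math.erf(da / 2.0))

# Converting the reported d_a values to areas.
areas = [round(area_from_d_a(value), 2) for value in (2.27, 1.99, 1.50)]
```

Applied to the three d_a values above, the conversion gives areas that agree with the reported A_z values to two decimal places.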

Discussion

The results of the simulations lead to the conclusion that both the up–down staircase method and the maximum-likelihood method have considerable strengths for use in measuring taste and smell thresholds. The ascending method of limits should not be used if one is interested in obtaining an accurate estimate of the threshold. These threshold estimates covary with the number of trials taken to stop, as is illustrated in the lower panel of Figure 7. The simulation results offer an explanation for the published finding that ascending limit measurements of olfactory thresholds are less reliable than staircase-measured thresholds (Doty, McKeown, Lee, & Shaman, 1995).

The up–down staircase method gives an average error twice as large as that of the ML-PEST method, although both values are small and therefore similar to each other. The up–down staircase method failed by calling for a stimulus too low on 66 out of 10,000 cases, whereas the ML-PEST method had no such failures. Although the median number of trials gives the edge to the ML-PEST method, on some occasions the ML-PEST method required over 50 trials to reach its stopping criterion.

A strength of the ML-PEST method is its ability to consider alternate hypotheses about the response strategy of the observer. The ML-PEST method was able to correctly detect all 10,000 dishonest responders. We are currently preparing a paper that will go into considerable detail on how to distinguish malingerers from true anosmics, using a variety of methods. Another strength is that there is virtually no bias introduced into threshold measurements by having the ML-PEST steepness different from the "true" value set in the Monte Carlo observer (Treutwein & Strasburger, 1999). Data generated from a Monte Carlo Weibull function observer can be well fit by virtually any S-shaped psychometric function, although the Weibull will usually fit slightly better. We therefore recommend the ML-PEST method for testing taste and smell.

EXPERIMENT 2
Test–Retest Reliability

In order to assess the stability of the measures obtained with the adaptive maximum-likelihood procedure, we tested the subjects on four occasions.

Method

Subjects. Twenty-seven nonsmoking, healthy subjects participated in the experiment (mean age = 31.6 ± 8.7 years; range = 22–57 years). The 14 women and 13 men were divided into two groups: Group 1 ("consecutive," 7 females and 7 males) was tested on 4 consecutive days, and Group 2 ("distributed," 7 females and 6 males) was tested on Days 1, 2, 8, and 22.

Stimuli. Smell thresholds for butyl alcohol were assessed using the same range of concentrations as were used with the maximum-likelihood method in Experiment 1. Taste thresholds were measured using 20 concentrations of NaCl in quarter-log steps. The highest concentration was 1.7 M. The same maximum-likelihood adaptive method as described in Experiment 1 was used in this experiment.

Procedure. Thresholds were determined with a temporal 2AFC procedure. Taste sensitivity was measured with a whole-mouth sip-and-spit procedure. The stimuli and the blanks were presented in 30-ml plastic medicine cups containing 5 ml of liquid. The subjects were requested to take the first sample in their mouths, swish it around, and spit it out, then rinse with distilled water and do the same with the second sample. They were asked to choose the sample that had a taste different from water. In each of the four sessions, taste thresholds were determined first, and smell thresholds last. All four smell thresholds were determined in the preferred nostril. The subjects were tested at the same time of each day.

Results

Because the measurement and specification of test–retest reliability is a rather complex topic (Linn, 1989), we have chosen four different ways to demonstrate the reliability of the maximum-likelihood method. For the first method, the four threshold measurements for each subject are plotted in Figure 8. The thick line in each panel is the mean threshold. It is clear from examination of Figure 8 that most individual thresholds were stable over 4 days (left column of panels) and over 22 days (right column of panels), with a few subjects showing shifts of more than 1 log unit. The mean thresholds for the two groups are shown in Table 3. A repeated measures ANOVA confirmed the stability of the thresholds: The effect of testing day was not significant.

A second approach is to compute the correlations among the four threshold measurements (see, e.g., Mattes, 1988). Pearson correlation coefficients between thresholds obtained on different days are shown in Table 4. The correlations were generally higher for pairs of thresholds that were measured close together in time. As Mattes pointed out, correlations are highly influenced by small changes in ordinal ranking of the subjects. The fact that the pairwise correlations became smaller as time separation between them increased is shown in Figure 9. The

Figure 8. Log threshold concentration as a function of test day for NaCl (upper row) and butanol (lower row), measured on 4 successive days (left column) and over 22 days (right column). The thin lines connect the four thresholds of individual subjects. The thick line is the mean threshold concentration of all subjects.


solid line is the least squares regression line. It indicates that the correlation between two thresholds diminished as the time between them increased. The slope of the regression line becomes even steeper if the data for 1-day separation are removed from the regression, indicating that the decrease in correlation started with the 2-day separation condition.

The third method to assess test–retest reliability is based on the differences between successive threshold measurements. Each threshold was subtracted from the previous threshold. The four threshold measurements resulted in three difference measurements. The distributions of these differences for NaCl and for butyl alcohol are plotted on the left side of Figure 10. If the measured thresholds do not systematically increase or decrease over time, we expect that the mean difference would be zero. The mean of the distribution for NaCl (upper panel of Figure 10) was 0.018, and for butyl alcohol (lower panel) was 0.002. Neither of these was significantly different from zero. The standard deviations of these distributions were 0.413 log units for NaCl and 0.422 log units for butyl alcohol.
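The difference measure can be computed as in the sketch below. The function name and the two example subjects are hypothetical; the thresholds shown are illustrative values, not data from the experiment.

```python
from statistics import mean, stdev

def successive_differences(thresholds):
    """Subtract each threshold from the previous one (previous minus current)."""
    return [prev - curr for prev, curr in zip(thresholds, thresholds[1:])]

# Hypothetical log thresholds for two subjects over four sessions.
subjects = [
    [-3.0, -3.1, -2.9, -3.0],
    [-2.6, -2.7, -2.8, -2.6],
]
pooled = [d for s in subjects for d in successive_differences(s)]
mean_difference = mean(pooled)
sd_difference = stdev(pooled)
```

Four measurements per subject yield three differences, so the pooled distribution contains three values for every subject.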

A fourth way to examine test–retest reliability is by means of the spread of the four threshold measures for each subject. A standard deviation of 0.0 would mean that all four thresholds were exactly the same value. The standard deviation of the four measurements was computed for each subject. The distributions of these standard deviations for NaCl (upper panel) and butyl alcohol (lower panel) are plotted on the right side of Figure 10. The median standard deviation was about 0.15 log units for both NaCl and butyl alcohol. This good test–retest reliability is consistent with the other three methods of assessing reliability reported above.
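The spread measure can be sketched in the same way. The text does not state whether the sample or the population standard deviation was used; the sample form (n − 1 denominator) is assumed here, and the function name and example thresholds are hypothetical.

```python
from statistics import median, stdev

def within_subject_sds(thresholds_by_subject):
    """Sample standard deviation of each subject's repeated threshold measurements."""
    return [stdev(thresholds) for thresholds in thresholds_by_subject]

# Hypothetical log thresholds for three subjects over four sessions.
example = [
    [-3.0, -3.0, -3.0, -3.0],
    [-3.0, -3.1, -2.9, -3.0],
    [-2.6, -2.9, -2.5, -2.8],
]
sds = within_subject_sds(example)
median_sd = median(sds)
```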

Discussion

The maximum-likelihood method gives estimates of thresholds that are stable within individuals over time and that are responsive to individual differences. By all four indicators, the test–retest reliability of both NaCl and butyl alcohol thresholds measured by the maximum-likelihood method was very good.

GENERAL DISCUSSION

The two experiments and the Monte Carlo simulations reported here demonstrate that the maximum-likelihood method is a viable alternative to other methods currently in use to measure taste and smell detection thresholds. Because the method attempts to use stimuli that are optimal (close to the predicted threshold), the confidence intervals of the measured thresholds are considerably smaller than those achieved by the other two methods (Experiment 1).

One of the strengths of the maximum-likelihood method is that it provides an explicit way of estimating the confidence interval of the threshold, a facility not provided by the other two methods. The experimenter is free to establish a stopping confidence interval to suit his/her needs. If greater precision is desired, it can be achieved at the expense of running more trials. This explicit tradeoff between precision and number of trials is not possible with the other two psychophysical methods. One could, of course, use a hybrid combination of methods: an up–down staircase method for generating the next stimulus to use, combined with a maximum-likelihood method for estimating the threshold and its confidence interval. In effect, we used this approach after the data from Experiment 1 were collected, by reanalyzing them

Table 3
Mean NaCl and Butyl Alcohol Thresholds (and Standard Errors) From Experiment 2

                           Day 1           Day 2           Day 3           Day 4
                           M       SE      M       SE      M       SE      M       SE
NaCl
  Consecutive (Group 1)    -2.62   0.13    -2.67   0.11    -2.82   0.13    -2.63   0.11
  Distributed (Group 2)    -2.90   0.19    -2.83   0.09    -2.83   0.11    -2.77   0.10
Butyl Alcohol
  Consecutive (Group 1)    -3.19   0.14    -3.17   0.15    -3.25   0.12    -3.10   0.13
  Distributed (Group 2)    -3.00   0.16    -3.10   0.14    -3.09   0.10    -3.09   0.11

Table 4
Pearson Correlation Coefficients Between Thresholds

                           Paired Sessions
                           1–2    1–3    1–4    2–3    2–4    3–4
NaCl
  Consecutive (Group 1)    .46    .66    .66    .44    .75    .66
  Distributed (Group 2)    .58    .72    .40    .55    .20    .51
Butyl alcohol
  Consecutive (Group 1)    .86    .83    .86    .67    .71    .76
  Distributed (Group 2)    .26    .68    .56    .51    .21    .65

p < .05. p < .01. p < .001.


with the maximum-likelihood technique in order to assess the confidence interval of the resulting threshold.

The Monte Carlo simulations allow the analysis of the ideal behavior of the three psychophysical methods. One major conclusion is that the ascending staircase method gives threshold measurements that are severely biased. This bias is a function of the number of trials carried out before the stopping criterion of 5 correct in a row with one stimulus is reached. When the number of trials was small (less than approximately 15), the threshold estimate was too low; when the number of trials was greater than 15, the threshold estimate was too high (see Figure 7). This characteristic is due to the fact that the stimulus concentration used can only increase as trials progress. The maximum-likelihood method has the least amount of bias, although it is always slightly negative (i.e., the threshold estimate is, on the average, slightly lower, by 0.02 log units, than the true value).

In clinical testing, especially for cases that involve the legal system, it is important to be able to detect malingerers. One of the strengths of the maximum-likelihood method is that it explicitly tests hypotheses. Normally, the hypothesis is that the subject is responding truthfully and that the threshold has a certain value. In the present

Figure 9. Test–retest reliability, as indicated by correlation coefficients between all pairs of measurements, plotted as a function of the number of days separating them. The solid line is the least squares linear regression through the data.


implementation of the maximum-likelihood method, the computer routines maintain three hypotheses about the response strategy of the subject: that the subject is responding truthfully, that the subject is lying on each trial (i.e., always choosing what he/she believes is the wrong alternative in the 2AFC procedure), and that the subject is responding randomly (i.e., not making use of any sensory information). The ability to detect a malingerer (one who has sensory sensitivity but tries to respond randomly) is based on the fact that it is difficult for humans to generate truly random sequences (Bar-Hillel & Wagenaar, 1993). A subject who can taste or smell the stimulus, and therefore knows in which interval it occurs, tends to respond incorrectly more often than would be expected by chance alone. In this case, the hypothesis that the subject is lying will have a higher likelihood than the hypothesis that the subject is telling the truth. We are currently working to refine this ability of the maximum-likelihood method, because it holds great promise. The other two psychophysical procedures do not easily distinguish between a malingerer and a person with no sensory sensitivity and have no explicit way to detect malingerers.

Finally, in Experiment 2, we demonstrated that the maximum-likelihood method gives estimates of thresholds that are stable over time. This stability is actually based on two aspects of our testing procedure. The maximum-likelihood method gives the smallest confidence intervals of the three methods tested and thus contributes the least amount of random or systematic error variance to the measured thresholds. The second source of reliability is the experimental paradigm. The 2AFC procedure minimizes the effects of response bias (e.g., Green & Swets, 1966/1974) and thus minimizes error variance in the threshold measurements that would be introduced by variance in response bias. Actually, it would be desirable to use more than just two forced-choice alternatives, because the ML-PEST method converges more rapidly on the threshold value with more alternatives (Manny & Klein, 1985; Schlauch & Rose, 1990). The time required to present chemosensory stimuli, however, usually precludes using more than two alternatives.

Test–retest reliability is not achieved at the expense of precision. Our measured thresholds show an approximately normal distribution, with a standard deviation of about 0.4 log units. The maximum-likelihood method is able to estimate thresholds that lie on a continuum, even though a small number of discrete stimulus concentrations are available for testing. Although a computer is required to run ML-PEST software, the widespread availability of inexpensive desktop or laptop computers makes this requirement much less of a barrier than in the past.

Figure 10. Distribution of differences between successive thresholds (left) and of the standard deviation of the four threshold measurements for the 27 subjects (right) for NaCl (upper panel) and butyl alcohol (lower panel).

REFERENCES

Anliker, J. A., Bartoshuk, L., Ferris, A. N., & Hooks, L. D. (1991). Children's food preferences and genetic sensitivity to the bitter tastes of 6-n-propylthiouracil (PROP). American Journal of Clinical Nutrition, 54, 316-320.

Apter, A. J., Gent, J. F., & Frank, M. E. (1999). Fluctuating olfactory sensitivity and distorted odor perception in allergic rhinitis. Archives of Otolaryngology, Head & Neck Surgery, 125, 1005-1010.

Bar-Hillel, M., & Wagenaar, W. A. (1993). The perception of randomness. In C. L. Gideon Keren (Ed.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 369-393). Hillsdale, NJ: Erlbaum.

Berkson, J. (1951). Why I prefer logits to probits. Biometrics, 7, 327-339.

Berkson, J. (1953). A statistically precise and relatively simple method of estimating the bio-assay with quantal response, based on the logistic function. Journal of the American Statistical Association, 48, 565-600.

Berkson, J. (1955). Maximum-likelihood and minimum chi-square estimates of the logistic function. Journal of the American Statistical Association, 50, 130-162.

Cain, W. S., Gent, J. F., Catalanotto, F. A., & Goodspeed, R. B. (1983). Clinical evaluation of olfaction. American Journal of Otolaryngology, 4, 252-256.

Cometto-Muñiz, J. E., & Cain, W. S. (1995). Relative sensitivity of the ocular trigeminal, nasal trigeminal, and olfactory systems to airborne chemicals. Chemical Senses, 20, 191-198.

Cornsweet, T. N. (1962). The staircase method in psychophysics. American Journal of Psychology, 75, 485-491.

Cowart, B. J. (1989). Relationships between taste and smell across the adult life span. In C. Murphy, W. S. Cain, & D. M. Hegsted (Eds.), Nutrition and the chemical senses in aging: Recent advances and current research needs (Annals of the New York Academy of Sciences, Vol. 561, pp. 39-55). New York: New York Academy of Sciences.

Cowart, B. J., Yokomukai, Y., & Beauchamp, G. K. (1994). Bitter taste in aging: Compound-specific decline in sensitivity. Physiology & Behavior, 56, 1237-1241.

Deems, D. A., Doty, R. L., Settle, R. G., Moore-Gillon, V., Shaman, P., Mester, A. F., Kimmelman, C. P., Brightman, V. J., & Snow, J. B., Jr. (1991). Smell and taste disorders: A study of 750 patients from the University of Pennsylvania Smell and Taste Center. Archives of Otolaryngology, Head & Neck Surgery, 117, 519-528.

Doty, R. L., McKeown, D. A., Lee, W. W., & Shaman, P. (1995). A study of the test–retest reliability of ten olfactory tests. Chemical Senses, 20, 645-656.

Doty, R. L., Snyder, P. J., Huggins, G. R., & Lowry, L. D. (1981). Endocrine, cardiovascular, and psychological correlates of olfactory sensitivity changes during the human menstrual cycle. Journal of Comparative & Physiological Psychology, 95, 45-60.

Drewnowski, A., Henderson, S. A., & Shore, A. B. (1997). Genetic sensitivity to 6-n-propylthiouracil (PROP) and hedonic responses to bitter and sweet tastes. Chemical Senses, 22, 27-37.

Duffy, V. B., Cain, W. S., & Ferris, A. M. (1999). Measurement of sensitivity to olfactory flavor: Application in a study of aging and dentures. Chemical Senses, 24, 671-677.

Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf und Härtel.

Gagnon, P., Mergler, D., & Lapare, S. (1994). Olfactory adaptation, threshold shift and recovery at low levels of exposure to methyl isobutyl ketone (MIBK). Neurotoxicology, 15, 637-642.

Garcia-Perez, M. A. (1998). Forced-choice staircases with fixed step sizes: Asymptotic and small-sample properties. Vision Research, 38, 1861-1881.

Green, D. M., & Swets, J. A. (1974). Signal detection theory and psychophysics. Huntington, NY: Robert E. Krieger. (Original work published 1966)

Grushka, M., Sessle, B. J., & Howley, T. P. (1986). Psychophysical evidence of taste dysfunction in burning mouth syndrome. Chemical Senses, 11, 485-498.

Guilford, J. P. (1936). Psychometric methods. New York: McGraw-Hill.

Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.

Hall, J. L. (1968). Maximum-likelihood sequential procedures for estimation of psychometric functions. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1981). Hybrid adaptive procedure for estimation of psychometric functions. Journal of the Acoustical Society of America, 69, 1763-1769.

Harvey, L. O., Jr. (1986). Efficient estimation of sensory thresholds. Behavior Research Methods, Instruments, & Computers, 18, 623-632.

Harvey, L. O., Jr. (1992). The critical operating characteristic and the evaluation of expert judgment. Organizational Behavior & Human Decision Processes, 53, 229-251.

Harvey, L. O., Jr. (1997). Efficient estimation of sensory thresholds with ML-PEST. Spatial Vision, 11, 121-128.

Hays, W. L. (1963). Statistics for psychologists (1st ed.). New York: Holt, Rinehart & Winston.

Krantz, D. H. (1969). Threshold theories of signal detection. Psychological Review, 76, 308-324.

Lehrner, J. P., Brücke, T., Dal-Bianco, P., Gatterer, G., & Kryspin-Exner, I. (1997). Olfactory functions in Parkinson's disease and Alzheimer's disease. Chemical Senses, 22, 105-110.

Lehrner, J. P., Kryspin-Exner, I., & Vetter, N. (1995). Higher olfactory threshold and decreased odor identification ability in HIV-infected persons. Chemical Senses, 20, 325-328.

Linn, R. L. (Ed.) (1989). Educational measurement (3rd ed.). New York: Macmillan.

Linschoten, M. R. I., & Kroeze, J. H. A. (1991). Spatial summation in taste: NaCl thresholds and stimulated area on the anterior human tongue. Chemical Senses, 16, 219-224.

Linschoten, M. R. I., & Kroeze, J. H. A. (1992). Bilateral taste stimulation: Spatial summation with weak NaCl stimuli. Chemical Senses, 17, 53-59.

Linschoten, M. R. I., & Kroeze, J. H. A. (1994). Ipsi- and bilateral interactions in taste. Perception & Psychophysics, 55, 387-393.

Macmil l a n N A amp Cr eel man C D (1991) Detection theory Auserrsquos guide Cambridge Cambridge University Press

Man ny R E amp Kl ein S A (1985) A three-alternative tracking par-adigm to measure vernier acuity of older infants Vision Research25 1245-1252

Mat t es R D (1988) Reliability of psychophysical measures of gus-tatory function Perception amp Psychophysics 43 107-114

Mol l B Kl imek L Egger s G amp Man n W (1998) Comparisonof olfactory function in patients with seasonal and perennial allergicrhinitis Allergy 53 297-301

Mur ph y C amp Cain W S (1986) Odor identification The blind arebetter Physiology amp Behavior 37 177-180

Mur ph y C Nor din S de Wijk R A Ca in W S amp Pol ich J(1994) Olfactory-evoked potentials Assessment of young and el-derly and comparison to psychophysical threshold Chemical Senses19 47-56

Nor din S Mur ph y C Davidson T M Quin onez C Ja l oway-ski A A amp El l ison D W (1996) Prevalence and assessment ofqualitative olfactory dysfunction in different age groups Laryngo-scope 106 739-744

OrsquoMah on y M Gar dn er L Lon g D Heint z C Thompson Bamp Davies M (1979) Salt taste detection An R-index approach tosignal-detection measurements Perception 8 497-506

Pent l a nd A (1980) Maximum likelihood estimation The best PESTPerception amp Psychophysics 28 377-379

Pier ce J D Jr Dot y R L amp Amoor e J E (1996) Analysis of po-sition of trial sequence and type of diluent on the detection thresholdfor phenyl ethyl alcohol using a single staircase method Perceptualamp Motor Skills 82 451-458

Rosen bl at t M R Ol mst ead R E Iwa mot o-Sch aa pp P N ampJa r vik M E (1998) Olfactory thresholds for nicotine and mentholin smokers (abstinent and nonabstinent) and nonsmokers Physiol-ogy amp Behavior 65 575-579

FAST AND ACCURATE THRESHOLDS 1347

SAS Inst it u t e (1998a) StatView StatView reference Cary NC SASInstitute

SAS In st it u t e (1998b) StatView Using StatView Cary NC SAS In-stitute

Schl auch R S amp Rose R M (1990) Two- three- and four-intervalforced-choice staircase procedures Estimator bias and efficiencyJournal of the Acoustical Society of America 88 732-740

Simpson A J amp Fit t er M J (1973) What is the best index of de-tectability Psychological Bulletin 80 481-488

St even s J C (1995) Detection of heteroquality taste mixtures Per-ception amp Psychophysics 57 18-26

Swet s J A (1961) Is there a sensory threshold Science 134 168-177Swet s J A (1986a) Form of empirical ROCrsquos in discrimination and

diagnostic tasks Implications for theory and measurement of per-formance Psychological Bulletin 99 181-198

Swet s J A (1986b) Indices of discrimination or diagnostic accuracyTheir ROCrsquos and implied models Psychological Bulletin 99 100-117

Swet s J A (1996) Signal detection theory and ROC analysis in psy-chology and diagnostics Collected papers Mahwah NJ Erlbaum

Swet s J A Ta nn er W P Jr amp Bir dsal l T G (1961) Decisionprocesses in perception Psychological Review 68 301-340

Tayl or M M amp Cr eel ma n C D (1967) PEST Efficient estimateson probability functions Journal of the Acoustical Society of Amer-ica 41 782-787

Tr eut wein B amp St r asbur ger H (1999) Fitting the psychometricfunction Perception amp Psychophysics 61 87-106

Ur ban F M (1908) The application of statistical methods to prob-lems of psychophysics Philadelphia Psychological Clinic Press

Wat son A B amp Pel l i D G (1983) QUEST A Bayesian adaptivepsychometric method Perception amp Psychophysic s 33 113-120

Weif f enbach J M Baum B J amp Bu r gh auser R (1982) Tastethresholds Quality specific variation with human aging Journal ofGerontology 37 372-377

Weif fenbach J M Sch wa r t z L K At kinson J C amp Fox P C(1995) Taste performance in Sjoumlgrenrsquos syndrome Physiology amp Be-havior 57 89-96

Wet h er il l G B amp Levit t H (1965) Sequential estimation ofpoints on a psychometric function British Journal of Mathematicalamp Statistical Psychology 18 1-10

(Manuscript received November 11, 1999; revision accepted for publication March 4, 2001.)


surements for butyl alcohol with the real subjects reported above in Experiment 1. A population of 10,000 simulated subjects was defined whose thresholds followed a Gaussian distribution with a mean threshold of -3.00 log percent concentration and a standard deviation of 0.4 log units. These values are representative of the distribution of real subjects' detection thresholds for butyl alcohol, as found in Experiment 2 below. The predetermined threshold for each of the 10,000 simulated observers was drawn from the above distribution. Threshold values higher than the highest stimulus concentration or lower than the lowest stimulus concentration were not used.

Simulation 1

On each trial, the simulated subject was presented with a stimulus. Each subject's detection behavior was driven by a logistic psychometric function with a steepness of 3.5 and a chance-level performance of 0.5. The outcome of each trial was determined by comparing the probability of a correct response predicted by the logistic function for that stimulus and that subject with a uniform random number between 0 and 1. If the random number was less than the logistic function probability for that stimulus, the response was scored as correct; otherwise, it was scored as wrong. The starting stimulus for the up–down staircase and maximum-likelihood methods was chosen randomly between the two stimuli that bracketed the mean threshold of -3.0 log concentration. These stimuli were -3.113 and -2.937 log concentration. The ascending method started with the lowest stimulus in its series, -5.130 log percent concentration.
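The trial-generation rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact parametrization of the logistic function (here, steepness multiplying the difference between stimulus and threshold in natural-log units), the redrawing of out-of-range thresholds, the upper concentration limit of 0.0, and all function names are hypothetical and are not taken from the ML-PEST routines.

```python
import math
import random

def p_correct(stimulus, threshold, steepness=3.5, chance=0.5):
    """Probability of a correct 2AFC response (assumed logistic form)."""
    core = 1.0 / (1.0 + math.exp(-steepness * (stimulus - threshold)))
    return chance + (1.0 - chance) * core

def draw_threshold(rng, lo, hi, mean=-3.0, sd=0.4):
    """Draw a threshold from the Gaussian population.

    Values outside [lo, hi] are redrawn; redrawing is an assumption,
    the text only says such values were not used.
    """
    while True:
        t = rng.gauss(mean, sd)
        if lo <= t <= hi:
            return t

def simulate_trial(rng, stimulus, threshold):
    """Score one trial: correct if a uniform random number is below p_correct."""
    return rng.random() < p_correct(stimulus, threshold)

rng = random.Random(1)
# hi=0.0 is a placeholder; the highest concentration is not given in this section.
threshold = draw_threshold(rng, lo=-5.130, hi=0.0)
start = rng.choice([-3.113, -2.937])  # the two stimuli bracketing the mean
print(simulate_trial(rng, start, threshold))
```

Under this assumed parametrization, a stimulus exactly at threshold yields a probability correct of 0.75, halfway between chance and perfect performance.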

An ascending limits run was considered a failure if the procedure called for a stimulus stronger than the highest stimulus concentration. A Wetherill–Levitt staircase run was considered a failure if, during the trials, a stimulus stronger than the highest stimulus concentration or weaker than the lowest stimulus concentration was called for. In the present simulation, the Wetherill–Levitt staircase failed 66 times. All of these failures were caused by the procedure's calling for a lower stimulus concentration than the lowest available. An ML-PEST run would have been considered a failure if the final estimate of the threshold were more than half the stopping confidence interval below the lowest candidate or above the highest candidate α. There were no such failures. The failed runs are excluded from the data analyzed below.

The distribution of errors for each method (estimated threshold minus true threshold) for the 10,000 subjects is shown on the left side of Figure 6. The distribution of the number of trials to reach the stopping criterion for the 10,000 subjects is shown on the right side of Figure 6. The maximum-likelihood method was more accurate than the Wetherill–Levitt staircase, and both of these methods were superior to the ascending method of limits. The median error and the number of trials required for each method are shown in Table 2. The median error for the maximum-likelihood method was almost half as large as that for the up–down staircase, although both errors were quite small.

The error on each of the 10,000 subjects is plotted in Figure 7 as a function of the number of trials for that run. The range of errors became smaller as the number of trials increased for both the Wetherill–Levitt staircase method and the ML-PEST method, and the median error stayed close to zero. The ascending staircase method showed quite different behavior. The direction of the error depended on the number of trials before the stopping criterion of five correct in a row had been reached. The estimated thresholds were too low when the number of trials was less than about 15, whereas the estimates of threshold were too high when the number of trials was more than 15.
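The dependence of the ascending method's error on run length can be illustrated with a small simulation. The sketch below assumes a simple version of the procedure (repeat the current concentration after a correct response, step up one concentration after an error, stop at five correct in a row); the stimulus series, its 0.25-log step, the logistic parametrization, and the function names are hypothetical. The split at 15 trials is borrowed from the text purely for illustration.

```python
import math
import random

def p_correct(stimulus, threshold, steepness=3.5, chance=0.5):
    """Probability of a correct 2AFC response (assumed logistic form)."""
    core = 1.0 / (1.0 + math.exp(-steepness * (stimulus - threshold)))
    return chance + (1.0 - chance) * core

def ascending_run(rng, stimuli, threshold, run_length=5):
    """Return (estimate, n_trials); estimate is None if the series is exhausted."""
    level, streak, n_trials = 0, 0, 0
    while level < len(stimuli):
        n_trials += 1
        if rng.random() < p_correct(stimuli[level], threshold):
            streak += 1
            if streak == run_length:
                return stimuli[level], n_trials
        else:
            level += 1   # an error moves the series one step up, never down
            streak = 0
    return None, n_trials

rng = random.Random(7)
stimuli = [-5.130 + 0.25 * i for i in range(21)]  # hypothetical series
true_threshold = -3.0
runs = [ascending_run(rng, stimuli, true_threshold) for _ in range(2000)]
finished = [(est - true_threshold, n) for est, n in runs if est is not None]
short_errors = [err for err, n in finished if n < 15]
long_errors = [err for err, n in finished if n >= 15]
print(sum(short_errors) / len(short_errors))  # mean error of short runs
print(sum(long_errors) / len(long_errors))    # mean error of long runs
```

Because the concentration can only increase, a run that stops early must have stopped on a lucky streak at a weak stimulus, whereas a long run has necessarily climbed further up the series.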

Simulation 2

Because the Wetherill–Levitt staircase method and the ML-PEST method generally required different numbers of trials before the stopping criterion was reached on each run, comparison of their accuracy is not straightforward. We repeated the Monte Carlo simulations so that the staircase method and the ML-PEST method used the same number of trials on each run. First, the Wetherill–Levitt staircase was run until the stopping criterion was reached. Then the ML-PEST method was run with the same number of trials on that observer, using the same starting stimulus.

The results are presented in Table 2. When the ML-PEST method had the same number of trials as the up–down staircase, the ML-PEST method was slightly more accurate. As in Simulation 1, both of these methods were superior in accuracy to the ascending staircase. There were 60 failures out of the 10,000 cases where the up–down staircase called for a stimulus that was lower than the lowest available concentration. Because the ML-PEST method was not permitted to run as many trials as it would normally require, this method failed on 30 out of the 10,000 cases.

Table 2
Results of the Four Monte Carlo Simulations

Method                  Median Error    Median Number of Trials

Simulation 1
  Maximum-likelihood    0.027           16
  Wetherill–Levitt      0.045           17
  Ascending limits      0.331           14

Simulation 2
  Maximum-likelihood    0.094           17
  Wetherill–Levitt      0.101           17
  Ascending limits      0.419           14

Simulation 3
  Maximum-likelihood    0.03            16
  Wetherill–Levitt      3.60            16
  Ascending limits      3.54            16

Simulation 4
  Maximum-likelihood    2.75            115
  Wetherill–Levitt      0.93            21
  Ascending limits      3.36            22


Figure 6. Distribution of errors (left) and number of trials (right) for three psychophysical methods: maximum-likelihood adaptive staircase (upper panel), Wetherill and Levitt up–down staircase (middle panel), and ascending method of limits (lower panel). The vertical line in the left panels marks the zero-error bin of each histogram. See text for details. The abscissa scale for the upper two left panels covers a range of 1.5 log units and is 5 log units for the lower left histogram.

Figure 7. Error as a function of number of trials for three psychophysical methods: ML-PEST (upper panel), up–down staircase (middle panel), and ascending method of limits (lower panel).

Simulation 3

An important ability of sensory testing is to detect people who deliberately try to appear to have diminished sensitivity. We therefore repeated the Monte Carlo simulations of 10,000 subjects using the same starting conditions and the same test conditions as those in Simulation 1, but each simulated subject generated "untruthful" responses (i.e., responses just the opposite of those called for by the subject's psychometric function). In the ML-PEST computer program, two likelihood functions are maintained: one based on the hypothesis that the subject is responding truthfully and the other based on the hypothesis that the subject is deliberately choosing the wrong response. The program explicitly tests the hypothesis that a subject is deliberately choosing the wrong response by comparing the maxima of the two likelihood distributions and therefore should be able to measure a sensory threshold as effectively as when a subject is telling the truth. The other two methods do not have an explicit way of dealing with this situation. It is of interest to learn how all three methods behave under these circumstances.
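The comparison of two likelihood functions can be sketched as follows. This is a minimal illustration, not the ML-PEST code: the logistic parametrization, the grid of candidate thresholds, the trial data, and all names are assumptions. Under the "lying" hypothesis, the probability of a scored-correct response is taken to be one minus the probability under the truthful hypothesis.

```python
import math

def p_correct(stimulus, threshold, steepness=3.5, chance=0.5):
    """Probability of a correct 2AFC response (assumed logistic form)."""
    core = 1.0 / (1.0 + math.exp(-steepness * (stimulus - threshold)))
    p = chance + (1.0 - chance) * core
    return min(max(p, 1e-9), 1.0 - 1e-9)  # keep the logarithms finite

def log_likelihoods(trials, candidates):
    """Log-likelihood of each candidate threshold under both hypotheses.

    trials is a list of (stimulus, was_correct) pairs.
    """
    truthful, lying = [], []
    for c in candidates:
        ll_t = ll_l = 0.0
        for stimulus, was_correct in trials:
            p = p_correct(stimulus, c)
            ll_t += math.log(p if was_correct else 1.0 - p)
            ll_l += math.log(1.0 - p if was_correct else p)
        truthful.append(ll_t)
        lying.append(ll_l)
    return truthful, lying

candidates = [-5.0 + 0.05 * i for i in range(101)]  # hypothetical grid
# Hypothetical data: always wrong on a strong stimulus, at chance on a weak one.
trials = [(-2.0, False)] * 10 + [(-4.5, True)] * 5 + [(-4.5, False)] * 5
truthful, lying = log_likelihoods(trials, candidates)
is_lying = max(lying) > max(truthful)
best = lying if is_lying else truthful
estimate = candidates[best.index(max(best))]
print(is_lying, estimate)
```

The threshold is then read from whichever likelihood function has the larger maximum, which is why an observer who responds consistently wrongly can still yield a threshold estimate.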

The results of this simulation are presented in Table 2. The ML-PEST method explicitly detected the lying condition in all 10,000 cases and was therefore able to measure the threshold of the underlying psychometric function with a small median error. The up–down staircase method and the ascending method of limits reached the highest concentration and tried to go higher, resulting in 9,824 failures for the staircase method and 8,648 failures for the method of limits. If we consider a failure as a successful detection of the lying behavior, the up–down staircase method detected 98% of the cases and the ascending method of limits detected 86% of the cases. These two methods required about the same number of trials (see Table 2).

Simulation 4

It is important in clinical testing that people with elevated thresholds be easily detected. One of the subjects in Experiment 1 first had smell tested by the up–down method and exhibited a normal threshold. Later, the same nostril was tested using the maximum-likelihood method, and no threshold could be measured because, as it turned out, he was anosmic on that side. The fact that the up–down method did not detect the anosmia is of concern.

We therefore repeated the Monte Carlo simulations of 10,000 subjects using the same starting conditions and the same test conditions as those in Simulation 1, with one change: Each simulated subject generated a random response, as would be the case if a person had no sensory sensitivity. The ideal behavior of a testing method would be to indicate that the threshold was at the highest value that is possible with that method or to have some other explicit way of indicating that the subject is responding randomly.

None of the three methods performed very well in detecting the random responding. The median errors and number of trials are given in Table 2. The up–down staircase method did the poorest because it gave a median error of 0.93. This rather low error occurs because on 87% of the cases the stopping criterion of 5 reversals was satisfied before the method reached the upper limit of the stimulus concentrations. The ML-PEST method failed on 11% of the cases, but overall the thresholds were estimated to be very high. Unfortunately, it required a great number of trials to achieve this performance. The ascending method of limits reached the upper limit of stimulus concentration on 66% of the cases and also gave the highest values for the threshold.

Since in Monte Carlo simulations the distribution of the actual thresholds is known, we can apply an SDT analysis (Green & Swets, 1966/1974; Macmillan & Creelman, 1991) to compare the distribution of estimated thresholds made by each method under random responding with the true distribution. Using the means and standard deviations of the above distributions, we computed the signal detection measure of sensitivity, d_a, and of accuracy, the area under the ROC curve, A_z (Harvey, 1992; Simpson & Fitter, 1973). The ML-PEST method had the highest sensitivity and accuracy (d_a = 2.27, A_z = .95). The ascending method was next best (d_a = 1.99, A_z = .92), and the up–down staircase method was the least sensitive (d_a = 1.50, A_z = .86).
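For readers who wish to check the relation between the two indices, the following sketch uses the standard unequal-variance definitions: d_a is the difference between the two means divided by the root-mean-square of the two standard deviations, and A_z is the standard normal cumulative probability of d_a divided by the square root of 2. Whether the authors' routines compute the indices in exactly this way is an assumption; the function names are hypothetical.

```python
import math

def d_a(mean_signal, sd_signal, mean_noise, sd_noise):
    """Sensitivity index scaled by the root-mean-square standard deviation."""
    rms_sd = math.sqrt((sd_signal ** 2 + sd_noise ** 2) / 2.0)
    return (mean_signal - mean_noise) / rms_sd

def a_z(da):
    """Area under the binormal ROC curve: Phi(d_a / sqrt(2))."""
    z = da / math.sqrt(2.0)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The three d_a values reported in the text, converted to areas.
for value in (2.27, 1.99, 1.50):
    print(value, round(a_z(value), 2))
```

Applied to the three reported d_a values, this conversion gives areas that round to the reported A_z values of .95, .92, and .86.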

Discussion

The results of the simulations lead to the conclusion that both the up–down staircase method and the maximum-likelihood method have considerable strengths for use in measuring taste and smell thresholds. The ascending method of limits should not be used if one is interested in obtaining an accurate estimate of the threshold. These threshold estimates covary with the number of trials taken to stop, as is illustrated in the lower panel of Figure 7. The simulation results offer an explanation for the published finding that ascending limit measurements of olfactory thresholds are less reliable than staircase-measured thresholds (Doty, McKeown, Lee, & Shaman, 1995).

The up–down staircase method gives an average error about twice as large as that of the ML-PEST method, although both values are small and therefore similar to each other. The up–down staircase method failed by calling for a stimulus too low on 66 out of 10,000 cases, whereas the ML-PEST method had no such failures. Although the median number of trials gives the edge to the ML-PEST method, on some occasions the ML-PEST method required over 50 trials to reach its stopping criterion.

A strength of the ML-PEST method is its ability to consider alternate hypotheses about the response strategy of the observer. The ML-PEST method was able to correctly detect all 10,000 dishonest responders. We are currently preparing a paper that will go into considerable detail on how to distinguish malingerers from true anosmics using a variety of methods. Another strength is that there is virtually no bias introduced into threshold measurements by having the ML-PEST steepness different from the "true" value set in the Monte Carlo observer (Treutwein & Strasburger, 1999). Data generated from a Monte Carlo Weibull function observer can be well fit by virtually any S-shaped psychometric function, although the Weibull will usually fit slightly better. We therefore recommend the ML-PEST method for testing taste and smell.

EXPERIMENT 2
Test–Retest Reliability

In order to assess the stability of the measures obtained with the adaptive maximum-likelihood procedure, we tested the subjects on four occasions.

Method

Subjects. Twenty-seven nonsmoking, healthy subjects participated in the experiment (mean age = 31.6 ± 8.7 years; range = 22–57 years). The 14 women and 13 men were divided into two groups. Group 1 ("consecutive"; 7 females and 7 males) was tested on 4 consecutive days, and Group 2 ("distributed"; 7 females and 6 males) was tested on Days 1, 2, 8, and 22.

Stimuli. Smell thresholds for butyl alcohol were assessed using the same range of concentrations as were used with the maximum-likelihood method in Experiment 1. Taste thresholds were measured using 20 concentrations of NaCl in quarter-log steps. The highest concentration was 1.7 M. The same maximum-likelihood adaptive method as described in Experiment 1 was used in this experiment.

Procedure. Thresholds were determined with a temporal 2AFC procedure. Taste sensitivity was measured with a whole-mouth, sip-and-spit procedure. The stimuli and the blanks were presented in 30-ml plastic medicine cups containing 5 ml of liquid. The subjects were requested to take the first sample in their mouths, swish it around, and spit it out, then rinse with distilled water and do the same with the second sample. They were asked to choose the sample that had a taste different from water. In each of the four sessions, taste thresholds were determined first and smell thresholds last. All four smell thresholds were determined in the preferred nostril. The subjects were tested at the same time of each day.

Results

Because the measurement and specification of test–retest reliability is a rather complex topic (Linn, 1989), we have chosen four different ways to demonstrate the reliability of the maximum-likelihood method. For the first method, the four threshold measurements for each subject are plotted in Figure 8. The thick line in each panel is the mean threshold. It is clear from examination of Figure 8 that most individual thresholds were stable over 4 days (left column of panels) and over 22 days (right column of panels), with a few subjects showing shifts of more than 1 log unit. The mean thresholds for the two groups are shown in Table 3. A repeated measures ANOVA confirmed the stability of the thresholds: The effect of testing day was not significant.

Figure 8. Log threshold concentration as a function of test day for NaCl (upper row) and butanol (lower row), measured on 4 successive days (left column) and over 22 days (right column). The thin lines connect the four thresholds of individual subjects. The thick line is the mean threshold concentration of all subjects.

A second approach is to compute the correlations among the four threshold measurements (see, e.g., Mattes, 1988). Pearson correlation coefficients between thresholds obtained on different days are shown in Table 4. The correlations were generally higher for pairs of thresholds that were measured close together in time. As Mattes pointed out, correlations are highly influenced by small changes in ordinal ranking of the subjects. The fact that the pairwise correlations became smaller as time separation between them increased is shown in Figure 9. The solid line is the least squares regression line. It indicates that the correlation between two thresholds diminished as the time between them increased. The slope of the regression line becomes even steeper if the data for 1-day separation are removed from the regression, indicating that the decrease in correlation started with the 2-day separation condition.

The third method to assess test–retest reliability is based on the differences between successive threshold measurements. Each threshold was subtracted from the previous threshold. The four threshold measurements resulted in three difference measurements. The distributions of these differences for NaCl and for butyl alcohol are plotted on the left side of Figure 10. If the measured thresholds do not systematically increase or decrease over time, we expect that the mean difference would be zero. The mean of the distribution for NaCl (upper panel of Figure 10) was 0.018 and for butyl alcohol (lower panel) was 0.002. Neither of these was significantly different from zero. The standard deviations of these distributions were 0.413 log units for NaCl and 0.422 log units for butyl alcohol.

A fourth way to examine test–retest reliability is by means of the spread of the four threshold measures for each subject. A standard deviation of 0.0 would mean that all four thresholds were exactly the same value. The standard deviation of the four measurements was computed for each subject. The distributions of these standard deviations for NaCl (upper panel) and butyl alcohol (lower panel) are plotted on the right side of Figure 10. The median standard deviation was about 0.15 log units for both NaCl and butyl alcohol. This good test–retest reliability is consistent with the other three methods of assessing reliability reported above.
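The third and fourth reliability indices are simple to compute, and a brief Python sketch may make the definitions concrete. The threshold values below are invented for illustration and are not data from Experiment 2; the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which form was used.

```python
import statistics

def successive_differences(thresholds):
    """Previous threshold minus the next one: four sessions give three values."""
    return [prev - nxt for prev, nxt in zip(thresholds, thresholds[1:])]

# Hypothetical log thresholds for three subjects over four sessions.
subjects = {
    "S1": [-3.1, -3.0, -3.2, -3.1],
    "S2": [-2.6, -2.9, -2.7, -2.8],
    "S3": [-3.4, -3.4, -3.4, -3.4],
}

# Third index: pooled successive differences; a mean near zero means no drift.
diffs = [d for t in subjects.values() for d in successive_differences(t)]
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)

# Fourth index: spread of the four measurements within each subject.
spreads = {name: statistics.stdev(t) for name, t in subjects.items()}
median_spread = statistics.median(spreads.values())

print(mean_diff, sd_diff, median_spread)
```

A subject whose four thresholds are identical, such as the hypothetical S3, contributes a standard deviation of exactly zero.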

Discussion

The maximum-likelihood method gives estimates of thresholds that are stable within individuals over time and that are responsive to individual differences. By all four indicators, the test–retest reliability of both NaCl and butyl alcohol thresholds measured by the maximum-likelihood method was very good.

GENERAL DISCUSSION

The two experiments and the Monte Carlo simulations reported here demonstrate that the maximum-likelihood method is a viable alternative to other methods currently in use to measure taste and smell detection thresholds. Because the method attempts to use stimuli that are optimal (close to the predicted threshold), the confidence intervals of the measured thresholds are considerably smaller than those achieved by the other two methods (Experiment 1).

Table 3
Mean NaCl and Butyl Alcohol Thresholds (and Standard Errors) From Experiment 2

                            Day 1          Day 2          Day 3          Day 4
                            M      SE      M      SE      M      SE      M      SE
NaCl
  Consecutive (Group 1)     -2.62  0.13    -2.67  0.11    -2.82  0.13    -2.63  0.11
  Distributed (Group 2)     -2.90  0.19    -2.83  0.09    -2.83  0.11    -2.77  0.10
Butyl Alcohol
  Consecutive (Group 1)     -3.19  0.14    -3.17  0.15    -3.25  0.12    -3.10  0.13
  Distributed (Group 2)     -3.00  0.16    -3.10  0.14    -3.09  0.10    -3.09  0.11

Table 4
Pearson Correlation Coefficients Between Thresholds

                            Paired Sessions
                            1–2    1–3    1–4    2–3    2–4    3–4
NaCl
  Consecutive (Group 1)     .46    .66    .66    .44    .75    .66
  Distributed (Group 2)     .58    .72    .40    .55    .20    .51
Butyl alcohol
  Consecutive (Group 1)     .86    .83    .86    .67    .71    .76
  Distributed (Group 2)     .26    .68    .56    .51    .21    .65

*p < .05. **p < .01. ***p < .001.

One of the strengths of the maximum-likelihood method is that it provides an explicit way of estimating the confidence interval of the threshold, a facility not provided by the other two methods. The experimenter is free to establish a stopping confidence interval to suit his/her needs. If greater precision is desired, it can be achieved at the expense of running more trials. This explicit tradeoff between precision and number of trials is not possible with the other two psychophysical methods. One could, of course, use a hybrid combination of methods: an up–down staircase method for generating the next stimulus to use, combined with a maximum-likelihood method for estimating the threshold and its confidence interval. In effect, we used this approach after the data from Experiment 1 were collected, by reanalyzing them with the maximum-likelihood technique in order to assess the confidence interval of the resulting threshold.
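The tradeoff between precision and number of trials can be illustrated with a likelihood-based interval. The sketch below is not the ML-PEST computation, which is not described in this section; it assumes a likelihood-ratio interval (all candidate thresholds whose log-likelihood lies within 1.92 of the maximum, i.e., half the 95% chi-square criterion of 3.84 with 1 degree of freedom), an assumed logistic parametrization, and hypothetical trial data and names.

```python
import math

def p_correct(stimulus, threshold, steepness=3.5, chance=0.5):
    """Probability of a correct 2AFC response (assumed logistic form)."""
    core = 1.0 / (1.0 + math.exp(-steepness * (stimulus - threshold)))
    p = chance + (1.0 - chance) * core
    return min(max(p, 1e-9), 1.0 - 1e-9)  # keep the logarithms finite

def ml_estimate_with_interval(trials, candidates, drop=1.92):
    """Return (estimate, lower, upper) from a grid of candidate thresholds."""
    ll = []
    for c in candidates:
        total = 0.0
        for stimulus, was_correct in trials:
            p = p_correct(stimulus, c)
            total += math.log(p if was_correct else 1.0 - p)
        ll.append(total)
    best = max(ll)
    inside = [c for c, v in zip(candidates, ll) if best - v <= drop]
    return candidates[ll.index(best)], min(inside), max(inside)

candidates = [-5.0 + 0.05 * i for i in range(101)]  # hypothetical grid
few = [(-2.5, True)] * 4 + [(-3.5, True)] * 2 + [(-3.5, False)] * 2
many = few * 4  # the same response pattern with four times as many trials

est_few, lo_few, hi_few = ml_estimate_with_interval(few, candidates)
est_many, lo_many, hi_many = ml_estimate_with_interval(many, candidates)
print(est_few, hi_few - lo_few)
print(est_many, hi_many - lo_many)
```

A stopping rule of the kind described in the text would continue testing until the interval width falls below a value chosen by the experimenter; the same function could equally be applied to trials generated by an up–down staircase.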

The Monte Carlo simulations allow the analysis of the ideal behavior of the three psychophysical methods. One major conclusion is that the ascending staircase method gives threshold measurements that are severely biased. This bias is a function of the number of trials carried out before the stopping criterion of 5 correct in a row with one stimulus is reached. When the number of trials was small (less than approximately 15), the threshold estimate was too low; when the number of trials was greater than 15, the threshold estimate was too high (see Figure 7). This characteristic is due to the fact that the stimulus concentration used can only increase as trials progress. The maximum-likelihood method has the least amount of bias, although it is always slightly negative (i.e., the threshold estimate is, on the average, slightly lower, by 0.02 log units, than the true value).

Figure 9. Test–retest reliability as indicated by correlation coefficients between all pairs of measurements, plotted as a function of the number of days separating them. The solid line is the least squares linear regression through the data.

In clinical testing, especially for cases that involve the legal system, it is important to be able to detect malingerers. One of the strengths of the maximum-likelihood method is that it explicitly tests hypotheses. Normally, the hypothesis is that the subject is responding truthfully and that the threshold has a certain value. In the present implementation of the maximum-likelihood method, the computer routines maintain three hypotheses about the response strategy of the subject: that the subject is responding truthfully, that the subject is lying on each trial (i.e., always choosing what he/she believes is the wrong alternative in the 2AFC procedure), and that the subject is responding randomly (i.e., not making use of any sensory information). The ability to detect a malingerer (one who has sensory sensitivity but tries to respond randomly) is based on the fact that it is difficult for humans to generate truly random sequences (Bar-Hillel & Wagenaar, 1993). A subject who can taste or smell the stimulus, and therefore knows in which interval it occurs, tends to respond wrongly more often than would be expected by chance alone. In this case, the hypothesis that the subject is lying will have a higher likelihood than the hypothesis that the subject is telling the truth. We are currently working to refine this ability of the maximum-likelihood method, because it holds great promise. The other two psychophysical procedures do not easily distinguish between a malingerer and a person with no sensory sensitivity and have no explicit way to detect malingerers.

Finally, in Experiment 2, we demonstrated that the maximum-likelihood method gives estimates of thresholds that are stable over time. This stability is actually based on two aspects of our testing procedure. The maximum-likelihood method gives the smallest confidence intervals of the three methods tested and thus contributes the least amount of random or systematic error variance to the measured thresholds. The second source of reliability is the experimental paradigm. The 2AFC procedure minimizes the effects of response bias (e.g., Green & Swets, 1966/1974) and thus minimizes error variance in the threshold measurements that would be introduced by variance in response bias. Actually, it would be desirable to use more than just two forced-choice alternatives, because the ML-PEST method converges more rapidly on the threshold value with more alternatives (Manny & Klein, 1985; Schlauch & Rose, 1990). The time required to present chemosensory stimuli, however, usually precludes using more than two alternatives.

Figure 10. Distribution of differences between successive thresholds (left) and of the standard deviation of the four threshold measurements for the 27 subjects (right), for NaCl (upper panel) and butyl alcohol (lower panel).

Test–retest reliability is not achieved at the expense of precision. Our measured thresholds show an approximately normal distribution with a standard deviation of about 0.4 log units. The maximum-likelihood method is able to estimate thresholds that lie on a continuum even though a small number of discrete stimulus concentrations are available for testing. Although a computer is required to run ML-PEST software, the widespread availability of inexpensive desktop or laptop computers makes this requirement much less of a barrier than in the past.

REFERENCES

Anliker, J. A., Bartoshuk, L., Ferris, A. N., & Hooks, L. D. (1991). Children's food preferences and genetic sensitivity to the bitter tastes of 6-n-propylthiouracil (PROP). American Journal of Clinical Nutrition, 54, 316-320.

Apter, A. J., Gent, J. F., & Frank, M. E. (1999). Fluctuating olfactory sensitivity and distorted odor perception in allergic rhinitis. Archives of Otolaryngology–Head & Neck Surgery, 125, 1005-1010.

Bar-Hillel, M., & Wagenaar, W. A. (1993). The perception of randomness. In C. L. Gideon Keren (Ed.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 369-393). Hillsdale, NJ: Erlbaum.

Berkson, J. (1951). Why I prefer logits to probits. Biometrics, 7, 327-339.

Berkson, J. (1953). A statistically precise and relatively simple method of estimating the bio-assay with quantal response, based on the logistic function. Journal of the American Statistical Association, 48, 565-600.

Berkson, J. (1955). Maximum-likelihood and minimum chi-square estimates of the logistic function. Journal of the American Statistical Association, 50, 130-162.

Cain, W. S., Gent, J. F., Catalanotto, F. A., & Goodspeed, R. B. (1983). Clinical evaluation of olfaction. American Journal of Otolaryngology, 4, 252-256.

Cometto-Muñiz, J. E., & Cain, W. S. (1995). Relative sensitivity of the ocular trigeminal, nasal trigeminal, and olfactory systems to airborne chemicals. Chemical Senses, 20, 191-198.

Cornsweet, T. N. (1962). The staircase method in psychophysics. American Journal of Psychology, 75, 485-491.

Cowart, B. J. (1989). Relationships between taste and smell across the adult life span. In C. Murphy, W. S. Cain, & D. M. Hegsted (Eds.), Nutrition and the chemical senses in aging: Recent advances and current research needs (Annals of the New York Academy of Sciences, Vol. 561, pp. 39-55). New York: New York Academy of Sciences.

Cowart, B. J., Yokomukai, Y., & Beauchamp, G. K. (1994). Bitter taste in aging: Compound-specific decline in sensitivity. Physiology & Behavior, 56, 1237-1241.

Deems, D. A., Doty, R. L., Settle, R. G., Moore-Gillon, V., Shaman, P., Mester, A. F., Kimmelman, C. P., Brightman, V. J., & Snow, J. B., Jr. (1991). Smell and taste disorders: A study of 750 patients from the University of Pennsylvania Smell and Taste Center. Archives of Otolaryngology–Head & Neck Surgery, 117, 519-528.

Doty, R. L., McKeown, D. A., Lee, W. W., & Shaman, P. (1995). A study of the test–retest reliability of ten olfactory tests. Chemical Senses, 20, 645-656.

Doty, R. L., Snyder, P. J., Huggins, G. R., & Lowry, L. D. (1981). Endocrine, cardiovascular, and psychological correlates of olfactory sensitivity changes during the human menstrual cycle. Journal of Comparative & Physiological Psychology, 95, 45-60.

Drewnowski, A., Henderson, S. A., & Shore, A. B. (1997). Genetic sensitivity to 6-n-propylthiouracil (PROP) and hedonic responses to bitter and sweet tastes. Chemical Senses, 22, 27-37.

Duffy, V. B., Cain, W. S., & Ferris, A. M. (1999). Measurement of sensitivity to olfactory flavor: Application in a study of aging and dentures. Chemical Senses, 24, 671-677.

Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf und Härtel.

Gagnon, P., Mergler, D., & Lapare, S. (1994). Olfactory adaptation, threshold shift and recovery at low levels of exposure to methyl isobutyl ketone (MIBK). Neurotoxicology, 15, 637-642.

Garcia-Perez, M. A. (1998). Forced-choice staircases with fixed step sizes: Asymptotic and small-sample properties. Vision Research, 38, 1861-1881.

Green, D. M., & Swets, J. A. (1974). Signal detection theory and psychophysics. Huntington, NY: Robert E. Krieger. (Original work published 1966)

Grushka, M., Sessle, B. J., & Howley, T. P. (1986). Psychophysical evidence of taste dysfunction in burning mouth syndrome. Chemical Senses, 11, 485-498.

Guilford, J. P. (1936). Psychometric methods. New York: McGraw-Hill.

Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.

Hall, J. L. (1968). Maximum-likelihood sequential procedures for estimation of psychometric functions. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1981). Hybrid adaptive procedure for estimation of psychometric functions. Journal of the Acoustical Society of America, 69, 1763-1769.

Harvey, L. O., Jr. (1986). Efficient estimation of sensory thresholds. Behavior Research Methods, Instruments, & Computers, 18, 623-632.

Harvey, L. O., Jr. (1992). The critical operating characteristic and the evaluation of expert judgment. Organizational Behavior & Human Decision Processes, 53, 229-251.

Harvey, L. O., Jr. (1997). Efficient estimation of sensory thresholds with ML-PEST. Spatial Vision, 11, 121-128.

Hays, W. L. (1963). Statistics for psychologists (1st ed.). New York: Holt, Rinehart & Winston.

Krantz, D. H. (1969). Threshold theories of signal detection. Psychological Review, 76, 308-324.

Lehrner, J. P., Brücke, T., Dal-Bianco, P., Gatterer, G., & Kryspin-Exner, I. (1997). Olfactory functions in Parkinson's disease and Alzheimer's disease. Chemical Senses, 22, 105-110.

Lehrner, J. P., Kryspin-Exner, I., & Vetter, N. (1995). Higher olfactory threshold and decreased odor identification ability in HIV-infected persons. Chemical Senses, 20, 325-328.

Linn, R. L. (Ed.) (1989). Educational measurement (3rd ed.). New York: Macmillan.

Linschoten, M. R. I., & Kroeze, J. H. A. (1991). Spatial summation in taste: NaCl thresholds and stimulated area on the anterior human tongue. Chemical Senses, 16, 219-224.

Linschot en M R I amp Kr oez e J H A (1992) Bilateral taste stim-ulation Spatial summation with weak NaCl stimuli ChemicalSenses 17 53-59

Linschot en M R I amp Kr oeze J H A (1994) Ipsi- and bilateral in-teractions in taste Perception amp Psychophysics 55 387-393

Macmil l a n N A amp Cr eel man C D (1991) Detection theory Auserrsquos guide Cambridge Cambridge University Press

Man ny R E amp Kl ein S A (1985) A three-alternative tracking par-adigm to measure vernier acuity of older infants Vision Research25 1245-1252

Mat t es R D (1988) Reliability of psychophysical measures of gus-tatory function Perception amp Psychophysics 43 107-114

Mol l B Kl imek L Egger s G amp Man n W (1998) Comparisonof olfactory function in patients with seasonal and perennial allergicrhinitis Allergy 53 297-301

Mur ph y C amp Cain W S (1986) Odor identification The blind arebetter Physiology amp Behavior 37 177-180

Mur ph y C Nor din S de Wijk R A Ca in W S amp Pol ich J(1994) Olfactory-evoked potentials Assessment of young and el-derly and comparison to psychophysical threshold Chemical Senses19 47-56

Nor din S Mur ph y C Davidson T M Quin onez C Ja l oway-ski A A amp El l ison D W (1996) Prevalence and assessment ofqualitative olfactory dysfunction in different age groups Laryngo-scope 106 739-744

OrsquoMah on y M Gar dn er L Lon g D Heint z C Thompson Bamp Davies M (1979) Salt taste detection An R-index approach tosignal-detection measurements Perception 8 497-506

Pent l a nd A (1980) Maximum likelihood estimation The best PESTPerception amp Psychophysics 28 377-379

Pier ce J D Jr Dot y R L amp Amoor e J E (1996) Analysis of po-sition of trial sequence and type of diluent on the detection thresholdfor phenyl ethyl alcohol using a single staircase method Perceptualamp Motor Skills 82 451-458

Rosen bl at t M R Ol mst ead R E Iwa mot o-Sch aa pp P N ampJa r vik M E (1998) Olfactory thresholds for nicotine and mentholin smokers (abstinent and nonabstinent) and nonsmokers Physiol-ogy amp Behavior 65 575-579

FAST AND ACCURATE THRESHOLDS 1347

SAS Inst it u t e (1998a) StatView StatView reference Cary NC SASInstitute

SAS In st it u t e (1998b) StatView Using StatView Cary NC SAS In-stitute

Schl auch R S amp Rose R M (1990) Two- three- and four-intervalforced-choice staircase procedures Estimator bias and efficiencyJournal of the Acoustical Society of America 88 732-740

Simpson A J amp Fit t er M J (1973) What is the best index of de-tectability Psychological Bulletin 80 481-488

St even s J C (1995) Detection of heteroquality taste mixtures Per-ception amp Psychophysics 57 18-26

Swet s J A (1961) Is there a sensory threshold Science 134 168-177Swet s J A (1986a) Form of empirical ROCrsquos in discrimination and

diagnostic tasks Implications for theory and measurement of per-formance Psychological Bulletin 99 181-198

Swet s J A (1986b) Indices of discrimination or diagnostic accuracyTheir ROCrsquos and implied models Psychological Bulletin 99 100-117

Swet s J A (1996) Signal detection theory and ROC analysis in psy-chology and diagnostics Collected papers Mahwah NJ Erlbaum

Swets, J. A., Tanner, W. P., Jr., & Birdsall, T. G. (1961). Decision processes in perception. Psychological Review, 68, 301-340.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Urban, F. M. (1908). The application of statistical methods to problems of psychophysics. Philadelphia: Psychological Clinic Press.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Weiffenbach, J. M., Baum, B. J., & Burghauser, R. (1982). Taste thresholds: Quality specific variation with human aging. Journal of Gerontology, 37, 372-377.

Weiffenbach, J. M., Schwartz, L. K., Atkinson, J. C., & Fox, P. C. (1995). Taste performance in Sjögren's syndrome. Physiology & Behavior, 57, 89-96.

Wetherill, G. B., & Levitt, H. (1965). Sequential estimation of points on a psychometric function. British Journal of Mathematical & Statistical Psychology, 18, 1-10.

(Manuscript received November 11, 1999; revision accepted for publication March 4, 2001.)

Figure 6. Distribution of errors (left) and number of trials (right) for three psychophysical methods: maximum-likelihood adaptive staircase (upper panel), Wetherill and Levitt up–down staircase (middle panel), and ascending method of limits (lower panel). The vertical line in the left panels marks the zero-error bin of each histogram. See text for details. The abscissa scale for the upper two left panels covers a range of 1.5 log units and is 5 log units for the lower left histogram.

Figure 7. Error as a function of number of trials for three psychophysical methods: ML-PEST (upper panel), up–down staircase (middle panel), and ascending method of limits (lower panel).

Simulation 3

An important ability of sensory testing is to detect people who deliberately try to appear to have diminished sensitivity. We therefore repeated the Monte Carlo simulations of 10,000 subjects using the same starting conditions and the same test conditions as those in Simulation 1, but each simulated subject generated "untruthful" responses (i.e., responses just the opposite of that called for by the subject's psychometric function). In the ML-PEST computer program, two likelihood functions are maintained, one based on the hypothesis that the subject is responding truthfully and the other based on the hypothesis that the subject is deliberately choosing the wrong response. The program explicitly tests the hypothesis that a subject is deliberately choosing the wrong response by comparing the maxima of the two likelihood distributions and therefore should be able to measure a sensory threshold as effectively as when a subject is telling the truth. The other two methods do not have an explicit way of dealing with this situation. It is of interest to learn how all three methods behave under these circumstances.
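The comparison of the two likelihood maxima can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ML-PEST program itself: the Weibull form of the 2AFC psychometric function, the steepness `beta`, the grid of candidate thresholds, and all function names are hypothetical choices made for the example.

```python
import math

def p_correct(log_c, log_alpha, beta=2.0):
    """Assumed 2AFC Weibull function: 0.5 at chance, rising to 1.0."""
    return 1.0 - 0.5 * math.exp(-10.0 ** (beta * (log_c - log_alpha)))

def log_likelihood(trials, log_alpha, lying=False):
    """Log likelihood of (log concentration, correct?) pairs for one
    candidate threshold. Under the lying hypothesis the probability of
    a correct response is inverted."""
    total = 0.0
    for log_c, correct in trials:
        p = p_correct(log_c, log_alpha)
        if lying:
            p = 1.0 - p
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # keep log() finite
        total += math.log(p if correct else 1.0 - p)
    return total

def best_hypothesis(trials, grid):
    """Return the hypothesis with the larger likelihood maximum and the
    threshold at which that maximum occurs."""
    maxima = {}
    for name, lying in (("truthful", False), ("lying", True)):
        maxima[name] = max((log_likelihood(trials, a, lying), a) for a in grid)
    winner = max(maxima, key=lambda name: maxima[name][0])
    return winner, maxima[winner][1]

# Hypothetical data: always wrong at a clearly detectable concentration,
# at chance for a concentration far below threshold.
grid = [x / 10 for x in range(-40, 1)]
trials = [(-1.0, False)] * 10 + [(-3.5, True)] * 5 + [(-3.5, False)] * 5
print(best_hypothesis(trials, grid)[0])
```

Keeping both maxima, rather than only the larger one, leaves room for a more formal likelihood-ratio comparison if one is wanted.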

The results of this simulation are presented in Table 2. The ML-PEST method successfully measured the threshold and explicitly detected the lying condition in all 10,000 cases and was therefore able to measure the threshold of the underlying psychometric function with a small median error. The up–down staircase method and the ascending method of limits reached the highest concentration and tried to go higher, resulting in 9,824 failures for the staircase method and 8,648 failures for the method of limits. If we consider a failure as a successful detection of the lying behavior, the up–down staircase method detected 98% of the cases, and the ascending method of limits detected 86% of the cases. These two methods required about the same number of trials (see Table 2).

Simulation 4

It is important in clinical testing that people with elevated thresholds be easily detected. One of the subjects in Experiment 1 first had smell tested by the up–down method and exhibited a normal threshold. Later, the same nostril was tested using the maximum-likelihood method, and no threshold could be measured because, as it turned out, he was anosmic on that side. The fact that the up–down method did not detect the anosmia is of concern.

We therefore repeated the Monte Carlo simulations of 10,000 subjects using the same starting conditions and the same test conditions as those in Simulation 1, with one change: Each simulated subject generated a random response, as would be the case if a person had no sensory sensitivity. The ideal behavior of a testing method would be to indicate that the threshold was at the highest value that is possible with that method or to have some other explicit way of indicating that the subject is responding randomly.

None of the three methods performed very well in detecting the random responding. The median errors and number of trials are given in Table 2. The up–down staircase method performed worst: it gave a median error of 93. This rather low error occurs because on 87% of the cases the stopping criterion of 5 reversals was satisfied before the method reached the upper limit of the stimulus concentrations. The ML-PEST method failed on 11% of the cases, but overall the thresholds were estimated to be very high. Unfortunately, it required a great number of trials to achieve this performance. The ascending method of limits reached the upper limit of stimulus concentration on 66% of the cases and also gave the highest values for the threshold.
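A rough calculation shows why pure guessing does not always drive the ascending method to its upper limit. The sketch below assumes that a guessing subject is correct with probability .5 on every 2AFC trial, that the stopping criterion is 5 correct in a row at one concentration, and that a single error moves the procedure to the next higher concentration. The function name and the example numbers of concentration steps are hypothetical.

```python
def p_reach_upper_limit(n_levels, run_length=5, p_guess=0.5):
    """Probability that a guessing subject fails the stopping criterion
    (run_length correct in a row) at each of n_levels concentrations."""
    p_pass_level = p_guess ** run_length  # 0.5 ** 5 = 1/32 per level
    return (1.0 - p_pass_level) ** n_levels

for n in (10, 20, 40):
    print(n, round(p_reach_upper_limit(n), 3))
```

Each concentration gives the guesser a 1-in-32 chance of stopping early with an apparently valid threshold, so the longer the series, the more such false thresholds accumulate.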

Since in Monte Carlo simulations the distribution of the actual thresholds is known, we can apply an SDT analysis (Green & Swets, 1966/1974; Macmillan & Creelman, 1991) to compare the distribution of estimated thresholds made by each method under random responding with the true distribution. Using the means and standard deviations of the above distributions, we computed the signal detection measures of sensitivity, d_a, and accuracy, the area under the ROC curve, A_z (Harvey, 1992; Simpson & Fitter, 1973). The ML-PEST method had the highest sensitivity and accuracy (d_a = 2.27, A_z = .95). The ascending method was next best (d_a = 1.99, A_z = .92), and the up–down staircase method was the least sensitive (d_a = 1.50, A_z = .86).
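The relation between the two indices can be checked numerically. Under the usual unequal-variance Gaussian model, d_a divides the difference between the two distribution means by the root mean square of their standard deviations, and A_z is the standard normal distribution function evaluated at d_a divided by the square root of 2. The sketch below is an illustration only: the function names are hypothetical, and the d_a values are those reported in the text, read with their decimal points as 2.27, 1.99, and 1.50.

```python
import math
from statistics import NormalDist

def d_a(mean_1, sd_1, mean_2, sd_2):
    """Separation of two Gaussian distributions in units of the root
    mean square of their standard deviations."""
    return (mean_1 - mean_2) / math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2.0)

def a_z(da):
    """Area under the Gaussian ROC curve implied by d_a."""
    return NormalDist().cdf(da / math.sqrt(2.0))

for method, da in (("ML-PEST", 2.27), ("ascending", 1.99), ("up-down", 1.50)):
    print(method, round(a_z(da), 2))
```

With equal standard deviations d_a reduces to the familiar d', which is why the same conversion to area applies.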

Discussion

The results of the simulations lead to the conclusion that both the up–down staircase method and the maximum-likelihood method have considerable strengths for use in measuring taste and smell thresholds. The ascending method of limits should not be used if one is interested in obtaining an accurate estimate of the threshold. These threshold estimates covary with the number of trials taken to stop, as is illustrated in the lower panel of Figure 7. The simulation results offer an explanation for the published finding that ascending-limits measurements of olfactory thresholds are less reliable than staircase-measured thresholds (Doty, McKeown, Lee, & Shaman, 1995).

The up–down staircase method gives an average error twice as large as that of the ML-PEST method, although both values are small and therefore similar to each other. The up–down staircase method failed by calling for a stimulus too low on 66 out of 10,000 cases, whereas the ML-PEST method had no such failures. Although the median number of trials gives the edge to the ML-PEST method, on some occasions the ML-PEST method required over 50 trials to reach its stopping criterion.

A strength of the ML-PEST method is its ability to consider alternate hypotheses about the response strategy of the observer. The ML-PEST method was able to correctly detect all 10,000 dishonest responders. We are currently preparing a paper that will go into considerable detail on how to distinguish malingerers from true anosmics using a variety of methods. Another strength is that there is virtually no bias introduced into threshold measurements by having the ML-PEST steepness different from the "true" value set in the Monte Carlo observer (Treutwein & Strasburger, 1999). Data generated from a Monte Carlo Weibull function observer can be well fit by virtually any S-shaped psychometric function, although the Weibull will usually fit slightly better. We therefore recommend the ML-PEST method for testing taste and smell.

EXPERIMENT 2
Test–Retest Reliability

In order to assess the stability of the measures obtained with the adaptive maximum-likelihood procedure, we tested the subjects on four occasions.

Method

Subjects. Twenty-seven nonsmoking, healthy subjects participated in the experiment (mean age = 31.6 ± 8.7 years, range = 22–57 years). The 14 women and 13 men were divided into two groups. Group 1 ("consecutive"; 7 females and 7 males) was tested on 4 consecutive days, and Group 2 ("distributed"; 7 females and 6 males) was tested on Days 1, 2, 8, and 22.

Stimuli. Smell thresholds for butyl alcohol were assessed using the same range of concentrations as were used with the maximum-likelihood method in Experiment 1. Taste thresholds were measured using 20 concentrations of NaCl in quarter-log steps. The highest concentration was 1.7 M. The same maximum-likelihood adaptive method as described in Experiment 1 was used in this experiment.
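The NaCl dilution series follows directly from the three numbers given above. The short sketch below assumes that the highest concentration reads 1.7 M and that each step divides the concentration by 10 to the power 0.25; the function name is hypothetical.

```python
def concentration_series(highest_molar=1.7, n_steps=20, log_step=0.25):
    """Molar concentrations from highest to lowest in equal log steps."""
    return [highest_molar * 10 ** (-log_step * i) for i in range(n_steps)]

series = concentration_series()
print(len(series), series[0], series[-1])
```

Twenty quarter-log steps span 4.75 log units, so under these assumptions the weakest solution is roughly 0.03 mM.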

Procedure. Thresholds were determined with a temporal 2AFC procedure. Taste sensitivity was measured with a whole-mouth, sip-and-spit procedure. The stimuli and the blanks were presented in 30-ml plastic medicine cups containing 5 ml of liquid. The subjects were requested to take the first sample in their mouths, swish it around, and spit it out, then rinse with distilled water and do the same with the second sample. They were asked to choose the sample that had a taste different from water. In each of the four sessions, taste thresholds were determined first and smell thresholds last. All four smell thresholds were determined in the preferred nostril. The subjects were tested at the same time of each day.

Results

Because the measurement and specification of test–retest reliability is a rather complex topic (Linn, 1989), we have chosen four different ways to demonstrate the reliability of the maximum-likelihood method. For the first method, the four threshold measurements for each subject are plotted in Figure 8. The thick line in each panel is the mean threshold. It is clear from examination of Figure 8 that most individual thresholds were stable over 4 days (left column of panels) and over 22 days (right column of panels), with a few subjects showing shifts of more than 1 log unit. The mean thresholds for the two groups are shown in Table 3. A repeated measures ANOVA confirmed the stability of the thresholds: The effect of testing day was not significant.

Figure 8. Log threshold concentration as a function of test day for NaCl (upper row) and butanol (lower row), measured on 4 successive days (left column) and over 22 days (right column). The thin lines connect the four thresholds of individual subjects. The thick line is the mean threshold concentration of all subjects.

A second approach is to compute the correlations among the four threshold measurements (see, e.g., Mattes, 1988). Pearson correlation coefficients between thresholds obtained on different days are shown in Table 4. The correlations were generally higher for pairs of thresholds that were measured close together in time. As Mattes pointed out, correlations are highly influenced by small changes in ordinal ranking of the subjects. The fact that the pairwise correlations became smaller as time separation between them increased is shown in Figure 9. The solid line is the least squares regression line. It indicates that the correlation between two thresholds diminished as the time between them increased. The slope of the regression line becomes even steeper if the data for 1-day separation are removed from the regression, indicating that the decrease in correlation started with the 2-day separation condition.
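The trend can be reproduced approximately from the coefficients in Table 4. The sketch below rests on several assumptions: the table entries are read as correlations with the leading decimal point restored (.46, .66, and so on), the day separations are derived from test days 1, 2, 3, 4 for Group 1 and 1, 2, 8, 22 for Group 2, and both stimuli and both groups are pooled into one regression, which may differ from how Figure 9 was constructed. All names are hypothetical.

```python
def least_squares(xs, ys):
    """Slope and intercept of the ordinary least squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Session pairs in table order: 1-2, 1-3, 1-4, 2-3, 2-4, 3-4.
consecutive_sep = [1, 2, 3, 1, 2, 1]      # days apart, Group 1
distributed_sep = [1, 7, 21, 6, 20, 14]   # days apart, Group 2
table4 = [
    (consecutive_sep, [.46, .66, .66, .44, .75, .66]),  # NaCl
    (distributed_sep, [.58, .72, .40, .55, .20, .51]),  # NaCl
    (consecutive_sep, [.86, .83, .86, .67, .71, .76]),  # butyl alcohol
    (distributed_sep, [.26, .68, .56, .51, .21, .65]),  # butyl alcohol
]
days = [d for seps, _ in table4 for d in seps]
rs = [r for _, coeffs in table4 for r in coeffs]

slope_all, _ = least_squares(days, rs)
kept = [(d, r) for d, r in zip(days, rs) if d > 1]
slope_no_1day, _ = least_squares([d for d, _ in kept], [r for _, r in kept])
print(slope_all, slope_no_1day)
```

Under these assumptions both slopes are negative, and dropping the 1-day pairs makes the slope steeper, in line with the description above.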

The third method to assess test–retest reliability is based on the differences between successive threshold measurements. Each threshold was subtracted from the previous threshold. The four threshold measurements resulted in three difference measurements. The distributions of these differences for NaCl and for butyl alcohol are plotted on the left side of Figure 10. If the measured thresholds do not systematically increase or decrease over time, we expect that the mean difference would be zero. The mean of the distribution for NaCl (upper panel of Figure 10) was 0.018, and for butyl alcohol (lower panel) it was 0.002. Neither of these was significantly different from zero. The standard deviations of these distributions were 0.413 log units for NaCl and 0.422 log units for butyl alcohol.

A fourth way to examine test–retest reliability is by means of the spread of the four threshold measures for each subject. A standard deviation of 0.0 would mean that all four thresholds were exactly the same value. The standard deviation of the four measurements was computed for each subject. The distributions of these standard deviations for NaCl (upper panel) and butyl alcohol (lower panel) are plotted on the right side of Figure 10. The median standard deviation was about 0.15 log units for both NaCl and butyl alcohol. This good test–retest reliability is consistent with the other three methods of assessing reliability reported above.
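Both of these summaries are simple to compute. The sketch below uses made-up log thresholds for three hypothetical subjects, takes each difference as the previous threshold minus the current one, and assumes the sample (n - 1) standard deviation, which the text does not specify.

```python
from statistics import mean, median, stdev

def successive_differences(thresholds):
    """Previous threshold minus current threshold, session by session."""
    return [prev - cur for prev, cur in zip(thresholds, thresholds[1:])]

def per_subject_sd(thresholds):
    """Spread of one subject's repeated thresholds (sample SD)."""
    return stdev(thresholds)

# Hypothetical log thresholds from four sessions per subject.
subjects = {
    "S1": [2.6, 2.7, 2.8, 2.6],
    "S2": [3.1, 2.9, 3.0, 3.2],
    "S3": [2.4, 2.4, 2.5, 2.3],
}
all_diffs = [d for t in subjects.values() for d in successive_differences(t)]
print("mean difference:", mean(all_diffs))
print("SD of differences:", stdev(all_diffs))
print("median per-subject SD:", median(per_subject_sd(t) for t in subjects.values()))
```

Four sessions give three differences per subject, so 27 subjects contribute 81 differences to each histogram of this kind.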

Discussion

The maximum-likelihood method gives estimates of thresholds that are stable within individuals over time and that are responsive to individual differences. By all four indicators, the test–retest reliability of both NaCl and butyl alcohol thresholds measured by the maximum-likelihood method was very good.

GENERAL DISCUSSION

The two experiments and the Monte Carlo simulations reported here demonstrate that the maximum-likelihood method is a viable alternative to other methods currently in use to measure taste and smell detection thresholds. Because the method attempts to use stimuli that are optimal (close to the predicted threshold), the confidence intervals of the measured thresholds are considerably smaller than those achieved by the other two methods (Experiment 1).

Table 3
Mean NaCl and Butyl Alcohol Thresholds (and Standard Errors) From Experiment 2

                            Day 1         Day 2         Day 3         Day 4
                           M     SE      M     SE      M     SE      M     SE
NaCl
  Consecutive (Group 1)   2.62  0.13    2.67  0.11    2.82  0.13    2.63  0.11
  Distributed (Group 2)   2.90  0.19    2.83  0.09    2.83  0.11    2.77  0.10
Butyl alcohol
  Consecutive (Group 1)   3.19  0.14    3.17  0.15    3.25  0.12    3.10  0.13
  Distributed (Group 2)   3.00  0.16    3.10  0.14    3.09  0.10    3.09  0.11

Table 4
Pearson Correlation Coefficients Between Thresholds

                                      Paired Sessions
                           1–2    1–3    1–4    2–3    2–4    3–4
NaCl
  Consecutive (Group 1)    .46    .66    .66    .44    .75    .66
  Distributed (Group 2)    .58    .72    .40    .55    .20    .51
Butyl alcohol
  Consecutive (Group 1)    .86    .83    .86    .67    .71    .76
  Distributed (Group 2)    .26    .68    .56    .51    .21    .65

p < .05   p < .01   p < .001

One of the strengths of the maximum-likelihood method is that it provides an explicit way of estimating the confidence interval of the threshold, a facility not provided by the other two methods. The experimenter is free to establish a stopping confidence interval to suit his/her needs. If greater precision is desired, it can be achieved at the expense of running more trials. This explicit tradeoff between precision and number of trials is not possible with the other two psychophysical methods. One could, of course, use a hybrid combination of methods: an up–down staircase method for generating the next stimulus to use, combined with a maximum-likelihood method for estimating the threshold and its confidence interval. In effect, we used this approach after the data from Experiment 1 were collected, by reanalyzing them with the maximum-likelihood technique in order to assess the confidence interval of the resulting threshold.
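The tradeoff between precision and number of trials can be illustrated with a small likelihood calculation. The sketch below is not the ML-PEST code. It assumes a Weibull 2AFC psychometric function with a fixed steepness and takes the interval as all candidate thresholds whose log likelihood lies within 1.92 of the maximum (a conventional likelihood-ratio 95% interval), which may differ from the interval the program actually computes. All names and data are hypothetical.

```python
import math

def p_correct(log_c, log_alpha, beta=2.0):
    """Assumed 2AFC Weibull function: 0.5 at chance, rising to 1.0."""
    return 1.0 - 0.5 * math.exp(-10.0 ** (beta * (log_c - log_alpha)))

def log_likelihood(trials, log_alpha):
    total = 0.0
    for log_c, correct in trials:
        p = min(max(p_correct(log_c, log_alpha), 1e-9), 1.0 - 1e-9)
        total += math.log(p if correct else 1.0 - p)
    return total

def confidence_interval(trials, grid, drop=1.92):
    """Range of candidate thresholds within `drop` of the maximum."""
    lls = [log_likelihood(trials, a) for a in grid]
    best = max(lls)
    inside = [a for a, ll in zip(grid, lls) if ll >= best - drop]
    return min(inside), max(inside)

def should_stop(trials, grid, max_width):
    """Stop testing once the interval is no wider than max_width."""
    low, high = confidence_interval(trials, grid)
    return (high - low) <= max_width

grid = [x / 10 for x in range(-40, 1)]
few = [(-2.0, True)]
many = ([(-3.0, True)] * 5 + [(-3.0, False)] * 5 +
        [(-2.5, True)] * 5 + [(-2.5, False)] * 5 +
        [(-2.0, True)] * 8 + [(-2.0, False)] * 2 +
        [(-1.5, True)] * 10)
print(confidence_interval(few, grid), confidence_interval(many, grid))
```

The same two functions also serve the hybrid approach mentioned above, since the trials can come from any stimulus-selection rule.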

The Monte Carlo simulations allow the analysis of the ideal behavior of the three psychophysical methods. One major conclusion is that the ascending method of limits gives threshold measurements that are severely biased. This bias is a function of the number of trials carried out before the stopping criterion (5 correct in a row with one stimulus) is met. When the number of trials was small (less than approximately 15), the threshold estimate was too low; when the number of trials was greater than 15, the threshold estimate was too high (see Figure 7). This characteristic is due to the fact that the stimulus concentration used can only increase as trials progress. The maximum-likelihood method has the least amount of bias, although it is always slightly negative (i.e., the threshold estimate is, on the average, slightly lower, by 0.02 log units, than the true value).
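The dependence of the ascending estimate on the number of trials is easy to see in a small simulation. The sketch below is a simplified stand-in for the simulations reported here, not a reproduction of them: the Weibull observer, its steepness, the concentration range, and the rule that a single error moves to the next higher concentration are all assumptions, and the names are hypothetical.

```python
import math
import random

def p_correct(log_c, log_alpha, beta=2.0):
    """Assumed 2AFC Weibull function: 0.5 at chance, rising to 1.0."""
    return 1.0 - 0.5 * math.exp(-10.0 ** (beta * (log_c - log_alpha)))

def ascending_run(log_alpha, levels, rng, run_length=5):
    """Step up through the concentrations; stop at the first one that
    yields run_length correct responses in a row."""
    trials = 0
    for log_c in levels:
        streak = 0
        while streak < run_length:
            trials += 1
            if rng.random() < p_correct(log_c, log_alpha):
                streak += 1
            else:
                break  # an error moves on to the next concentration
        if streak == run_length:
            return log_c, trials
    return None, trials  # ran past the highest concentration

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
levels = [-4.0 + 0.25 * i for i in range(17)]
runs = [ascending_run(-2.0, levels, rng) for _ in range(2000)]
runs = [(est, n) for est, n in runs if est is not None]
r = pearson([n for _, n in runs], [est for est, _ in runs])
print("correlation between trials and estimate:", round(r, 2))
```

Because every error pushes the concentration upward, a run that takes more trials necessarily ends at a higher estimate, while a lucky early streak ends below the true threshold.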

Figure 9. Test–retest reliability as indicated by correlation coefficients between all pairs of measurements, plotted as a function of the number of days separating them. The solid line is the least squares linear regression through the data.

In clinical testing, especially for cases that involve the legal system, it is important to be able to detect malingerers. One of the strengths of the maximum-likelihood method is that it explicitly tests hypotheses. Normally, the hypothesis is that the subject is responding truthfully and that the threshold has a certain value. In the present implementation of the maximum-likelihood method, the computer routines maintain three hypotheses about the response strategy of the subject: that the subject is responding truthfully, that the subject is lying on each trial (i.e., always choosing what he/she believes is the wrong alternative in the 2AFC procedure), and that the subject is responding randomly (i.e., not making use of any sensory information). The ability to detect a malingerer, one who has sensory sensitivity but tries to respond randomly, is based on the fact that it is difficult for humans to generate truly random sequences (Bar-Hillel & Wagenaar, 1993). A subject who can taste or smell the stimulus, and therefore knows in which interval it occurs, tends to choose the wrong alternative more often than would occur by chance alone. In this case, the hypothesis that the subject is lying will have a higher likelihood than the hypothesis that the subject is telling the truth. We are currently working to refine this ability of the maximum-likelihood method, because it holds great promise. The other two psychophysical procedures do not easily distinguish between a malingerer and a person with no sensory sensitivity and have no explicit way to detect malingerers.

Finally, in Experiment 2 we demonstrated that the maximum-likelihood method gives estimates of thresholds that are stable over time. This stability is actually based on two aspects of our testing procedure. The maximum-likelihood method gives the smallest confidence intervals of the three methods tested and thus contributes the least amount of random or systematic error variance to the measured thresholds. The second source of reliability is the experimental paradigm. The 2AFC procedure minimizes the effects of response bias (e.g., Green & Swets, 1966/1974) and thus minimizes error variance in the threshold measurements that would be introduced by variance in response bias. Actually, it would be desirable to use more than just two forced-choice alternatives, because the ML-PEST method converges more rapidly on the threshold value with more alternatives (Manny & Klein, 1985; Schlauch & Rose, 1990). The time required to present chemosensory stimuli, however, usually precludes using more than two alternatives.

Figure 10. Distribution of differences between successive thresholds (left) and of the standard deviation of the four threshold measurements for the 27 subjects (right) for NaCl (upper panel) and butyl alcohol (lower panel).

Test–retest reliability is not achieved at the expense of precision. Our measured thresholds show an approximately normal distribution with a standard deviation of about 0.4 log units. The maximum-likelihood method is able to estimate thresholds that lie on a continuum, even though a small number of discrete stimulus concentrations are available for testing. Although a computer is required to run ML-PEST software, the widespread availability of inexpensive desktop or laptop computers makes this requirement much less of a barrier than in the past.

REFERENCES

Anliker, J. A., Bartoshuk, L., Ferris, A. N., & Hooks, L. D. (1991). Children's food preferences and genetic sensitivity to the bitter tastes of 6-n-propylthiouracil (PROP). American Journal of Clinical Nutrition, 54, 316-320.

Apter, A. J., Gent, J. F., & Frank, M. E. (1999). Fluctuating olfactory sensitivity and distorted odor perception in allergic rhinitis. Archives of Otolaryngology–Head & Neck Surgery, 125, 1005-1010.

Bar-Hillel, M., & Wagenaar, W. A. (1993). The perception of randomness. In C. L. Gideon Keren (Ed.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 369-393). Hillsdale, NJ: Erlbaum.

Berkson, J. (1951). Why I prefer logits to probits. Biometrics, 7, 327-339.

Berkson, J. (1953). A statistically precise and relatively simple method of estimating the bio-assay with quantal response, based on the logistic function. Journal of the American Statistical Association, 48, 565-600.

Berkson, J. (1955). Maximum-likelihood and minimum chi-square estimates of the logistic function. Journal of the American Statistical Association, 50, 130-162.

Cain, W. S., Gent, J. F., Catalanotto, F. A., & Goodspeed, R. B. (1983). Clinical evaluation of olfaction. American Journal of Otolaryngology, 4, 252-256.

Cometto-Muñiz, J. E., & Cain, W. S. (1995). Relative sensitivity of the ocular trigeminal, nasal trigeminal, and olfactory systems to airborne chemicals. Chemical Senses, 20, 191-198.

Cornsweet, T. N. (1962). The staircase method in psychophysics. American Journal of Psychology, 75, 485-491.

Cowart, B. J. (1989). Relationships between taste and smell across the adult life span. In C. Murphy, W. S. Cain, & D. M. Hegsted (Eds.), Nutrition and the chemical senses in aging: Recent advances and current research needs (Annals of the New York Academy of Sciences, Vol. 561, pp. 39-55). New York: New York Academy of Sciences.

Cowart, B. J., Yokomukai, Y., & Beauchamp, G. K. (1994). Bitter taste in aging: Compound-specific decline in sensitivity. Physiology & Behavior, 56, 1237-1241.

Deems, D. A., Doty, R. L., Settle, R. G., Moore-Gillon, V., Shaman, P., Mester, A. F., Kimmelman, C. P., Brightman, V. J., & Snow, J. B., Jr. (1991). Smell and taste disorders: A study of 750 patients from the University of Pennsylvania Smell and Taste Center. Archives of Otolaryngology–Head & Neck Surgery, 117, 519-528.

Doty, R. L., McKeown, D. A., Lee, W. W., & Shaman, P. (1995). A study of the test–retest reliability of ten olfactory tests. Chemical Senses, 20, 645-656.

Doty, R. L., Snyder, P. J., Huggins, G. R., & Lowry, L. D. (1981). Endocrine, cardiovascular, and psychological correlates of olfactory sensitivity changes during the human menstrual cycle. Journal of Comparative & Physiological Psychology, 95, 45-60.

Drewnowski, A., Henderson, S. A., & Shore, A. B. (1997). Genetic sensitivity to 6-n-propylthiouracil (PROP) and hedonic responses to bitter and sweet tastes. Chemical Senses, 22, 27-37.

Duffy, V. B., Cain, W. S., & Ferris, A. M. (1999). Measurement of sensitivity to olfactory flavor: Application in a study of aging and dentures. Chemical Senses, 24, 671-677.

Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf und Härtel.

Gagnon, P., Mergler, D., & Lapare, S. (1994). Olfactory adaptation, threshold shift and recovery at low levels of exposure to methyl isobutyl ketone (MIBK). Neurotoxicology, 15, 637-642.

Garcia-Perez, M. A. (1998). Forced-choice staircases with fixed step sizes: Asymptotic and small-sample properties. Vision Research, 38, 1861-1881.

Green, D. M., & Swets, J. A. (1974). Signal detection theory and psychophysics. Huntington, NY: Robert E. Krieger. (Original work published 1966)

Grushka, M., Sessle, B. J., & Howley, T. P. (1986). Psychophysical evidence of taste dysfunction in burning mouth syndrome. Chemical Senses, 11, 485-498.

Guilford, J. P. (1936). Psychometric methods. New York: McGraw-Hill.

Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.

Hall, J. L. (1968). Maximum-likelihood sequential procedures for estimation of psychometric functions. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1981). Hybrid adaptive procedure for estimation of psychometric functions. Journal of the Acoustical Society of America, 69, 1763-1769.

Harvey, L. O., Jr. (1986). Efficient estimation of sensory thresholds. Behavior Research Methods, Instruments, & Computers, 18, 623-632.

Harvey, L. O., Jr. (1992). The critical operating characteristic and the evaluation of expert judgment. Organizational Behavior & Human Decision Processes, 53, 229-251.

Harvey, L. O., Jr. (1997). Efficient estimation of sensory thresholds with ML-PEST. Spatial Vision, 11, 121-128.

Hays, W. L. (1963). Statistics for psychologists (1st ed.). New York: Holt, Rinehart & Winston.

Krantz, D. H. (1969). Threshold theories of signal detection. Psychological Review, 76, 308-324.

Lehrner, J. P., Brücke, T., Dal-Bianco, P., Gatterer, G., & Kryspin-Exner, I. (1997). Olfactory functions in Parkinson's disease and Alzheimer's disease. Chemical Senses, 22, 105-110.

Lehrner, J. P., Kryspin-Exner, I., & Vetter, N. (1995). Higher olfactory threshold and decreased odor identification ability in HIV-infected persons. Chemical Senses, 20, 325-328.

Linn, R. L. (Ed.) (1989). Educational measurement (3rd ed.). New York: Macmillan.

Linschoten, M. R. I., & Kroeze, J. H. A. (1991). Spatial summation in taste: NaCl thresholds and stimulated area on the anterior human tongue. Chemical Senses, 16, 219-224.

Linschoten, M. R. I., & Kroeze, J. H. A. (1992). Bilateral taste stimulation: Spatial summation with weak NaCl stimuli. Chemical Senses, 17, 53-59.

Linschoten, M. R. I., & Kroeze, J. H. A. (1994). Ipsi- and bilateral interactions in taste. Perception & Psychophysics, 55, 387-393.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure vernier acuity of older infants. Vision Research, 25, 1245-1252.

Mattes, R. D. (1988). Reliability of psychophysical measures of gustatory function. Perception & Psychophysics, 43, 107-114.

Moll, B., Klimek, L., Eggers, G., & Mann, W. (1998). Comparison of olfactory function in patients with seasonal and perennial allergic rhinitis. Allergy, 53, 297-301.

Murphy, C., & Cain, W. S. (1986). Odor identification: The blind are better. Physiology & Behavior, 37, 177-180.

Murphy, C., Nordin, S., de Wijk, R. A., Cain, W. S., & Polich, J. (1994). Olfactory-evoked potentials: Assessment of young and elderly, and comparison to psychophysical threshold. Chemical Senses, 19, 47-56.

Nordin, S., Murphy, C., Davidson, T. M., Quinonez, C., Jalowayski, A. A., & Ellison, D. W. (1996). Prevalence and assessment of qualitative olfactory dysfunction in different age groups. Laryngoscope, 106, 739-744.

O'Mahony, M., Gardner, L., Long, D., Heintz, C., Thompson, B., & Davies, M. (1979). Salt taste detection: An R-index approach to signal-detection measurements. Perception, 8, 497-506.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Pierce, J. D., Jr., Doty, R. L., & Amoore, J. E. (1996). Analysis of position of trial sequence and type of diluent on the detection threshold for phenyl ethyl alcohol using a single staircase method. Perceptual & Motor Skills, 82, 451-458.

Rosenblatt, M. R., Olmstead, R. E., Iwamoto-Schaapp, P. N., & Jarvik, M. E. (1998). Olfactory thresholds for nicotine and menthol in smokers (abstinent and nonabstinent) and nonsmokers. Physiology & Behavior, 65, 575-579.

FAST AND ACCURATE THRESHOLDS 1347

SAS Inst it u t e (1998a) StatView StatView reference Cary NC SASInstitute

SAS In st it u t e (1998b) StatView Using StatView Cary NC SAS In-stitute

Schl auch R S amp Rose R M (1990) Two- three- and four-intervalforced-choice staircase procedures Estimator bias and efficiencyJournal of the Acoustical Society of America 88 732-740

Simpson A J amp Fit t er M J (1973) What is the best index of de-tectability Psychological Bulletin 80 481-488

St even s J C (1995) Detection of heteroquality taste mixtures Per-ception amp Psychophysics 57 18-26

Swet s J A (1961) Is there a sensory threshold Science 134 168-177Swet s J A (1986a) Form of empirical ROCrsquos in discrimination and

diagnostic tasks Implications for theory and measurement of per-formance Psychological Bulletin 99 181-198

Swet s J A (1986b) Indices of discrimination or diagnostic accuracyTheir ROCrsquos and implied models Psychological Bulletin 99 100-117

Swet s J A (1996) Signal detection theory and ROC analysis in psy-chology and diagnostics Collected papers Mahwah NJ Erlbaum

Swet s J A Ta nn er W P Jr amp Bir dsal l T G (1961) Decisionprocesses in perception Psychological Review 68 301-340

Tayl or M M amp Cr eel ma n C D (1967) PEST Efficient estimateson probability functions Journal of the Acoustical Society of Amer-ica 41 782-787

Tr eut wein B amp St r asbur ger H (1999) Fitting the psychometricfunction Perception amp Psychophysics 61 87-106

Ur ban F M (1908) The application of statistical methods to prob-lems of psychophysics Philadelphia Psychological Clinic Press

Wat son A B amp Pel l i D G (1983) QUEST A Bayesian adaptivepsychometric method Perception amp Psychophysic s 33 113-120

Weif f enbach J M Baum B J amp Bu r gh auser R (1982) Tastethresholds Quality specific variation with human aging Journal ofGerontology 37 372-377

Weif fenbach J M Sch wa r t z L K At kinson J C amp Fox P C(1995) Taste performance in Sjoumlgrenrsquos syndrome Physiology amp Be-havior 57 89-96

Wet h er il l G B amp Levit t H (1965) Sequential estimation ofpoints on a psychometric function British Journal of Mathematicalamp Statistical Psychology 18 1-10

(Manuscript received November 11 1999revision accepted for publication March 4 2001)



Figure 7. Error as a function of number of trials for three psychophysical methods: ML-PEST (upper panel), up–down staircase (middle panel), and ascending method of limits (lower panel).


telling the truth. The other two methods do not have an explicit way of dealing with this situation. It is of interest to learn how all three methods behave under these circumstances.

The results of this simulation are presented in Table 2. The ML-PEST method successfully measured the threshold and explicitly detected the lying condition in all 10,000 cases and was therefore able to measure the threshold of the underlying psychometric function with a small median error. The up–down staircase method and the ascending method of limits reached the highest concentration and tried to go higher, resulting in 9,824 failures for the staircase method and 8,648 failures for the method of limits. If we consider a failure as a successful detection of the lying behavior, the up–down staircase method detected 98% of the cases and the ascending method of limits detected 86% of the cases. These two methods required about the same number of trials (see Table 2).

Simulation 4

It is important in clinical testing that people with elevated thresholds be easily detected. One of the subjects in Experiment 1 first had smell tested by the up–down method and exhibited a normal threshold. Later, the same nostril was tested using the maximum-likelihood method, and no threshold could be measured because, as it turned out, he was anosmic on that side. The fact that the up–down method did not detect the anosmia is of concern.

We therefore repeated the Monte Carlo simulations of 10,000 subjects, using the same starting conditions and the same test conditions as those in Simulation 1, with one change: Each simulated subject generated a random response, as would be the case if a person had no sensory sensitivity. The ideal behavior of a testing method would be to indicate that the threshold was at the highest value that is possible with that method or to have some other explicit way of indicating that the subject is responding randomly.

None of the three methods performed very well in detecting the random responding. The median errors and number of trials are given in Table 2. The up–down staircase method did the poorest because it gives a median error of 93. This rather low error occurs because, on 87% of the cases, the stopping criterion of 5 reversals was satisfied before the method reached the upper limit of the stimulus concentrations. The ML-PEST method failed on 11% of the cases, but overall the thresholds were estimated to be very high. Unfortunately, it required a great number of trials to achieve this performance. The ascending method of limits reached the upper limit of stimulus concentration on 66% of the cases and also gave the highest values for the threshold.
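Why random responding so often ends an up–down run before the ceiling is reached can be seen in a minimal sketch. It assumes a simple one-up/one-down rule on integer concentration steps; the rule, the function name `run_up_down`, and its parameters are illustrative assumptions and not the staircase procedure used in the simulations.

```python
def run_up_down(responses, start_level, top_level, reversals_to_stop=5):
    """Run a one-up/one-down staircase over a fixed response sequence.

    responses: iterable of booleans (True = correct 2AFC choice).
    Levels are integer concentration steps from 0 (weakest) to top_level.
    Assumed rule (hypothetical): step down after a correct response,
    step up after an error, stop after `reversals_to_stop` reversals.
    """
    level = start_level
    last_direction = 0  # 0 means no step has been taken yet
    reversals = 0
    trials = 0
    for correct in responses:
        trials += 1
        direction = -1 if correct else 1
        if last_direction != 0 and direction != last_direction:
            reversals += 1
        last_direction = direction
        if reversals >= reversals_to_stop:
            return {"trials": trials, "level": level, "outcome": "reversals"}
        if level + direction > top_level:
            # The procedure asks for a stimulus above the strongest one.
            return {"trials": trials, "level": level, "outcome": "ceiling"}
        level = max(level + direction, 0)
    return {"trials": trials, "level": level, "outcome": "unfinished"}


# A guesser is right on about half of the trials, so the direction flips often.
guessing = run_up_down([True, False] * 10, start_level=10, top_level=19)
# An observer who is wrong on every trial drives the staircase to the ceiling.
always_wrong = run_up_down([False] * 20, start_level=10, top_level=19)
```

In this sketch the alternating sequence satisfies the reversal criterion without ever leaving the middle of the range, whereas only a consistently wrong observer pushes the procedure past the strongest stimulus.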

Since, in Monte Carlo simulations, the distribution of the actual thresholds is known, we can apply an SDT analysis (Green & Swets, 1966/1974; Macmillan & Creelman, 1991) to compare the distribution of estimated thresholds made by each method under random responding with the true distribution. Using the means and standard deviations of the above distributions, we computed the signal detection measures of sensitivity, d_a, and accuracy, the area under the ROC curve, A_z (Harvey, 1992; Simpson & Fitter, 1973). The ML-PEST method had the highest sensitivity and accuracy (d_a = 2.27, A_z = .95). The ascending method was next best (d_a = 1.99, A_z = .92), and the up–down staircase method was the least sensitive (d_a = 1.50, A_z = .86).
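The relation between the two indices can be checked with a short sketch. It uses the standard definitions for two normal distributions with unequal variances, d_a = (mean difference) / sqrt((variance_1 + variance_2) / 2) and A_z = Phi(d_a / sqrt(2)), and reads the reported sensitivities as d_a = 2.27, 1.99, and 1.50. The function names are illustrative, and the underlying means and standard deviations are not given in this section.

```python
import math


def d_a(mean_signal, sd_signal, mean_noise, sd_noise):
    """Sensitivity index for two normal distributions with unequal variances."""
    rms_sd = math.sqrt((sd_signal ** 2 + sd_noise ** 2) / 2.0)
    return (mean_signal - mean_noise) / rms_sd


def a_z(d_a_value):
    """Area under the binormal ROC curve: A_z = Phi(d_a / sqrt(2))."""
    z = d_a_value / math.sqrt(2.0)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Reported d_a values, as read from the text above.
reported = {"ML-PEST": 2.27, "ascending": 1.99, "up-down": 1.50}
areas = {method: round(a_z(value), 2) for method, value in reported.items()}
# areas == {"ML-PEST": 0.95, "ascending": 0.92, "up-down": 0.86}
```

The computed areas agree with the reported A_z values, which supports reading the sensitivities with two decimal places.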

Discussion

The results of the simulations lead to the conclusion that both the up–down staircase method and the maximum-likelihood method have considerable strengths for use in measuring taste and smell thresholds. The ascending method of limits should not be used if one is interested in obtaining an accurate estimate of the threshold. These threshold estimates covary with the number of trials taken to stop, as is illustrated in the lower panel of Figure 7. The simulation results offer an explanation for the published finding that ascending limit measurements of olfactory thresholds are less reliable than staircase-measured thresholds (Doty, McKeown, Lee, & Shaman, 1995).

The up–down staircase method gives an average error twice as large as that of the ML-PEST method, although both values are small and therefore similar to each other. The up–down staircase method failed by calling for a stimulus too low on 66 out of 10,000 cases, whereas the ML-PEST method had no such failures. Although the median number of trials gives the edge to the ML-PEST method, on some occasions the ML-PEST method required over 50 trials to reach its stopping criterion.

A strength of the ML-PEST method is its ability to consider alternate hypotheses about the response strategy of the observer. The ML-PEST method was able to correctly detect all 10,000 dishonest responders. We are currently preparing a paper that will go into considerable detail on how to distinguish malingerers from true anosmics using a variety of methods. Another strength is that there is virtually no bias introduced into threshold measurements by having the ML-PEST steepness different from the “true” value set in the Monte Carlo observer (Treutwein & Strasburger, 1999). Data generated from a Monte Carlo Weibull function observer can be well fit by virtually any S-shaped psychometric function, although the Weibull will usually fit slightly better. We therefore recommend the ML-PEST method for testing taste and smell.

EXPERIMENT 2
Test–Retest Reliability

In order to assess the stability of the measures obtained with the adaptive maximum-likelihood procedure, we tested the subjects on four occasions.

Method

Subjects. Twenty-seven nonsmoking, healthy subjects participated in the experiment (mean age = 31.6 ± 8.7 years, range = 22–57 years). The 14 women and 13 men were divided into two groups: Group 1 (“consecutive,” 7 females and 7 males) was tested on 4 consecutive days, and Group 2 (“distributed,” 7 females and 6 males) was tested on Days 1, 2, 8, and 22.

Stimuli. Smell thresholds for butyl alcohol were assessed using the same range of concentrations as were used with the maximum-likelihood method in Experiment 1. Taste thresholds were measured using 20 concentrations of NaCl in quarter-log steps. The highest concentration was 1.7 M. The same maximum-likelihood adaptive method as described in Experiment 1 was used in this experiment.

Procedure. Thresholds were determined with a temporal 2AFC procedure. Taste sensitivity was measured with a whole-mouth, sip-and-spit procedure. The stimuli and the blanks were presented in 30-ml plastic medicine cups containing 5 ml of liquid. The subjects were requested to take the first sample in their mouths, swish it around, and spit it out, then rinse with distilled water and do the same with the second sample. They were asked to choose the sample that had a taste different from water. In each of the four sessions, taste thresholds were determined first and smell thresholds last. All four smell thresholds were determined in the preferred nostril. The subjects were tested at the same time of each day.

Results

Because the measurement and specification of test–retest reliability is a rather complex topic (Linn, 1989), we have chosen four different ways to demonstrate the reliability of the maximum-likelihood method. For the first method, the four threshold measurements for each subject are plotted in Figure 8. The thick line in each panel is the mean threshold. It is clear from examination of Figure 8 that most individual thresholds were stable over 4 days (left column of panels) and over 22 days (right column of panels), with a few subjects showing shifts of more than 1 log unit. The mean thresholds for the two groups are shown in Table 3. A repeated measures ANOVA confirmed the stability of the thresholds: The effect of testing day was not significant.

Figure 8. Log threshold concentration as a function of test day for NaCl (upper row) and butanol (lower row), measured on 4 successive days (left column) and over 22 days (right column). The thin lines connect the four thresholds of individual subjects. The thick line is the mean threshold concentration of all subjects.

A second approach is to compute the correlations among the four threshold measurements (see, e.g., Mattes, 1988). Pearson correlation coefficients between thresholds obtained on different days are shown in Table 4. The correlations were generally higher for pairs of thresholds that were measured close together in time. As Mattes pointed out, correlations are highly influenced by small changes in ordinal ranking of the subjects. The fact that the pairwise correlations became smaller as time separation between them increased is shown in Figure 9. The solid line is the least squares regression line. It indicates that the correlation between two thresholds diminished as the time between them increased. The slope of the regression line becomes even steeper if the data for 1-day separation are removed from the regression, indicating that the decrease in correlation started with the 2-day separation condition.
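The regression described here can be retraced from the coefficients in Table 4. The sketch below assumes that all 24 coefficients (both stimuli, both groups) are pooled, that the entries are read with a leading decimal point, and that the separation in days follows from the test days of each group (Days 1, 2, 3, 4 and Days 1, 2, 8, 22). The fitted line of Figure 9 is not reported in the text, so the slopes are illustrative rather than a reproduction.

```python
def least_squares_slope(points):
    """Slope of the ordinary least squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


PAIRS = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]  # paired sessions
TEST_DAYS = {"consecutive": [1, 2, 3, 4], "distributed": [1, 2, 8, 22]}
CORRELATIONS = {  # Table 4, in the order of PAIRS
    ("NaCl", "consecutive"): [.46, .66, .66, .44, .75, .66],
    ("NaCl", "distributed"): [.58, .72, .40, .55, .20, .51],
    ("butyl alcohol", "consecutive"): [.86, .83, .86, .67, .71, .76],
    ("butyl alcohol", "distributed"): [.26, .68, .56, .51, .21, .65],
}

points = []
for (stimulus, group), coefficients in CORRELATIONS.items():
    days = TEST_DAYS[group]
    for (first, second), r in zip(PAIRS, coefficients):
        points.append((days[second - 1] - days[first - 1], r))

slope_all = least_squares_slope(points)
slope_without_one_day = least_squares_slope([p for p in points if p[0] > 1])
```

Under these assumptions the slope is about -0.014 per day for all pairs and about -0.020 per day once the 1-day pairs are dropped, which matches the direction of both statements in the text.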

The third method to assess test–retest reliability is based on the differences between successive threshold measurements. Each threshold was subtracted from the previous threshold. The four threshold measurements resulted in three difference measurements. The distributions of these differences for NaCl and for butyl alcohol are plotted on the left side of Figure 10. If the measured thresholds do not systematically increase or decrease over time, we expect that the mean difference would be zero. The mean of the distribution for NaCl (upper panel of Figure 10) was 0.018, and for butyl alcohol (lower panel) was 0.002. Neither of these was significantly different from zero. The standard deviations of these distributions were 0.413 log units for NaCl and 0.422 log units for butyl alcohol.

A fourth way to examine test–retest reliability is by means of the spread of the four threshold measures for each subject. A standard deviation of 0.0 would mean that all four thresholds were exactly the same value. The standard deviation of the four measurements was computed for each subject. The distributions of these standard deviations for NaCl (upper panel) and butyl alcohol (lower panel) are plotted on the right side of Figure 10. The median standard deviation was about 0.15 log units for both NaCl and butyl alcohol. This good test–retest reliability is consistent with the other three methods of assessing reliability reported above.
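The third and fourth reliability indices are simple to compute, as the following sketch shows. The three subjects and their log thresholds are hypothetical values chosen for illustration, and the function names are not taken from the authors' routines.

```python
import statistics


def successive_differences(thresholds):
    """Subtract each threshold from the previous one (previous minus current)."""
    return [prev - cur for prev, cur in zip(thresholds, thresholds[1:])]


def within_subject_sd(thresholds):
    """Sample standard deviation of one subject's repeated thresholds."""
    return statistics.stdev(thresholds)


# Hypothetical log thresholds for three subjects on four test days.
subjects = {
    "S1": [2.50, 2.75, 2.50, 2.75],
    "S2": [3.00, 3.00, 3.00, 3.00],
    "S3": [2.25, 2.00, 2.50, 2.25],
}

# Third index: pooled distribution of successive differences.
all_diffs = [d for t in subjects.values() for d in successive_differences(t)]
mean_diff = statistics.mean(all_diffs)  # near zero if there is no drift
sd_diff = statistics.stdev(all_diffs)

# Fourth index: median of the per-subject standard deviations.
median_sd = statistics.median(within_subject_sd(t) for t in subjects.values())
```

Four measurements per subject yield three differences each, so the pooled distribution here has nine values; a subject with identical thresholds contributes a standard deviation of 0.0.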

Discussion

The maximum-likelihood method gives estimates of thresholds that are stable within individuals over time and that are responsive to individual differences. By all four indicators, the test–retest reliability of both NaCl and butyl alcohol thresholds measured by the maximum-likelihood method was very good.

GENERAL DISCUSSION

The two experiments and the Monte Carlo simulations reported here demonstrate that the maximum-likelihood method is a viable alternative to other methods currently in use to measure taste and smell detection thresholds. Because the method attempts to use stimuli that are optimal (close to the predicted threshold), the confidence intervals of the measured thresholds are considerably smaller than those achieved by the other two methods (Experiment 1).

Table 3
Mean NaCl and Butyl Alcohol Thresholds (and Standard Errors) From Experiment 2

                            Day 1         Day 2         Day 3         Day 4
                           M     SE      M     SE      M     SE      M     SE
NaCl
  Consecutive (Group 1)   2.62  0.13    2.67  0.11    2.82  0.13    2.63  0.11
  Distributed (Group 2)   2.90  0.19    2.83  0.09    2.83  0.11    2.77  0.10
Butyl alcohol
  Consecutive (Group 1)   3.19  0.14    3.17  0.15    3.25  0.12    3.10  0.13
  Distributed (Group 2)   3.00  0.16    3.10  0.14    3.09  0.10    3.09  0.11

Table 4
Pearson Correlation Coefficients Between Thresholds

                                    Paired Sessions
                          1–2    1–3    1–4    2–3    2–4    3–4
NaCl
  Consecutive (Group 1)   .46    .66    .66    .44    .75    .66
  Distributed (Group 2)   .58    .72    .40    .55    .20    .51
Butyl alcohol
  Consecutive (Group 1)   .86    .83    .86    .67    .71    .76
  Distributed (Group 2)   .26    .68    .56    .51    .21    .65

p < .05   p < .01   p < .001

One of the strengths of the maximum-likelihood method is that it provides an explicit way of estimating the confidence interval of the threshold, a facility not provided by the other two methods. The experimenter is free to establish a stopping confidence interval to suit his/her needs. If greater precision is desired, it can be achieved at the expense of running more trials. This explicit tradeoff between precision and number of trials is not possible with the other two psychophysical methods. One could, of course, use a hybrid combination of methods: an up–down staircase method for generating the next stimulus to use, combined with a maximum-likelihood method for estimating the threshold and its confidence interval. In effect, we used this approach after the data from Experiment 1 were collected, by reanalyzing them with the maximum-likelihood technique in order to assess the confidence interval of the resulting threshold.
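The tradeoff between precision and number of trials can be made concrete with a small sketch of a maximum-likelihood threshold estimate and a stopping rule based on interval width. Everything specific here is an assumption made for illustration: the 2AFC Weibull form on a log-concentration axis, the steepness `beta = 3.5`, the grid of candidate thresholds, and the use of a likelihood support interval (all candidates within 1.92 log-likelihood units of the maximum) in place of whatever interval the ML-PEST routines compute.

```python
import math


def p_correct(log_c, log_threshold, beta=3.5, gamma=0.5):
    """Assumed 2AFC Weibull psychometric function on a log10 concentration axis."""
    return 1.0 - (1.0 - gamma) * math.exp(-10.0 ** (beta * (log_c - log_threshold)))


def log_likelihood(trials, log_threshold):
    """Log-likelihood of (log_concentration, correct) trials for one candidate."""
    total = 0.0
    for log_c, correct in trials:
        p = min(max(p_correct(log_c, log_threshold), 1e-9), 1.0 - 1e-9)
        total += math.log(p if correct else 1.0 - p)
    return total


def estimate_with_interval(trials, candidates, drop=1.92):
    """Return (best candidate, lower bound, upper bound) of the support interval."""
    scores = [(log_likelihood(trials, t), t) for t in candidates]
    best_score, best = max(scores)
    supported = [t for score, t in scores if score >= best_score - drop]
    return best, min(supported), max(supported)


def should_stop(trials, candidates, max_width):
    """Stop once the interval is no wider than the experimenter's criterion."""
    _, low, high = estimate_with_interval(trials, candidates)
    return (high - low) <= max_width


candidates = [-5.0 + 0.25 * i for i in range(21)]  # -5.0 ... 0.0 log units
# Hypothetical run: 10 correct at -2.0, then 5 correct and 5 wrong at -3.0.
trials = [(-2.0, True)] * 10 + [(-3.0, True)] * 5 + [(-3.0, False)] * 5
best, low, high = estimate_with_interval(trials, candidates)
```

With no data the interval spans the whole grid and `should_stop` is false for any reasonable criterion; tightening `max_width` simply requires more trials before the rule is satisfied, which is the tradeoff described above.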

The Monte Carlo simulations allow the analysis of the ideal behavior of the three psychophysical methods. One major conclusion is that the ascending staircase method gives threshold measurements that are severely biased. This bias is a function of the number of trials carried out before the stopping criterion of 5 correct in a row with one stimulus is reached. When the number of trials was small (less than approximately 15), the threshold estimate was too low; when the number of trials was greater than 15, the threshold estimate was too high (see Figure 7). This characteristic is due to the fact that the stimulus concentration used can only increase as trials progress. The maximum-likelihood method has the least amount of bias, although it is always slightly negative (i.e., the threshold estimate is, on the average, slightly lower, by 0.02 log units, than the true value).

Figure 9. Test–retest reliability as indicated by correlation coefficients between all pairs of measurements, plotted as a function of the number of days separating them. The solid line is the least squares linear regression through the data.

In clinical testing, especially for cases that involve the legal system, it is important to be able to detect malingerers. One of the strengths of the maximum-likelihood method is that it explicitly tests hypotheses. Normally, the hypothesis is that the subject is responding truthfully and that the threshold has a certain value. In the present implementation of the maximum-likelihood method, the computer routines maintain three hypotheses about the response strategy of the subject: that the subject is responding truthfully, that the subject is lying on each trial (i.e., always choosing what he/she believes is the wrong alternative in the 2AFC procedure), and that the subject is responding randomly (i.e., not making use of any sensory information). The ability to detect a malingerer (one who has sensory sensitivity but tries to respond randomly) is based on the fact that it is difficult for humans to generate truly random sequences (Bar-Hillel & Wagenaar, 1993). A subject who can taste or smell the stimulus, and therefore knows in which interval it occurs, tends to respond wrongly more often than would be expected by chance alone. In this case, the hypothesis that the subject is lying will have a higher likelihood than the hypothesis that the subject is telling the truth. We are currently working to refine this ability of the maximum-likelihood method because it holds great promise. The other two psychophysical procedures do not easily distinguish between a malingerer and a person with no sensory sensitivity and have no explicit way to detect malingerers.
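The comparison of the three response-strategy hypotheses can be sketched as follows. This is not the authors' routine: the 2AFC Weibull form, the steepness `beta = 3.5`, the candidate grid, and all function names are assumptions. The lying hypothesis is modeled as choosing the wrong interval whenever the truthful observer would have chosen the right one, so its probability of a correct response is one minus the truthful probability, and the random hypothesis fixes that probability at .5.

```python
import math


def p_truthful(log_c, log_threshold, beta=3.5, gamma=0.5):
    """Assumed 2AFC Weibull psychometric function on a log10 concentration axis."""
    return 1.0 - (1.0 - gamma) * math.exp(-10.0 ** (beta * (log_c - log_threshold)))


def log_likelihood(trials, p_of_correct):
    """Log-likelihood of (log_concentration, correct) trials under one model."""
    total = 0.0
    for log_c, correct in trials:
        p = min(max(p_of_correct(log_c), 1e-9), 1.0 - 1e-9)
        total += math.log(p if correct else 1.0 - p)
    return total


def compare_hypotheses(trials, candidates):
    """Best log-likelihood under each of the three response strategies."""
    truthful = max(
        log_likelihood(trials, lambda c: p_truthful(c, t)) for t in candidates
    )
    lying = max(
        log_likelihood(trials, lambda c: 1.0 - p_truthful(c, t)) for t in candidates
    )
    random_guessing = log_likelihood(trials, lambda c: 0.5)
    return {"truthful": truthful, "lying": lying, "random": random_guessing}


candidates = [-5.0 + 0.25 * i for i in range(21)]
# Hypothetical observer who is wrong on every strong stimulus.
suspicious = [(-1.0, False)] * 8 + [(-3.0, True), (-3.0, False)]
honest = [(-1.0, True)] * 8 + [(-3.0, True), (-3.0, False)]

verdict_suspicious = max(compare_hypotheses(suspicious, candidates).items(),
                         key=lambda item: item[1])[0]
verdict_honest = max(compare_hypotheses(honest, candidates).items(),
                     key=lambda item: item[1])[0]
```

Eight errors in a row on a clearly detectable stimulus are far less probable under guessing than under deliberate wrong choices, which is why the lying hypothesis wins for the first data set and the truthful hypothesis for the second.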

Finally, in Experiment 2 we demonstrated that the maximum-likelihood method gives estimates of thresholds that are stable over time. This stability is actually based on two aspects of our testing procedure. The maximum-likelihood method gives the smallest confidence intervals of the three methods tested and thus contributes the least amount of random or systematic error variance to the measured thresholds. The second source of reliability is the experimental paradigm. The 2AFC procedure minimizes the effects of response bias (e.g., Green & Swets, 1966/1974) and thus minimizes error variance in the threshold measurements that would be introduced by variance in response bias. Actually, it would be desirable to use more than just two forced-choice alternatives because the ML-PEST method converges more rapidly on the threshold value with more alternatives (Manny & Klein, 1985; Schlauch & Rose, 1990). The time required to present chemosensory stimuli, however, usually precludes using more than two alternatives.

Figure 10. Distribution of differences between successive thresholds (left) and of the standard deviation of the four threshold measurements for the 27 subjects (right), for NaCl (upper panel) and butyl alcohol (lower panel).

Test–retest reliability is not achieved at the expense of precision. Our measured thresholds show an approximately normal distribution with a standard deviation of about 0.4 log units. The maximum-likelihood method is able to estimate thresholds that lie on a continuum, even though a small number of discrete stimulus concentrations are available for testing. Although a computer is required to run ML-PEST software, the widespread availability of inexpensive desktop or laptop computers makes this requirement much less of a barrier than in the past.

REFERENCES

Anliker, J. A., Bartoshuk, L., Ferris, A. N., & Hooks, L. D. (1991). Children's food preferences and genetic sensitivity to the bitter tastes of 6-n-propylthiouracil (PROP). American Journal of Clinical Nutrition, 54, 316-320.

Apter, A. J., Gent, J. F., & Frank, M. E. (1999). Fluctuating olfactory sensitivity and distorted odor perception in allergic rhinitis. Archives of Otolaryngology, Head & Neck Surgery, 125, 1005-1010.

Bar-Hillel, M., & Wagenaar, W. A. (1993). The perception of randomness. In C. L. Gideon Keren (Ed.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 369-393). Hillsdale, NJ: Erlbaum.

Berkson, J. (1951). Why I prefer logits to probits. Biometrics, 7, 327-339.

Berkson, J. (1953). A statistically precise and relatively simple method of estimating the bio-assay with quantal response, based on the logistic function. Journal of the American Statistical Association, 48, 565-600.

Berkson, J. (1955). Maximum-likelihood and minimum chi-square estimates of the logistic function. Journal of the American Statistical Association, 50, 130-162.

Cain, W. S., Gent, J. F., Catalanotto, F. A., & Goodspeed, R. B. (1983). Clinical evaluation of olfaction. American Journal of Otolaryngology, 4, 252-256.

Cometto-Muñiz, J. E., & Cain, W. S. (1995). Relative sensitivity of the ocular trigeminal, nasal trigeminal, and olfactory systems to airborne chemicals. Chemical Senses, 20, 191-198.

Cornsweet, T. N. (1962). The staircase method in psychophysics. American Journal of Psychology, 75, 485-491.

Cowart, B. J. (1989). Relationships between taste and smell across the adult life span. In C. Murphy, W. S. Cain, & D. M. Hegsted (Eds.), Nutrition and the chemical senses in aging: Recent advances and current research needs (Annals of the New York Academy of Sciences, Vol. 561, pp. 39-55). New York: New York Academy of Sciences.

Cowart, B. J., Yokomukai, Y., & Beauchamp, G. K. (1994). Bitter taste in aging: Compound-specific decline in sensitivity. Physiology & Behavior, 56, 1237-1241.

Deems, D. A., Doty, R. L., Settle, R. G., Moore-Gillon, V., Shaman, P., Mester, A. F., Kimmelman, C. P., Brightman, V. J., & Snow, J. B., Jr. (1991). Smell and taste disorders: A study of 750 patients from the University of Pennsylvania Smell and Taste Center. Archives of Otolaryngology, Head & Neck Surgery, 117, 519-528.

Doty, R. L., McKeown, D. A., Lee, W. W., & Shaman, P. (1995). A study of the test–retest reliability of ten olfactory tests. Chemical Senses, 20, 645-656.

Doty, R. L., Snyder, P. J., Huggins, G. R., & Lowry, L. D. (1981). Endocrine, cardiovascular, and psychological correlates of olfactory sensitivity changes during the human menstrual cycle. Journal of Comparative & Physiological Psychology, 95, 45-60.

Drewnowski, A., Henderson, S. A., & Shore, A. B. (1997). Genetic sensitivity to 6-n-propylthiouracil (PROP) and hedonic responses to bitter and sweet tastes. Chemical Senses, 22, 27-37.

Duffy, V. B., Cain, W. S., & Ferris, A. M. (1999). Measurement of sensitivity to olfactory flavor: Application in a study of aging and dentures. Chemical Senses, 24, 671-677.

Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf und Härtel.

Gagnon, P., Mergler, D., & Lapare, S. (1994). Olfactory adaptation, threshold shift and recovery at low levels of exposure to methyl isobutyl ketone (MIBK). Neurotoxicology, 15, 637-642.

Garcia-Perez, M. A. (1998). Forced-choice staircases with fixed step sizes: Asymptotic and small-sample properties. Vision Research, 38, 1861-1881.

Green, D. M., & Swets, J. A. (1974). Signal detection theory and psychophysics. Huntington, NY: Robert E. Krieger. (Original work published 1966)

Grushka, M., Sessle, B. J., & Howley, T. P. (1986). Psychophysical evidence of taste dysfunction in burning mouth syndrome. Chemical Senses, 11, 485-498.

Guilford, J. P. (1936). Psychometric methods. New York: McGraw-Hill.

Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.

Hall, J. L. (1968). Maximum-likelihood sequential procedures for estimation of psychometric functions. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1981). Hybrid adaptive procedure for estimation of psychometric functions. Journal of the Acoustical Society of America, 69, 1763-1769.

Harvey, L. O., Jr. (1986). Efficient estimation of sensory thresholds. Behavior Research Methods, Instruments, & Computers, 18, 623-632.

Harvey, L. O., Jr. (1992). The critical operating characteristic and the evaluation of expert judgment. Organizational Behavior & Human Decision Processes, 53, 229-251.

Harvey, L. O., Jr. (1997). Efficient estimation of sensory thresholds with ML-PEST. Spatial Vision, 11, 121-128.

Hays, W. L. (1963). Statistics for psychologists (1st ed.). New York: Holt, Rinehart & Winston.

Krantz, D. H. (1969). Threshold theories of signal detection. Psychological Review, 76, 308-324.

Lehrner, J. P., Brücke, T., Dal-Bianco, P., Gatterer, G., & Kryspin-Exner, I. (1997). Olfactory functions in Parkinson's disease and Alzheimer's disease. Chemical Senses, 22, 105-110.

Lehrner, J. P., Kryspin-Exner, I., & Vetter, N. (1995). Higher olfactory threshold and decreased odor identification ability in HIV-infected persons. Chemical Senses, 20, 325-328.

Linn, R. L. (Ed.) (1989). Educational measurement (3rd ed.). New York: Macmillan.

Linschoten, M. R. I., & Kroeze, J. H. A. (1991). Spatial summation in taste: NaCl thresholds and stimulated area on the anterior human tongue. Chemical Senses, 16, 219-224.

Linschoten, M. R. I., & Kroeze, J. H. A. (1992). Bilateral taste stimulation: Spatial summation with weak NaCl stimuli. Chemical Senses, 17, 53-59.

Linschoten, M. R. I., & Kroeze, J. H. A. (1994). Ipsi- and bilateral interactions in taste. Perception & Psychophysics, 55, 387-393.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure vernier acuity of older infants. Vision Research, 25, 1245-1252.

Mattes, R. D. (1988). Reliability of psychophysical measures of gustatory function. Perception & Psychophysics, 43, 107-114.

Moll, B., Klimek, L., Eggers, G., & Mann, W. (1998). Comparison of olfactory function in patients with seasonal and perennial allergic rhinitis. Allergy, 53, 297-301.

Murphy, C., & Cain, W. S. (1986). Odor identification: The blind are better. Physiology & Behavior, 37, 177-180.

Murphy, C., Nordin, S., de Wijk, R. A., Cain, W. S., & Polich, J. (1994). Olfactory-evoked potentials: Assessment of young and elderly, and comparison to psychophysical threshold. Chemical Senses, 19, 47-56.

Nordin, S., Murphy, C., Davidson, T. M., Quinonez, C., Jalowayski, A. A., & Ellison, D. W. (1996). Prevalence and assessment of qualitative olfactory dysfunction in different age groups. Laryngoscope, 106, 739-744.

O'Mahony, M., Gardner, L., Long, D., Heintz, C., Thompson, B., & Davies, M. (1979). Salt taste detection: An R-index approach to signal-detection measurements. Perception, 8, 497-506.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Pierce, J. D., Jr., Doty, R. L., & Amoore, J. E. (1996). Analysis of position of trial sequence and type of diluent on the detection threshold for phenyl ethyl alcohol using a single staircase method. Perceptual & Motor Skills, 82, 451-458.

Rosenblatt, M. R., Olmstead, R. E., Iwamoto-Schaapp, P. N., & Jarvik, M. E. (1998). Olfactory thresholds for nicotine and menthol in smokers (abstinent and nonabstinent) and nonsmokers. Physiology & Behavior, 65, 575-579.


SAS Institute (1998a). StatView: StatView reference. Cary, NC: SAS Institute.

SAS Institute (1998b). StatView: Using StatView. Cary, NC: SAS Institute.

Schlauch, R. S., & Rose, R. M. (1990). Two-, three-, and four-interval forced-choice staircase procedures: Estimator bias and efficiency. Journal of the Acoustical Society of America, 88, 732-740.

Simpson, A. J., & Fitter, M. J. (1973). What is the best index of detectability? Psychological Bulletin, 80, 481-488.

Stevens, J. C. (1995). Detection of heteroquality taste mixtures. Perception & Psychophysics, 57, 18-26.

Swets, J. A. (1961). Is there a sensory threshold? Science, 134, 168-177.

Swets, J. A. (1986a). Form of empirical ROC's in discrimination and diagnostic tasks: Implications for theory and measurement of performance. Psychological Bulletin, 99, 181-198.

Swets, J. A. (1986b). Indices of discrimination or diagnostic accuracy: Their ROC's and implied models. Psychological Bulletin, 99, 100-117.

Swets, J. A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Mahwah, NJ: Erlbaum.

Swets, J. A., Tanner, W. P., Jr., & Birdsall, T. G. (1961). Decision processes in perception. Psychological Review, 68, 301-340.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Urban, F. M. (1908). The application of statistical methods to problems of psychophysics. Philadelphia: Psychological Clinic Press.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Weiffenbach, J. M., Baum, B. J., & Burghauser, R. (1982). Taste thresholds: Quality specific variation with human aging. Journal of Gerontology, 37, 372-377.

Weiffenbach, J. M., Schwartz, L. K., Atkinson, J. C., & Fox, P. C. (1995). Taste performance in Sjögren's syndrome. Physiology & Behavior, 57, 89-96.

Wetherill, G. B., & Levitt, H. (1965). Sequential estimation of points on a psychometric function. British Journal of Mathematical & Statistical Psychology, 18, 1-10.

(Manuscript received November 11, 1999; revision accepted for publication March 4, 2001.)

Page 12: Fast and accurate measurement of taste and smell ...psych-lharvey/Publications/(2001) Linschoten_et… · Since the publication of Elemente der Psychophysik (Fechner, 1860), considerable

FAST AND ACCURATE THRESHOLDS 1341

telling the truth The other two methods do not have anexplicit way of dealing with this situation It is of inter-est to learn how all three methods behave under thesecircumstances

The results of this simulation are presented in Table 2The ML-PEST method successfully measured the thresh-old and explicitly detected the lying condition in all10000 cases and was therefore able to measure thethreshold of the underlying psychometric function witha small median error The upndashdown staircase method andthe ascending method of limits reached the highest con-centration and tried to go higher resulting in 9824 fail-ures for the staircase method and 8648 failures for themethod of limits If we consider a failure as a successfuldetection of the lying behavior the upndashdown staircasemethod detected 98 of the cases and the ascendingmethod of limits detected 86 of the cases These twomethods required about the same number of trials (seeTable 2)

Simulation 4It is important in clinical testing that people with ele-

vated thresholds be easily detected One of the subjects inExperiment 1 f irst had smell tested by the upndashdownmethod and exhibited a normal threshold Later the samenostril was tested using the maximum-likelihood methodand no threshold could be measured because as it turnedout he was anosmic on that side The fact that the upndashdown method did not detect the anosmia is of concern

We therefore repeated the Monte Carlo simulations of 10,000 subjects, using the same starting conditions and the same test conditions as those in Simulation 1, with one change: Each simulated subject generated a random response, as would be the case if a person had no sensory sensitivity. The ideal behavior of a testing method would be to indicate that the threshold was at the highest value that is possible with that method, or to have some other explicit way of indicating that the subject is responding randomly.

None of the three methods performed very well in detecting the random responding. The median errors and number of trials are given in Table 2. The up–down staircase method did the poorest because it gives a median error of 93. This rather low error occurs because on 87% of the cases the stopping criterion of 5 reversals was satisfied before the method reached the upper limit of the stimulus concentrations. The ML-PEST method failed on 11% of the cases, but overall the thresholds were estimated to be very high. Unfortunately, it required a great number of trials to achieve this performance. The ascending method of limits reached the upper limit of stimulus concentration on 66% of the cases and also gave the highest values for the threshold.

Since in Monte Carlo simulations the distribution of the actual thresholds is known, we can apply an SDT analysis (Green & Swets, 1966/1974; Macmillan & Creelman, 1991) to compare the distribution of estimated thresholds made by each method under random responding with the true distribution. Using the means and standard deviations of the above distributions, we computed the signal detection measures of sensitivity, d_a, and accuracy, the area under the ROC curve, A_z (Harvey, 1992; Simpson & Fitter, 1973). The ML-PEST method had the highest sensitivity and accuracy (d_a = 2.27, A_z = .95). The ascending method was next best (d_a = 1.99, A_z = .92), and the up–down staircase method was the least sensitive (d_a = 1.50, A_z = .86).
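The relation between the two indices can be checked with a short sketch. It assumes the usual Gaussian signal detection model, which the text does not spell out: d_a is the difference between the two means divided by the root mean square of the two standard deviations, and A_z = Φ(d_a/√2). The function names are illustrative and are not taken from the authors' computer routines.

```python
import math

def d_a(mean_signal, sd_signal, mean_noise, sd_noise):
    """Sensitivity index for two Gaussian distributions with unequal variances."""
    rms_sd = math.sqrt((sd_signal ** 2 + sd_noise ** 2) / 2.0)
    return (mean_signal - mean_noise) / rms_sd

def a_z(da):
    """Area under the Gaussian ROC curve, Phi(d_a / sqrt(2)).

    Phi(z) = 0.5 * (1 + erf(z / sqrt(2))), so the argument of erf is d_a / 2.
    """
    return 0.5 * (1.0 + math.erf(da / 2.0))

# d_a values reported for the three methods under random responding.
for method, value in [("ML-PEST", 2.27), ("ascending", 1.99), ("up-down", 1.50)]:
    print(f"{method}: d_a = {value:.2f}, A_z = {a_z(value):.2f}")
```

Rounded to two decimals, the computed areas agree with the reported A_z values of .95, .92, and .86.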

Discussion

The results of the simulations lead to the conclusion that both the up–down staircase method and the maximum-likelihood method have considerable strengths for use in measuring taste and smell thresholds. The ascending method of limits should not be used if one is interested in obtaining an accurate estimate of the threshold. Its threshold estimates covary with the number of trials taken to stop, as is illustrated in the lower panel of Figure 7. The simulation results offer an explanation for the published finding that ascending-limit measurements of olfactory thresholds are less reliable than staircase-measured thresholds (Doty, McKeown, Lee, & Shaman, 1995).

The up–down staircase method gives an average error twice as large as that of the ML-PEST method, although both values are small and therefore similar to each other. The up–down staircase method failed by calling for a stimulus too low on 66 out of 10,000 cases, whereas the ML-PEST method had no such failures. Although the median number of trials gives the edge to the ML-PEST method, on some occasions the ML-PEST method required over 50 trials to reach its stopping criterion.

A strength of the ML-PEST method is its ability to consider alternate hypotheses about the response strategy of the observer. The ML-PEST method was able to correctly detect all 10,000 dishonest responders. We are currently preparing a paper that will go into considerable detail on how to distinguish malingerers from true anosmics using a variety of methods. Another strength is that there is virtually no bias introduced into threshold measurements by having the ML-PEST steepness different from the “true” value set in the Monte Carlo observer (Treutwein & Strasburger, 1999). Data generated from a Monte Carlo Weibull function observer can be well fit by virtually any S-shaped psychometric function, although the Weibull will usually fit slightly better. We therefore recommend the ML-PEST method for testing taste and smell.

EXPERIMENT 2
Test–Retest Reliability

In order to assess the stability of the measures obtained with the adaptive maximum-likelihood procedure, we tested the subjects on four occasions.

Method

Subjects. Twenty-seven nonsmoking, healthy subjects participated in the experiment (mean age = 31.6 ± 8.7 years; range = 22–57 years). The 14 women and 13 men were divided into two groups: Group 1 (“consecutive”; 7 females and 7 males) was tested on 4 consecutive days, and Group 2 (“distributed”; 7 females and 6 males) was tested on Days 1, 2, 8, and 22.

Stimuli. Smell thresholds for butyl alcohol were assessed using the same range of concentrations as were used with the maximum-likelihood method in Experiment 1. Taste thresholds were measured using 20 concentrations of NaCl in quarter-log steps. The highest concentration was 1.7 M. The same maximum-likelihood adaptive method as described in Experiment 1 was used in this experiment.

Procedure. Thresholds were determined with a temporal 2AFC procedure. Taste sensitivity was measured with a whole-mouth, sip-and-spit procedure. The stimuli and the blanks were presented in 30-ml plastic medicine cups containing 5 ml of liquid. The subjects were requested to take the first sample in their mouths, swish it around, and spit it out, then rinse with distilled water and do the same with the second sample. They were asked to choose the sample that had a taste different from water. In each of the four sessions, taste thresholds were determined first and smell thresholds last. All four smell thresholds were determined in the preferred nostril. The subjects were tested at the same time of each day.

Results

Because the measurement and specification of test–retest reliability is a rather complex topic (Linn, 1989), we have chosen four different ways to demonstrate the reliability of the maximum-likelihood method. For the first method, the four threshold measurements for each subject are plotted in Figure 8. The thick line in each panel is the mean threshold. It is clear from examination of Figure 8 that most individual thresholds were stable over 4 days (left column of panels) and over 22 days (right column of panels), with a few subjects showing shifts of more than 1 log unit. The mean thresholds for the two groups are shown in Table 3. A repeated measures ANOVA confirmed the stability of the thresholds: The effect of testing day was not significant.

Figure 8. Log threshold concentration as a function of test day for NaCl (upper row) and butanol (lower row), measured on 4 successive days (left column) and over 22 days (right column). The thin lines connect the four thresholds of individual subjects. The thick line is the mean threshold concentration of all subjects.

A second approach is to compute the correlations among the four threshold measurements (see, e.g., Mattes, 1988). Pearson correlation coefficients between thresholds obtained on different days are shown in Table 4. The correlations were generally higher for pairs of thresholds that were measured close together in time. As Mattes pointed out, correlations are highly influenced by small changes in ordinal ranking of the subjects. The fact that the pairwise correlations became smaller as time separation between them increased is shown in Figure 9. The solid line is the least squares regression line. It indicates that the correlation between two thresholds diminished as the time between them increased. The slope of the regression line becomes even steeper if the data for 1-day separation are removed from the regression, indicating that the decrease in correlation started with the 2-day separation condition.
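The regression described here can be sketched from the Table 4 coefficients. Three assumptions are made for the illustration: the table entries are correlations with the leading decimal point restored (e.g., .46), the coefficients for both stimuli and both groups are pooled into one regression, and the separation in days follows from the test schedules (Days 1 to 4 for Group 1; Days 1, 2, 8, and 22 for Group 2). The helper names are hypothetical.

```python
from itertools import combinations

def least_squares(points):
    """Ordinary least squares fit of y on x; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Test days for the four sessions of each group.
schedule = {"consecutive": [1, 2, 3, 4], "distributed": [1, 2, 8, 22]}

# Coefficients in Table 4 order: session pairs 1-2, 1-3, 1-4, 2-3, 2-4, 3-4.
table4 = [
    ("consecutive", [.46, .66, .66, .44, .75, .66]),  # NaCl
    ("distributed", [.58, .72, .40, .55, .20, .51]),  # NaCl
    ("consecutive", [.86, .83, .86, .67, .71, .76]),  # butyl alcohol
    ("distributed", [.26, .68, .56, .51, .21, .65]),  # butyl alcohol
]

points = []
for group, coefficients in table4:
    # combinations() yields day pairs in the same order as the table columns.
    day_pairs = combinations(schedule[group], 2)
    for (first, second), r in zip(day_pairs, coefficients):
        points.append((second - first, r))

slope_all, intercept_all = least_squares(points)
slope_trimmed, _ = least_squares([p for p in points if p[0] > 1])
print(f"all pairs:       slope = {slope_all:.4f} per day")
print(f"without 1-day:   slope = {slope_trimmed:.4f} per day")
```

Under these assumptions the fitted slope is negative, and it becomes more negative when the 1-day separations are dropped, which matches the pattern described in the text.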

The third method to assess test–retest reliability is based on the differences between successive threshold measurements. Each threshold was subtracted from the previous threshold. The four threshold measurements resulted in three difference measurements. The distributions of these differences for NaCl and for butyl alcohol are plotted on the left side of Figure 10. If the measured thresholds do not systematically increase or decrease over time, we expect that the mean difference would be zero. The mean of the distribution for NaCl (upper panel of Figure 10) was 0.018, and for butyl alcohol (lower panel) was 0.002. Neither of these was significantly different from zero. The standard deviations of these distributions were 0.413 log units for NaCl and 0.422 log units for butyl alcohol.
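A minimal sketch of this difference analysis follows. The subject data are hypothetical, and the one-sample t statistic is only an assumed way of testing the mean against zero, since the text does not name the test that was used. The summary-based calculation assumes 27 subjects × 3 differences = 81 values per stimulus.

```python
import math
import statistics

def successive_differences(thresholds):
    """Previous threshold minus the following one, for one subject's sessions."""
    return [prev - nxt for prev, nxt in zip(thresholds, thresholds[1:])]

def t_from_summary(mean, sd, n):
    """One-sample t statistic against zero, computed from summary values."""
    return mean / (sd / math.sqrt(n))

# Hypothetical log thresholds, four sessions per subject.
subjects = [
    [2.6, 2.7, 2.5, 2.6],
    [3.1, 3.0, 3.2, 3.1],
    [2.9, 2.9, 2.8, 3.0],
]
diffs = [d for sessions in subjects for d in successive_differences(sessions)]
print("mean difference:", statistics.mean(diffs))
print("SD of differences:", statistics.stdev(diffs))

# Reported summary values: NaCl (0.018, 0.413) and butyl alcohol (0.002, 0.422).
n = 27 * 3
print("t, NaCl:", t_from_summary(0.018, 0.413, n))
print("t, butyl alcohol:", t_from_summary(0.002, 0.422, n))
```

With these inputs both t statistics are well below 1, in line with the statement that neither mean differed significantly from zero. Differences from the same subject are not independent, so this is a rough check rather than a reanalysis.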

A fourth way to examine test–retest reliability is by means of the spread of the four threshold measures for each subject. A standard deviation of 0.0 would mean that all four thresholds were exactly the same value. The standard deviation of the four measurements was computed for each subject. The distributions of these standard deviations for NaCl (upper panel) and butyl alcohol (lower panel) are plotted on the right side of Figure 10. The median standard deviation was about 0.15 log units for both NaCl and butyl alcohol. This good test–retest reliability is consistent with the other three methods of assessing reliability reported above.
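The within-subject spread measure can be sketched as follows. The data and subject labels are hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, because the text does not say which form was computed.

```python
import statistics

def per_subject_sd(thresholds_by_subject):
    """Sample standard deviation (n - 1) of each subject's thresholds."""
    return {subject: statistics.stdev(values)
            for subject, values in thresholds_by_subject.items()}

# Hypothetical log thresholds from four sessions per subject.
data = {
    "S01": [2.6, 2.7, 2.5, 2.6],
    "S02": [3.1, 3.1, 3.1, 3.1],  # identical thresholds give an SD of 0.0
    "S03": [2.9, 3.2, 2.8, 3.0],
}
sds = per_subject_sd(data)
median_sd = statistics.median(sds.values())
print(sds)
print("median SD:", median_sd)
```

The median is used as the summary because a few subjects with large shifts would pull a mean upward.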

Discussion

The maximum-likelihood method gives estimates of thresholds that are stable within individuals over time and that are responsive to individual differences. By all four indicators, the test–retest reliability of both NaCl and butyl alcohol thresholds measured by the maximum-likelihood method was very good.

GENERAL DISCUSSION

The two experiments and the Monte Carlo simulations reported here demonstrate that the maximum-likelihood method is a viable alternative to other methods currently in use to measure taste and smell detection thresholds. Because the method attempts to use stimuli that are optimal (close to the predicted threshold), the confidence intervals of the measured thresholds are considerably smaller than those achieved by the other two methods (Experiment 1).

Table 3
Mean NaCl and Butyl Alcohol Thresholds (and Standard Errors) From Experiment 2

                             Day 1         Day 2         Day 3         Day 4
                            M     SE      M     SE      M     SE      M     SE
NaCl
  Consecutive (Group 1)    2.62  0.13    2.67  0.11    2.82  0.13    2.63  0.11
  Distributed (Group 2)    2.90  0.19    2.83  0.09    2.83  0.11    2.77  0.10
Butyl Alcohol
  Consecutive (Group 1)    3.19  0.14    3.17  0.15    3.25  0.12    3.10  0.13
  Distributed (Group 2)    3.00  0.16    3.10  0.14    3.09  0.10    3.09  0.11

Table 4
Pearson Correlation Coefficients Between Thresholds

                                      Paired Sessions
                            1–2    1–3    1–4    2–3    2–4    3–4
NaCl
  Consecutive (Group 1)     .46    .66    .66    .44    .75    .66
  Distributed (Group 2)     .58    .72    .40    .55    .20    .51
Butyl alcohol
  Consecutive (Group 1)     .86    .83    .86    .67    .71    .76
  Distributed (Group 2)     .26    .68    .56    .51    .21    .65

*p < .05. **p < .01. ***p < .001.

One of the strengths of the maximum-likelihood method is that it provides an explicit way of estimating the confidence interval of the threshold, a facility not provided by the other two methods. The experimenter is free to establish a stopping confidence interval to suit his/her needs. If greater precision is desired, it can be achieved at the expense of running more trials. This explicit tradeoff between precision and number of trials is not possible with the other two psychophysical methods. One could, of course, use a hybrid combination of methods: an up–down staircase method for generating the next stimulus to use, combined with a maximum-likelihood method for estimating the threshold and its confidence interval. In effect, we used this approach after the data from Experiment 1 were collected, by reanalyzing them with the maximum-likelihood technique in order to assess the confidence interval of the resulting threshold.
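The tradeoff between precision and number of trials can be illustrated with a small sketch. It is not the ML-PEST code: the Weibull parameterization, the steepness and lapse values, the candidate grid, and the likelihood-ratio interval (all thresholds whose log-likelihood lies within 1.92 of the maximum, roughly a 95% interval for one parameter) are assumptions chosen for the example, and the trial data are made up.

```python
import math

def p_correct(x, threshold, slope=2.0, guess=0.5, lapse=0.01):
    """Assumed Weibull 2AFC psychometric function on a log10 intensity axis."""
    detect = 1.0 - math.exp(-(10.0 ** (slope * (x - threshold))))
    return guess + (1.0 - guess - lapse) * detect

def log_likelihood(trials, threshold):
    """Log-likelihood of (log_intensity, correct) trials for one threshold."""
    total = 0.0
    for x, correct in trials:
        p = p_correct(x, threshold)
        total += math.log(p if correct else 1.0 - p)
    return total

def threshold_interval(trials, grid, drop=1.92):
    """Maximum-likelihood threshold and the bounds of the likelihood interval."""
    lls = [log_likelihood(trials, t) for t in grid]
    best = max(lls)
    inside = [t for t, ll in zip(grid, lls) if ll >= best - drop]
    return grid[lls.index(best)], min(inside), max(inside)

def should_stop(trials, grid, target_width):
    """Stopping rule: stop once the interval is no wider than the target."""
    _, low, high = threshold_interval(trials, grid)
    return (high - low) <= target_width

# Made-up data: always correct at or above -1.5, alternating below it.
grid = [-4.0 + 0.05 * i for i in range(101)]
levels = [-3.0 + 0.25 * i for i in range(13)]
block = [(x, x >= -1.5 or i % 2 == 0) for i, x in enumerate(levels)]

for n_blocks in (1, 2, 4):
    mle, low, high = threshold_interval(block * n_blocks, grid)
    print(n_blocks * len(block), "trials: estimate", round(mle, 2),
          "interval width", round(high - low, 2))
```

Repeating the same block of responses scales the log-likelihood, so the interval can only stay the same or shrink as trials accumulate. A tighter `target_width` in `should_stop` therefore costs more trials.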

The Monte Carlo simulations allow the analysis of the ideal behavior of the three psychophysical methods. One major conclusion is that the ascending staircase method gives threshold measurements that are severely biased. This bias is a function of the number of trials carried out before the stopping criterion of 5 correct in a row with one stimulus is met. When the number of trials was small (less than approximately 15), the threshold estimate was too low; when the number of trials was greater than 15, the threshold estimate was too high (see Figure 7). This characteristic is due to the fact that the stimulus concentration used can only increase as trials progress. The maximum-likelihood method has the least amount of bias, although it is always slightly negative (i.e., the threshold estimate is, on the average, slightly lower, by 0.02 log units, than the true value).

Figure 9. Test–retest reliability as indicated by correlation coefficients between all pairs of measurements, plotted as a function of the number of days separating them. The solid line is the least squares linear regression through the data.

In clinical testing, especially for cases that involve the legal system, it is important to be able to detect malingerers. One of the strengths of the maximum-likelihood method is that it explicitly tests hypotheses. Normally, the hypothesis is that the subject is responding truthfully and that the threshold has a certain value. In the present implementation of the maximum-likelihood method, the computer routines maintain three hypotheses about the response strategy of the subject: that the subject is responding truthfully, that the subject is lying on each trial (i.e., always choosing what he/she believes is the wrong alternative in the 2AFC procedure), and that the subject is responding randomly (i.e., not making use of any sensory information). The ability to detect a malingerer (one who has sensory sensitivity but tries to respond randomly) is based on the fact that it is difficult for humans to generate truly random sequences (Bar-Hillel & Wagenaar, 1993). A subject who can taste or smell the stimulus, and therefore knows in which interval it occurs, tends to respond incorrectly more often than would be expected by chance alone. In this case, the hypothesis that the subject is lying will have a higher likelihood than the hypothesis that the subject is telling the truth. We are currently working to refine this ability of the maximum-likelihood method, because it holds great promise. The other two psychophysical procedures do not easily distinguish between a malingerer and a person with no sensory sensitivity and have no explicit way to detect malingerers.
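The comparison of the three response-strategy hypotheses can be sketched as a likelihood calculation. This is an illustration under stated assumptions, not the authors' routines: the Weibull form, steepness, lapse rate, and threshold grid are hypothetical choices, the lying observer is modeled as P(correct) = 1 − P(correct | truthful), and the random observer as P(correct) = .5 on every trial.

```python
import math

def p_correct(x, threshold, slope=2.0, guess=0.5, lapse=0.01):
    """Assumed Weibull 2AFC psychometric function on a log10 intensity axis."""
    detect = 1.0 - math.exp(-(10.0 ** (slope * (x - threshold))))
    return guess + (1.0 - guess - lapse) * detect

def log_likelihood(trials, prob):
    """trials: (log_intensity, correct) pairs; prob maps intensity to P(correct)."""
    return sum(math.log(prob(x) if correct else 1.0 - prob(x))
               for x, correct in trials)

def compare_hypotheses(trials, grid):
    """Best log-likelihood over the threshold grid for each response strategy."""
    truthful = max(log_likelihood(trials, lambda x, t=t: p_correct(x, t))
                   for t in grid)
    lying = max(log_likelihood(trials, lambda x, t=t: 1.0 - p_correct(x, t))
                for t in grid)
    random_guessing = len(trials) * math.log(0.5)
    return {"truthful": truthful, "lying": lying, "random": random_guessing}

def most_likely(trials, grid):
    """Name of the hypothesis with the highest log-likelihood."""
    scores = compare_hypotheses(trials, grid)
    return max(scores, key=scores.get)

# Made-up observers with a true threshold near -1.5 log units.
grid = [-4.0 + 0.05 * i for i in range(101)]
levels = [-3.0 + 0.5 * i for i in range(7)] * 4
honest = [(x, x >= -1.5 or i % 2 == 0) for i, x in enumerate(levels)]
liar = [(x, x < -1.5 and i % 2 == 0) for i, x in enumerate(levels)]

print("honest observer ->", most_likely(honest, grid))
print("lying observer  ->", most_likely(liar, grid))
```

The lying observer is wrong on every trial above threshold, which is far less likely under chance responding than under the lying model; that asymmetry is what lets the likelihood comparison separate the two.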

Finally, in Experiment 2 we demonstrated that the maximum-likelihood method gives estimates of thresholds that are stable over time. This stability is actually based on two aspects of our testing procedure. First, the maximum-likelihood method gives the smallest confidence intervals of the three methods tested and thus contributes the least amount of random or systematic error variance to the measured thresholds. The second source of reliability is the experimental paradigm: The 2AFC procedure minimizes the effects of response bias (e.g., Green & Swets, 1966/1974) and thus minimizes error variance in the threshold measurements that would be introduced by variance in response bias. Actually, it would be desirable to use more than just two forced-choice alternatives, because the ML-PEST method converges more rapidly on the threshold value with more alternatives (Manny & Klein, 1985; Schlauch & Rose, 1990). The time required to present chemosensory stimuli, however, usually precludes using more than two alternatives.

Test–retest reliability is not achieved at the expense of precision. Our measured thresholds show an approximately normal distribution with a standard deviation of about 0.4 log units. The maximum-likelihood method is able to estimate thresholds that lie on a continuum, even though a small number of discrete stimulus concentrations are available for testing. Although a computer is required to run ML-PEST software, the widespread availability of inexpensive desktop or laptop computers makes this requirement much less of a barrier than in the past.

Figure 10. Distribution of differences between successive thresholds (left) and of the standard deviation of the four threshold measurements for the 27 subjects (right), for NaCl (upper panel) and butyl alcohol (lower panel).

REFERENCES

Anliker, J. A., Bartoshuk, L., Ferris, A. N., & Hooks, L. D. (1991). Children’s food preferences and genetic sensitivity to the bitter tastes of 6-n-propylthiouracil (PROP). American Journal of Clinical Nutrition, 54, 316-320.

Apter, A. J., Gent, J. F., & Frank, M. E. (1999). Fluctuating olfactory sensitivity and distorted odor perception in allergic rhinitis. Archives of Otolaryngology Head & Neck Surgery, 125, 1005-1010.

Bar-Hillel, M., & Wagenaar, W. A. (1993). The perception of randomness. In C. L. Gideon Keren (Ed.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 369-393). Hillsdale, NJ: Erlbaum.

Berkson, J. (1951). Why I prefer logits to probits. Biometrics, 7, 327-339.

Berkson, J. (1953). A statistically precise and relatively simple method of estimating the bio-assay with quantal response, based on the logistic function. Journal of the American Statistical Association, 48, 565-600.

Berkson, J. (1955). Maximum-likelihood and minimum chi-square estimates of the logistic function. Journal of the American Statistical Association, 50, 130-162.

Cain, W. S., Gent, J. F., Catalanotto, F. A., & Goodspeed, R. B. (1983). Clinical evaluation of olfaction. American Journal of Otolaryngology, 4, 252-256.

Cometto-Muñiz, J. E., & Cain, W. S. (1995). Relative sensitivity of the ocular trigeminal, nasal trigeminal, and olfactory systems to airborne chemicals. Chemical Senses, 20, 191-198.

Cornsweet, T. N. (1962). The staircase method in psychophysics. American Journal of Psychology, 75, 485-491.

Cowart, B. J. (1989). Relationships between taste and smell across the adult life span. In C. Murphy, W. S. Cain, & D. M. Hegsted (Eds.), Nutrition and the chemical senses in aging: Recent advances and current research needs (Annals of the New York Academy of Sciences, Vol. 561, pp. 39-55). New York: New York Academy of Sciences.

Cowart, B. J., Yokomukai, Y., & Beauchamp, G. K. (1994). Bitter taste in aging: Compound-specific decline in sensitivity. Physiology & Behavior, 56, 1237-1241.

Deems, D. A., Doty, R. L., Settle, R. G., Moore-Gillon, V., Shaman, P., Mester, A. F., Kimmelman, C. P., Brightman, V. J., & Snow, J. B., Jr. (1991). Smell and taste disorders, a study of 750 patients from the University of Pennsylvania Smell and Taste Center. Archives of Otolaryngology Head & Neck Surgery, 117, 519-528.

Doty, R. L., McKeown, D. A., Lee, W. W., & Shaman, P. (1995). A study of the test–retest reliability of ten olfactory tests. Chemical Senses, 20, 645-656.

Doty, R. L., Snyder, P. J., Huggins, G. R., & Lowry, L. D. (1981). Endocrine, cardiovascular, and psychological correlates of olfactory sensitivity changes during the human menstrual cycle. Journal of Comparative & Physiological Psychology, 95, 45-60.

Drewnowski, A., Henderson, S. A., & Shore, A. B. (1997). Genetic sensitivity to 6-n-propylthiouracil (PROP) and hedonic responses to bitter and sweet tastes. Chemical Senses, 22, 27-37.

Duffy, V. B., Cain, W. S., & Ferris, A. M. (1999). Measurement of sensitivity to olfactory flavor: Application in a study of aging and dentures. Chemical Senses, 24, 671-677.

Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf und Härtel.

Gagnon, P., Mergler, D., & Lapare, S. (1994). Olfactory adaptation, threshold shift and recovery at low levels of exposure to methyl isobutyl ketone (MIBK). Neurotoxicology, 15, 637-642.

Garcia-Perez, M. A. (1998). Forced-choice staircases with fixed step sizes: Asymptotic and small-sample properties. Vision Research, 38, 1861-1881.

Green, D. M., & Swets, J. A. (1974). Signal detection theory and psychophysics. Huntington, NY: Robert E. Krieger. (Original work published 1966)

Grushka, M., Sessle, B. J., & Howley, T. P. (1986). Psychophysical evidence of taste dysfunction in burning mouth syndrome. Chemical Senses, 11, 485-498.

Guilford, J. P. (1936). Psychometric methods. New York: McGraw-Hill.

Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.

Hall, J. L. (1968). Maximum-likelihood sequential procedures for estimation of psychometric functions. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1981). Hybrid adaptive procedure for estimation of psychometric functions. Journal of the Acoustical Society of America, 69, 1763-1769.

Harvey, L. O., Jr. (1986). Efficient estimation of sensory thresholds. Behavior Research Methods, Instruments, & Computers, 18, 623-632.

Harvey, L. O., Jr. (1992). The critical operating characteristic and the evaluation of expert judgment. Organizational Behavior & Human Decision Processes, 53, 229-251.

Harvey, L. O., Jr. (1997). Efficient estimation of sensory thresholds with ML-PEST. Spatial Vision, 11, 121-128.

Hays, W. L. (1963). Statistics for psychologists (1st ed.). New York: Holt, Rinehart & Winston.

Krantz, D. H. (1969). Threshold theories of signal detection. Psychological Review, 76, 308-324.

Lehrner, J. P., Brücke, T., Dal-Bianco, P., Gatterer, G., & Kryspin-Exner, I. (1997). Olfactory functions in Parkinson’s disease and Alzheimer’s disease. Chemical Senses, 22, 105-110.

Lehrner, J. P., Kryspin-Exner, I., & Vetter, N. (1995). Higher olfactory threshold and decreased odor identification ability in HIV-infected persons. Chemical Senses, 20, 325-328.

Linn, R. L. (Ed.) (1989). Educational measurement (3rd ed.). New York: Macmillan.

Linschoten, M. R. I., & Kroeze, J. H. A. (1991). Spatial summation in taste: NaCl thresholds and stimulated area on the anterior human tongue. Chemical Senses, 16, 219-224.

Linschoten, M. R. I., & Kroeze, J. H. A. (1992). Bilateral taste stimulation: Spatial summation with weak NaCl stimuli. Chemical Senses, 17, 53-59.

Linschoten, M. R. I., & Kroeze, J. H. A. (1994). Ipsi- and bilateral interactions in taste. Perception & Psychophysics, 55, 387-393.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user’s guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure vernier acuity of older infants. Vision Research, 25, 1245-1252.

Mattes, R. D. (1988). Reliability of psychophysical measures of gustatory function. Perception & Psychophysics, 43, 107-114.

Moll, B., Klimek, L., Eggers, G., & Mann, W. (1998). Comparison of olfactory function in patients with seasonal and perennial allergic rhinitis. Allergy, 53, 297-301.

Murphy, C., & Cain, W. S. (1986). Odor identification: The blind are better. Physiology & Behavior, 37, 177-180.

Murphy, C., Nordin, S., de Wijk, R. A., Cain, W. S., & Polich, J. (1994). Olfactory-evoked potentials: Assessment of young and elderly, and comparison to psychophysical threshold. Chemical Senses, 19, 47-56.

Nordin, S., Murphy, C., Davidson, T. M., Quinonez, C., Jalowayski, A. A., & Ellison, D. W. (1996). Prevalence and assessment of qualitative olfactory dysfunction in different age groups. Laryngoscope, 106, 739-744.

O’Mahony, M., Gardner, L., Long, D., Heintz, C., Thompson, B., & Davies, M. (1979). Salt taste detection: An R-index approach to signal-detection measurements. Perception, 8, 497-506.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Pierce, J. D., Jr., Doty, R. L., & Amoore, J. E. (1996). Analysis of position of trial sequence and type of diluent on the detection threshold for phenyl ethyl alcohol using a single staircase method. Perceptual & Motor Skills, 82, 451-458.

Rosenblatt, M. R., Olmstead, R. E., Iwamoto-Schaapp, P. N., & Jarvik, M. E. (1998). Olfactory thresholds for nicotine and menthol in smokers (abstinent and nonabstinent) and nonsmokers. Physiology & Behavior, 65, 575-579.

SAS Institute (1998a). StatView: StatView reference. Cary, NC: SAS Institute.

SAS Institute (1998b). StatView: Using StatView. Cary, NC: SAS Institute.

Schlauch, R. S., & Rose, R. M. (1990). Two-, three-, and four-interval forced-choice staircase procedures: Estimator bias and efficiency. Journal of the Acoustical Society of America, 88, 732-740.

Simpson, A. J., & Fitter, M. J. (1973). What is the best index of detectability? Psychological Bulletin, 80, 481-488.

Stevens, J. C. (1995). Detection of heteroquality taste mixtures. Perception & Psychophysics, 57, 18-26.

Swets, J. A. (1961). Is there a sensory threshold? Science, 134, 168-177.

Swets, J. A. (1986a). Form of empirical ROC’s in discrimination and diagnostic tasks: Implications for theory and measurement of performance. Psychological Bulletin, 99, 181-198.

Swets, J. A. (1986b). Indices of discrimination or diagnostic accuracy: Their ROC’s and implied models. Psychological Bulletin, 99, 100-117.

Swets, J. A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Mahwah, NJ: Erlbaum.

Swets, J. A., Tanner, W. P., Jr., & Birdsall, T. G. (1961). Decision processes in perception. Psychological Review, 68, 301-340.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Urban, F. M. (1908). The application of statistical methods to problems of psychophysics. Philadelphia: Psychological Clinic Press.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Weiffenbach, J. M., Baum, B. J., & Burghauser, R. (1982). Taste thresholds: Quality specific variation with human aging. Journal of Gerontology, 37, 372-377.

Weiffenbach, J. M., Schwartz, L. K., Atkinson, J. C., & Fox, P. C. (1995). Taste performance in Sjögren’s syndrome. Physiology & Behavior, 57, 89-96.

Wetherill, G. B., & Levitt, H. (1965). Sequential estimation of points on a psychometric function. British Journal of Mathematical & Statistical Psychology, 18, 1-10.

(Manuscript received November 11, 1999; revision accepted for publication March 4, 2001.)

Page 13: Fast and accurate measurement of taste and smell ...psych-lharvey/Publications/(2001) Linschoten_et… · Since the publication of Elemente der Psychophysik (Fechner, 1860), considerable

1342 LINSCHOTEN HARVEY ELLER AND JAFEK

22ndash57 years) The 14 women and 13 men were divided into twogroups Group 1 (ldquoconsecutive rdquo 7 females and 7 males) was testedon 4 consecutive days and Group 2 (ldquodistributed rdquo 7 females and 6males) was tested on Days 1 2 8 and 22

Stimuli Smell thresholds for butyl alcohol were assessed usingthe same range of concentrations as were used with the maximum-likelihood method in Experiment 1 Taste thresholds were measuredusing 20 concentrations of NaCl in quarter-log steps The highestconcentration was 17 M The same maximum-likelihood adaptivemethod as described in Experiment 1 was used in this experiment

Procedure Thresholds were determined with a temporal 2AFCprocedure Taste sensitivity was measured with a whole-mouth sip-and-spit procedure The stimuli and the blanks were presented in30-ml plastic medicine cups containing 5 ml of liquid The sub-jects were requested to take the f irst sample in their mouths swishit around and spit it out then rinse with distilled water and do thesame with the second sample They were asked to choose the sam-ple that had a taste different from water In each of the four sessionstaste thresholds were determined first and smell thresholds last Allfour smell thresholds were determined in the preferred nostril Thesubjects were tested at the same time of each day

ResultsBecause the measurement and specification of testndash

retest reliability is a rather complex topic (Linn 1989)

we have chosen four different ways to demonstrate thereliability of the maximum-likelihood method For thefirst method the four threshold measurements for eachsubject are plotted in Figure 8 The thick line in eachpanel is the mean threshold It is clear from examinationof Figure 8 that most individual thresholds were stableover 4 days (left column of panels) and over 22 days(right column of panels) with a few subjects showingshifts of more than a 1 log unit The mean thresholds forthe two groups are shown in Table 3 A repeated mea-sures ANOVA confirmed the stability of the thresholdsThe effect of testing day was not significant

A second approach is to compute the correlationsamong the four threshold measurements (see eg Mattes1988) Pearson correlation coefficients between thresh-olds obtained on different days are shown in Table 4 Thecorrelations were generally higher for pairs of thresholdsthat were measured close together in time As Mattespointed out correlations are highly influenced by smallchanges in ordinal ranking of the subjects The fact thatthe pairwise correlations became smaller as time sepa-ration between them increased is shown in Figure 9 The

Figure 8 Log threshold concentration as a function of test day for NaCl (upper row) and butanol (lower row) measured on 4 suc-cessive days (left column) and over 22 days (right column) The thin lines connect the four thresholds of individual subjects The thickline is the mean threshold concentration of all subjects

FAST AND ACCURATE THRESHOLDS 1343

solid line is the least squares regression line It indicatesthat the correlation between two thresholds diminishedas the time between them increased The slope of the re-gression line becomes even steeper if the data for 1-dayseparation are removed from the regression indicatingthat the decrease in correlation started with the 2-dayseparation condition

The third method to assess testndashretest reliability isbased on the differences between successive thresholdmeasurements Each threshold was subtracted from theprevious threshold The four threshold measurements re-sulted in three difference measurements The distribu-tions of these differences for NaCl and for butyl alcoholare plotted on the left side of Figure 10 If the measuredthresholds do not systematically increase or decreaseover time we expect that the mean difference would bezero The mean of the distribution for NaCl (upper panelof Figure 10) was 0018 and for butyl alcohol (lowerpanel) was 0002 Neither of these was significantly dif-ferent from zero The standard deviations of these distri-butions were 0413 log units for NaCl and 0422 log unitsfor butyl alcohol

A fourth way to examine testndashretest reliability is bymeans of the spread of the four threshold measures foreach subject A standard deviation of 00 would meanthat all four thresholds were exactly the same value Thestandard deviation of the four measurements was com-puted for each subject The distribution of these standarddeviations for NaCl (upper panel) and butyl alcohol(lower panel) are plotted on the right side of Figure 10The median standard deviation was about 015 log unitsfor both NaCl and butyl alcohol This good testndashretestreliability is consistent with the other three methods ofassessing reliability reported above

DiscussionThe maximum-likelihood method gives estimates of

thresholds that are stable within individuals over timeand that are responsive to individual differences By allfour indicators the testndashretest reliability of both NaCland butyl alcohol thresholds measured by the maximum-likelihood method was very good

GENERAL DISCUSSION

The two experiments and the Monte Carlo simulationsreported here demonstrate that the maximum-likelihoodmethod is a viable alternative to other methods currentlyin use to measure taste and smell detection thresholdsBecause the method attempts to use stimuli that are op-timal (close to the predicted threshold) the confidenceintervals of the measured thresholds are considerablysmaller than those achieved by the other two methods(Experiment 1)

Table 3
Mean NaCl and Butyl Alcohol Thresholds (and Standard Errors) From Experiment 2

                             Day 1         Day 2         Day 3         Day 4
                             M     SE      M     SE      M     SE      M     SE
NaCl
  Consecutive (Group 1)      2.62  0.13    2.67  0.11    2.82  0.13    2.63  0.11
  Distributed (Group 2)      2.90  0.19    2.83  0.09    2.83  0.11    2.77  0.10
Butyl Alcohol
  Consecutive (Group 1)      3.19  0.14    3.17  0.15    3.25  0.12    3.10  0.13
  Distributed (Group 2)      3.00  0.16    3.10  0.14    3.09  0.10    3.09  0.11

Table 4
Pearson Correlation Coefficients Between Thresholds

                             Paired Sessions
                             1–2    1–3    1–4    2–3    2–4    3–4
NaCl
  Consecutive (Group 1)      .46    .66    .66    .44    .75    .66
  Distributed (Group 2)      .58    .72    .40    .55    .20    .51
Butyl alcohol
  Consecutive (Group 1)      .86    .83    .86    .67    .71    .76
  Distributed (Group 2)      .26    .68    .56    .51    .21    .65

p < .05. p < .01. p < .001.

One of the strengths of the maximum-likelihood method is that it provides an explicit way of estimating the confidence interval of the threshold, a facility not provided by the other two methods. The experimenter is free to establish a stopping confidence interval to suit his or her needs. If greater precision is desired, it can be achieved at the expense of running more trials. This explicit tradeoff between precision and number of trials is not possible with the other two psychophysical methods. One could, of course, use a hybrid combination of methods: an up–down staircase method for generating the next stimulus to use, combined with a maximum-likelihood method for estimating the threshold and its confidence interval. In effect, we used this approach after the data from Experiment 1 were collected, by reanalyzing them with the maximum-likelihood technique in order to assess the confidence interval of the resulting threshold.
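The tradeoff between precision and number of trials can be made concrete with a minimal sketch of a maximum-likelihood threshold estimate and a confidence-interval stopping rule. It is not the authors' ML-PEST code. The logistic form, the slope value, the candidate grid, and the likelihood-ratio interval (a drop of 1.92 log-likelihood units, the usual approximate 95% criterion for one parameter) are all assumptions made for illustration, and the trial data are hypothetical.

```python
import math


def p_correct(x, threshold, slope=3.0, guess=0.5):
    """Assumed logistic 2AFC psychometric function, clamped away from 0 and 1."""
    p = guess + (1.0 - guess) / (1.0 + math.exp(-slope * (x - threshold)))
    return min(max(p, 1e-9), 1.0 - 1e-9)


def log_likelihood(trials, threshold):
    """trials is a list of (log_concentration, was_correct) pairs."""
    total = 0.0
    for x, correct in trials:
        p = p_correct(x, threshold)
        total += math.log(p if correct else 1.0 - p)
    return total


def ml_threshold(trials, candidates, drop=1.92):
    """Return (estimate, ci_low, ci_high) over a grid of candidate thresholds.

    The interval contains every candidate whose log likelihood lies within
    `drop` of the maximum (an assumed likelihood-ratio criterion).
    """
    scored = [(log_likelihood(trials, t), t) for t in candidates]
    best_ll, best_t = max(scored)
    inside = [t for ll, t in scored if ll >= best_ll - drop]
    return best_t, min(inside), max(inside)


def should_stop(ci_low, ci_high, target_width):
    """Stop testing once the interval is at least as narrow as the target."""
    return (ci_high - ci_low) <= target_width


# Hypothetical session: log concentrations and whether each response was correct.
trials = [(-4.0, False), (-4.0, True), (-3.5, False), (-3.5, True),
          (-3.0, True), (-3.0, True), (-2.5, True), (-2.5, True),
          (-2.0, True), (-2.0, True)]
grid = [-5.0 + 0.01 * i for i in range(501)]
estimate, ci_low, ci_high = ml_threshold(trials, grid)
```

Because the estimate is taken over a fine grid of candidate values rather than over the tested concentrations, it can fall between the discrete concentrations that were actually presented. Tightening `target_width` in `should_stop` requires more trials before testing ends.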

The Monte Carlo simulations allow the analysis of the ideal behavior of the three psychophysical methods. One major conclusion is that the ascending staircase method gives threshold measurements that are severely biased. This bias is a function of the number of trials carried out before the stopping criterion of 5 correct in a row with one stimulus is met. When the number of trials was small (less than approximately 15), the threshold estimate was too low; when the number of trials was greater than 15, the threshold estimate was too high (see Figure 7). This characteristic is due to the fact that the stimulus concentration used can only increase as trials progress. The maximum-likelihood method has the least amount of bias, although it is always slightly negative (i.e., the threshold estimate is, on the average, slightly lower, by 0.02 log units, than the true value).
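A Monte Carlo check of this kind can be sketched as follows. The staircase rule used here is an assumption, not a transcription of the procedure in this paper: a miss moves testing to the next higher concentration, and the run ends at the first concentration that yields 5 correct responses in a row. The logistic observer, its slope, and the concentration series are likewise hypothetical.

```python
import math
import random


def p_correct(x, threshold, slope=3.0):
    """Assumed logistic 2AFC psychometric function for the simulated observer."""
    return 0.5 + 0.5 / (1.0 + math.exp(-slope * (x - threshold)))


def ascending_run(levels, threshold, rng, needed=5):
    """Simulate one ascending run and return (estimate, n_trials).

    Concentrations can only increase: a miss moves on to the next level, and
    the run stops at the first level with `needed` correct responses in a row.
    """
    n_trials = 0
    for x in levels:
        streak = 0
        while streak < needed:
            n_trials += 1
            if rng.random() < p_correct(x, threshold):
                streak += 1
            else:
                break  # a miss: move on to the next higher concentration
        if streak == needed:
            return x, n_trials
    return levels[-1], n_trials  # concentration series exhausted


def mean_bias(levels, threshold, runs=2000, seed=1):
    """Average signed error (estimate minus true threshold) over many runs."""
    rng = random.Random(seed)
    errors = [ascending_run(levels, threshold, rng)[0] - threshold
              for _ in range(runs)]
    return sum(errors) / len(errors)


# Hypothetical concentration series in log units, lowest first.
levels = [-4.0 + 0.25 * i for i in range(13)]
bias = mean_bias(levels, threshold=-2.5)
```

Grouping the simulated runs by `n_trials` before averaging the errors is one way to examine how bias depends on run length under these assumptions. The sketch makes no claim about reproducing the values plotted in Figure 7.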

Figure 9. Test–retest reliability as indicated by correlation coefficients between all pairs of measurements, plotted as a function of the number of days separating them. The solid line is the least squares linear regression through the data.

In clinical testing, especially for cases that involve the legal system, it is important to be able to detect malingerers. One of the strengths of the maximum-likelihood method is that it explicitly tests hypotheses. Normally, the hypothesis is that the subject is responding truthfully and that the threshold has a certain value. In the present implementation of the maximum-likelihood method, the computer routines maintain three hypotheses about the response strategy of the subject: that the subject is responding truthfully, that the subject is lying on each trial (i.e., always choosing what he or she believes is the wrong alternative in the 2AFC procedure), and that the subject is responding randomly (i.e., not making use of any sensory information). The ability to detect a malingerer, one who has sensory sensitivity but tries to respond randomly, is based on the fact that it is difficult for humans to generate truly random sequences (Bar-Hillel & Wagenaar, 1993). A subject who can taste or smell the stimulus, and therefore knows in which interval it occurs, tends to respond incorrectly more often than would be expected by chance alone. In this case, the hypothesis that the subject is lying will have a higher likelihood than the hypothesis that the subject is telling the truth. We are currently working to refine this ability of the maximum-likelihood method because it holds great promise. The other two psychophysical procedures do not easily distinguish between a malingerer and a person with no sensory sensitivity and have no explicit way to detect malingerers.
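The comparison of the three response-strategy hypotheses can be sketched as a comparison of log likelihoods. This is an illustration under stated assumptions, not the authors' routines. The logistic observer and its slope are assumed, the lying subject is modeled as choosing the correct interval with probability one minus that of a truthful subject, the random subject is correct with probability .5 on every trial, and the trial data are hypothetical.

```python
import math


def p_truthful(x, threshold, slope=3.0):
    """Assumed probability correct for a truthful 2AFC observer (clamped)."""
    p = 0.5 + 0.5 / (1.0 + math.exp(-slope * (x - threshold)))
    return min(max(p, 1e-9), 1.0 - 1e-9)


def best_log_likelihood(trials, candidates, lying=False):
    """Maximum log likelihood over candidate thresholds for one hypothesis."""
    best = -math.inf
    for t in candidates:
        ll = 0.0
        for x, correct in trials:
            p = p_truthful(x, t)
            if lying:
                p = 1.0 - p  # the liar picks the interval believed to be wrong
            ll += math.log(p if correct else 1.0 - p)
        best = max(best, ll)
    return best


def compare_hypotheses(trials, candidates):
    """Return the log likelihood of the data under each response strategy."""
    return {
        "truthful": best_log_likelihood(trials, candidates),
        "lying": best_log_likelihood(trials, candidates, lying=True),
        "random": len(trials) * math.log(0.5),  # no free parameter
    }


grid = [-5.0 + 0.01 * i for i in range(501)]

# Hypothetical malingerer: always wrong at a strong stimulus, at chance at a weak one.
suspect = [(-1.0, False)] * 10 + [(-4.0, True)] * 3 + [(-4.0, False)] * 3
suspect_result = compare_hypotheses(suspect, grid)

# Hypothetical cooperative subject: always correct at the strong stimulus.
honest = [(-1.0, True)] * 10 + [(-4.0, True)] * 3 + [(-4.0, False)] * 3
honest_result = compare_hypotheses(honest, grid)
```

For the hypothetical malingerer, the lying hypothesis receives the highest log likelihood, because below-chance performance at a clearly detectable concentration is improbable under both the truthful and the random strategies.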

Finally, in Experiment 2, we demonstrated that the maximum-likelihood method gives estimates of thresholds that are stable over time. This stability is actually based on two aspects of our testing procedure. The first is the method itself: The maximum-likelihood method gives the smallest confidence intervals of the three methods tested and thus contributes the least amount of random or systematic error variance to the measured thresholds. The second source of reliability is the experimental paradigm. The 2AFC procedure minimizes the effects of response bias (e.g., Green & Swets, 1966/1974) and thus minimizes error variance in the threshold measurements that would be introduced by variance in response bias. Actually, it would be desirable to use more than just two forced-choice alternatives, because the ML-PEST method converges more rapidly on the threshold value with more alternatives (Manny & Klein, 1985; Schlauch & Rose, 1990). The time required to present chemosensory stimuli, however, usually precludes using more than two alternatives.

Test–retest reliability is not achieved at the expense of precision. Our measured thresholds show an approximately normal distribution with a standard deviation of about 0.4 log units. The maximum-likelihood method is able to estimate thresholds that lie on a continuum, even though a small number of discrete stimulus concentrations are available for testing. Although a computer is required to run ML-PEST software, the widespread availability of inexpensive desktop or laptop computers makes this requirement much less of a barrier than in the past.

Figure 10. Distribution of differences between successive thresholds (left) and of the standard deviation of the four threshold measurements for the 27 subjects (right) for NaCl (upper panel) and butyl alcohol (lower panel).

REFERENCES

Anliker, J. A., Bartoshuk, L., Ferris, A. N., & Hooks, L. D. (1991). Children's food preferences and genetic sensitivity to the bitter tastes of 6-n-propylthiouracil (PROP). American Journal of Clinical Nutrition, 54, 316-320.

Apter, A. J., Gent, J. F., & Frank, M. E. (1999). Fluctuating olfactory sensitivity and distorted odor perception in allergic rhinitis. Archives of Otolaryngology-Head & Neck Surgery, 125, 1005-1010.

Bar-Hillel, M., & Wagenaar, W. A. (1993). The perception of randomness. In C. L. Gideon Keren (Ed.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 369-393). Hillsdale, NJ: Erlbaum.

Berkson, J. (1951). Why I prefer logits to probits. Biometrics, 7, 327-339.

Berkson, J. (1953). A statistically precise and relatively simple method of estimating the bio-assay with quantal response, based on the logistic function. Journal of the American Statistical Association, 48, 565-600.

Berkson, J. (1955). Maximum-likelihood and minimum chi-square estimates of the logistic function. Journal of the American Statistical Association, 50, 130-162.

Cain, W. S., Gent, J. F., Catalanotto, F. A., & Goodspeed, R. B. (1983). Clinical evaluation of olfaction. American Journal of Otolaryngology, 4, 252-256.

Cometto-Muñiz, J. E., & Cain, W. S. (1995). Relative sensitivity of the ocular trigeminal, nasal trigeminal, and olfactory systems to airborne chemicals. Chemical Senses, 20, 191-198.

Cornsweet, T. N. (1962). The staircase method in psychophysics. American Journal of Psychology, 75, 485-491.

Cowart, B. J. (1989). Relationships between taste and smell across the adult life span. In C. Murphy, W. S. Cain, & D. M. Hegsted (Eds.), Nutrition and the chemical senses in aging: Recent advances and current research needs (Annals of the New York Academy of Sciences, Vol. 561, pp. 39-55). New York: New York Academy of Sciences.

Cowart, B. J., Yokomukai, Y., & Beauchamp, G. K. (1994). Bitter taste in aging: Compound-specific decline in sensitivity. Physiology & Behavior, 56, 1237-1241.

Deems, D. A., Doty, R. L., Settle, R. G., Moore-Gillon, V., Shaman, P., Mester, A. F., Kimmelman, C. P., Brightman, V. J., & Snow, J. B., Jr. (1991). Smell and taste disorders: A study of 750 patients from the University of Pennsylvania Smell and Taste Center. Archives of Otolaryngology-Head & Neck Surgery, 117, 519-528.

Doty, R. L., McKeown, D. A., Lee, W. W., & Shaman, P. (1995). A study of the test–retest reliability of ten olfactory tests. Chemical Senses, 20, 645-656.

Doty, R. L., Snyder, P. J., Huggins, G. R., & Lowry, L. D. (1981). Endocrine, cardiovascular, and psychological correlates of olfactory sensitivity changes during the human menstrual cycle. Journal of Comparative & Physiological Psychology, 95, 45-60.

Drewnowski, A., Henderson, S. A., & Shore, A. B. (1997). Genetic sensitivity to 6-n-propylthiouracil (PROP) and hedonic responses to bitter and sweet tastes. Chemical Senses, 22, 27-37.

Duffy, V. B., Cain, W. S., & Ferris, A. M. (1999). Measurement of sensitivity to olfactory flavor: Application in a study of aging and dentures. Chemical Senses, 24, 671-677.

Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf und Härtel.

Gagnon, P., Mergler, D., & Lapare, S. (1994). Olfactory adaptation, threshold shift and recovery at low levels of exposure to methyl isobutyl ketone (MIBK). Neurotoxicology, 15, 637-642.

Garcia-Perez, M. A. (1998). Forced-choice staircases with fixed step sizes: Asymptotic and small-sample properties. Vision Research, 38, 1861-1881.

Green, D. M., & Swets, J. A. (1974). Signal detection theory and psychophysics. Huntington, NY: Robert E. Krieger. (Original work published 1966)

Grushka, M., Sessle, B. J., & Howley, T. P. (1986). Psychophysical evidence of taste dysfunction in burning mouth syndrome. Chemical Senses, 11, 485-498.

Guilford, J. P. (1936). Psychometric methods. New York: McGraw-Hill.

Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.

Hall, J. L. (1968). Maximum-likelihood sequential procedures for estimation of psychometric functions. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1981). Hybrid adaptive procedure for estimation of psychometric functions. Journal of the Acoustical Society of America, 69, 1763-1769.

Harvey, L. O., Jr. (1986). Efficient estimation of sensory thresholds. Behavior Research Methods, Instruments, & Computers, 18, 623-632.

Harvey, L. O., Jr. (1992). The critical operating characteristic and the evaluation of expert judgment. Organizational Behavior & Human Decision Processes, 53, 229-251.

Harvey, L. O., Jr. (1997). Efficient estimation of sensory thresholds with ML-PEST. Spatial Vision, 11, 121-128.

Hays, W. L. (1963). Statistics for psychologists (1st ed.). New York: Holt, Rinehart & Winston.

Krantz, D. H. (1969). Threshold theories of signal detection. Psychological Review, 76, 308-324.

Lehrner, J. P., Brücke, T., Dal-Bianco, P., Gatterer, G., & Kryspin-Exner, I. (1997). Olfactory functions in Parkinson's disease and Alzheimer's disease. Chemical Senses, 22, 105-110.

Lehrner, J. P., Kryspin-Exner, I., & Vetter, N. (1995). Higher olfactory threshold and decreased odor identification ability in HIV-infected persons. Chemical Senses, 20, 325-328.

Linn, R. L. (Ed.) (1989). Educational measurement (3rd ed.). New York: Macmillan.

Linschoten, M. R. I., & Kroeze, J. H. A. (1991). Spatial summation in taste: NaCl thresholds and stimulated area on the anterior human tongue. Chemical Senses, 16, 219-224.

Linschoten, M. R. I., & Kroeze, J. H. A. (1992). Bilateral taste stimulation: Spatial summation with weak NaCl stimuli. Chemical Senses, 17, 53-59.

Linschoten, M. R. I., & Kroeze, J. H. A. (1994). Ipsi- and bilateral interactions in taste. Perception & Psychophysics, 55, 387-393.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure vernier acuity of older infants. Vision Research, 25, 1245-1252.

Mattes, R. D. (1988). Reliability of psychophysical measures of gustatory function. Perception & Psychophysics, 43, 107-114.

Moll, B., Klimek, L., Eggers, G., & Mann, W. (1998). Comparison of olfactory function in patients with seasonal and perennial allergic rhinitis. Allergy, 53, 297-301.

Murphy, C., & Cain, W. S. (1986). Odor identification: The blind are better. Physiology & Behavior, 37, 177-180.

Murphy, C., Nordin, S., de Wijk, R. A., Cain, W. S., & Polich, J. (1994). Olfactory-evoked potentials: Assessment of young and elderly, and comparison to psychophysical threshold. Chemical Senses, 19, 47-56.

Nordin, S., Murphy, C., Davidson, T. M., Quinonez, C., Jalowayski, A. A., & Ellison, D. W. (1996). Prevalence and assessment of qualitative olfactory dysfunction in different age groups. Laryngoscope, 106, 739-744.

O'Mahony, M., Gardner, L., Long, D., Heintz, C., Thompson, B., & Davies, M. (1979). Salt taste detection: An R-index approach to signal-detection measurements. Perception, 8, 497-506.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Pierce, J. D., Jr., Doty, R. L., & Amoore, J. E. (1996). Analysis of position of trial sequence and type of diluent on the detection threshold for phenyl ethyl alcohol using a single staircase method. Perceptual & Motor Skills, 82, 451-458.

Rosenblatt, M. R., Olmstead, R. E., Iwamoto-Schaapp, P. N., & Jarvik, M. E. (1998). Olfactory thresholds for nicotine and menthol in smokers (abstinent and nonabstinent) and nonsmokers. Physiology & Behavior, 65, 575-579.

SAS Institute (1998a). StatView: StatView reference. Cary, NC: SAS Institute.

SAS Institute (1998b). StatView: Using StatView. Cary, NC: SAS Institute.

Schlauch, R. S., & Rose, R. M. (1990). Two-, three-, and four-interval forced-choice staircase procedures: Estimator bias and efficiency. Journal of the Acoustical Society of America, 88, 732-740.

Simpson, A. J., & Fitter, M. J. (1973). What is the best index of detectability? Psychological Bulletin, 80, 481-488.

Stevens, J. C. (1995). Detection of heteroquality taste mixtures. Perception & Psychophysics, 57, 18-26.

Swets, J. A. (1961). Is there a sensory threshold? Science, 134, 168-177.

Swets, J. A. (1986a). Form of empirical ROC's in discrimination and diagnostic tasks: Implications for theory and measurement of performance. Psychological Bulletin, 99, 181-198.

Swets, J. A. (1986b). Indices of discrimination or diagnostic accuracy: Their ROC's and implied models. Psychological Bulletin, 99, 100-117.

Swets, J. A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Mahwah, NJ: Erlbaum.

Swets, J. A., Tanner, W. P., Jr., & Birdsall, T. G. (1961). Decision processes in perception. Psychological Review, 68, 301-340.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Urban, F. M. (1908). The application of statistical methods to problems of psychophysics. Philadelphia: Psychological Clinic Press.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Weiffenbach, J. M., Baum, B. J., & Burghauser, R. (1982). Taste thresholds: Quality specific variation with human aging. Journal of Gerontology, 37, 372-377.

Weiffenbach, J. M., Schwartz, L. K., Atkinson, J. C., & Fox, P. C. (1995). Taste performance in Sjögren's syndrome. Physiology & Behavior, 57, 89-96.

Wetherill, G. B., & Levitt, H. (1965). Sequential estimation of points on a psychometric function. British Journal of Mathematical & Statistical Psychology, 18, 1-10.

(Manuscript received November 11, 1999; revision accepted for publication March 4, 2001.)



with the maximum-likelihood technique in order to assess the confidence interval of the resulting threshold.

The Monte Carlo simulations allow the analysis of the ideal behavior of the three psychophysical methods. One major conclusion is that the ascending staircase method gives threshold measurements that are severely biased. This bias is a function of the number of trials carried out before the stopping criterion of 5 correct in a row with one stimulus. When the number of trials was small (less than approximately 15), the threshold estimate was too low; when the number of trials was greater than 15, the threshold estimate was too high (see Figure 7). This characteristic is due to the fact that the stimulus concentration used can only increase as trials progress. The maximum-likelihood method has the least amount of bias, although it is always slightly negative (i.e., the threshold estimate is, on the average, slightly lower, by 0.02 log units, than the true value).
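The dependence of the ascending-staircase estimate on run length can be explored with a small simulation. The following is only a sketch under stated assumptions, not the simulation code used for this paper: the logistic 2AFC psychometric function, its slope, the concentration series, the true threshold, and the step rule (repeat a level after a correct response, step up after an error) are all hypothetical choices made for illustration. Only the stopping criterion of 5 correct in a row and the fact that concentration never decreases are taken from the text.

```python
import math
import random

def p_correct_2afc(log_c, threshold, slope=0.3):
    """Assumed 2AFC logistic psychometric function (0.5 at chance, 1.0 at ceiling)."""
    return 0.5 + 0.5 / (1.0 + math.exp(-(log_c - threshold) / slope))

def ascending_staircase(levels, threshold, rng, run_length=5, max_trials=200):
    """Simulate one run; return (estimate, n_trials, visited_levels).

    Assumed rule: repeat the level after a correct response, step up one
    level after an error, stop after `run_length` correct in a row.
    """
    idx, streak, visited = 0, 0, []
    for trial in range(1, max_trials + 1):
        visited.append(levels[idx])
        if rng.random() < p_correct_2afc(levels[idx], threshold):
            streak += 1
            if streak == run_length:
                return levels[idx], trial, visited
        else:
            streak = 0
            idx = min(idx + 1, len(levels) - 1)  # concentration never decreases
    return levels[idx], max_trials, visited

def mean(values):
    """Arithmetic mean, or NaN for an empty list."""
    return sum(values) / len(values) if values else float("nan")

rng = random.Random(1)
levels = [-3.0 + 0.25 * i for i in range(13)]  # hypothetical log-concentration series
true_threshold = -1.5                          # hypothetical true value (75% correct point)
runs = [ascending_staircase(levels, true_threshold, rng) for _ in range(2000)]
short = [est for est, n, _ in runs if n < 15]
long_runs = [est for est, n, _ in runs if n >= 15]
print("mean estimate, runs under 15 trials:", mean(short))
print("mean estimate, runs of 15+ trials:  ", mean(long_runs))
```

Splitting the simulated runs at 15 trials mirrors the comparison described above; how closely the two means reproduce the reported bias depends on the assumed parameters.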

Figure 9. Test–retest reliability, as indicated by correlation coefficients between all pairs of measurements, plotted as a function of the number of days separating them. The solid line is the least squares linear regression through the data.

In clinical testing, especially for cases that involve the legal system, it is important to be able to detect malingerers. One of the strengths of the maximum-likelihood method is that it explicitly tests hypotheses. Normally, the hypothesis is that the subject is responding truthfully and that the threshold has a certain value. In the present implementation of the maximum-likelihood method, the computer routines maintain three hypotheses about the response strategy of the subject: that the subject is responding truthfully, that the subject is lying on each trial (i.e., always choosing what he/she believes is the wrong alternative in the 2AFC procedure), and that the subject is responding randomly (i.e., not making use of any sensory information). The ability to detect a malingerer, one who has sensory sensitivity but tries to respond randomly, is based on the fact that it is difficult for humans to generate truly random sequences (Bar-Hillel & Wagenaar, 1993). A subject who can taste or smell the stimulus, and therefore knows in which interval it occurs, tends to respond incorrectly more often than would be expected by chance alone. In this case, the hypothesis that the subject is lying will have a higher likelihood than the hypothesis that the subject is telling the truth. We are currently working to refine this ability of the maximum-likelihood method because it holds great promise. The other two psychophysical procedures do not easily distinguish between a malingerer and a person with no sensory sensitivity and have no explicit way to detect malingerers.
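The comparison of the three response-strategy hypotheses can be sketched as a likelihood calculation. This is an illustrative sketch, not the authors' computer routines: the logistic 2AFC function, its slope, the grid of candidate thresholds, and the modeling of the "lying" strategy as responding correctly with probability 1 - p are assumptions, and the function names are hypothetical.

```python
import math

def p_correct_2afc(log_c, threshold, slope=0.3):
    """Assumed 2AFC logistic psychometric function (0.5 at chance, 1.0 at ceiling)."""
    return 0.5 + 0.5 / (1.0 + math.exp(-(log_c - threshold) / slope))

def log_likelihood(trials, p_correct):
    """Sum of log P(response) over (log_concentration, was_correct) trials."""
    total = 0.0
    for log_c, correct in trials:
        p = min(max(p_correct(log_c), 1e-9), 1.0 - 1e-9)  # keep log() finite
        total += math.log(p if correct else 1.0 - p)
    return total

def compare_strategies(trials, candidate_thresholds):
    """Best log-likelihood under each assumed response strategy."""
    truthful = max(
        log_likelihood(trials, lambda x, t=t: p_correct_2afc(x, t))
        for t in candidate_thresholds
    )
    lying = max(
        log_likelihood(trials, lambda x, t=t: 1.0 - p_correct_2afc(x, t))
        for t in candidate_thresholds
    )
    guessing = log_likelihood(trials, lambda x: 0.5)  # no sensory information used
    return {"truthful": truthful, "lying": lying, "random": guessing}

# Made-up data: ten trials at one concentration, nine of them answered wrongly.
candidates = [-4.0 + 0.1 * i for i in range(41)]
trials = [(-1.0, False)] * 9 + [(-1.0, True)]
scores = compare_strategies(trials, candidates)
best = max(scores, key=scores.get)
print(best, scores)
```

For these made-up data, the lying hypothesis receives the highest log-likelihood, because nine errors in ten trials are far less probable under either truthful responding or pure guessing.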

Finally, in Experiment 2 we demonstrated that the maximum-likelihood method gives estimates of thresholds that are stable over time. This stability is actually based on two aspects of our testing procedure. The maximum-likelihood method gives the smallest confidence intervals of the three methods tested and thus contributes the least amount of random or systematic error variance to the measured thresholds. The second source of reliability is the experimental paradigm. The 2AFC procedure minimizes the effects of response bias (e.g., Green & Swets, 1966/1974) and thus minimizes error variance in the threshold measurements that would be introduced by variance in response bias. Actually, it would be desirable to use more than just two forced-choice alternatives, because the ML-PEST method converges more rapidly on the threshold value with more alternatives (Manny & Klein, 1985; Schlauch & Rose, 1990). The time required to present chemosensory stimuli, however, usually precludes using more than two alternatives.

Figure 10. Distribution of differences between successive thresholds (left) and of the standard deviation of the four threshold measurements for the 27 subjects (right) for NaCl (upper panel) and butyl alcohol (lower panel).

Test–retest reliability is not achieved at the expense of precision. Our measured thresholds show an approximately normal distribution with a standard deviation of about 0.4 log units. The maximum-likelihood method is able to estimate thresholds that lie on a continuum, even though a small number of discrete stimulus concentrations are available for testing. Although a computer is required to run ML-PEST software, the widespread availability of inexpensive desktop or laptop computers makes this requirement much less of a barrier than in the past.
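A short sketch shows how a maximum-likelihood estimate can fall between the tested concentrations. It is an illustration only: the logistic 2AFC function, its fixed slope, the four concentrations, the response counts, and the 0.01-log-unit search grid are hypothetical, and `ml_threshold` is not a routine from the ML-PEST software.

```python
import math

def p_correct_2afc(log_c, threshold, slope=0.3):
    """Assumed 2AFC logistic psychometric function (0.5 at chance, 1.0 at ceiling)."""
    return 0.5 + 0.5 / (1.0 + math.exp(-(log_c - threshold) / slope))

def ml_threshold(levels, n_correct, n_trials, candidates):
    """Return the candidate threshold with the highest binomial log-likelihood."""
    def loglik(t):
        total = 0.0
        for x, k, n in zip(levels, n_correct, n_trials):
            p = min(max(p_correct_2afc(x, t), 1e-9), 1.0 - 1e-9)  # keep log() finite
            total += k * math.log(p) + (n - k) * math.log(1.0 - p)
        return total
    return max(candidates, key=loglik)

levels = [-2.0, -1.5, -1.0, -0.5]   # hypothetical discrete log concentrations
n_correct = [10, 13, 18, 20]        # made-up correct responses per level
n_trials = [20, 20, 20, 20]
candidates = [-3.0 + 0.01 * i for i in range(301)]  # fine grid of candidate thresholds
estimate = ml_threshold(levels, n_correct, n_trials, candidates)
print("ML threshold estimate (log units):", round(estimate, 2))
```

Because the likelihood is evaluated over a grid much finer than the stimulus spacing, the estimate is not restricted to the four concentrations that were actually presented.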

REFERENCES

Anliker, J. A., Bartoshuk, L., Ferris, A. N., & Hooks, L. D. (1991). Children's food preferences and genetic sensitivity to the bitter tastes of 6-n-propylthiouracil (PROP). American Journal of Clinical Nutrition, 54, 316-320.

Apter, A. J., Gent, J. F., & Frank, M. E. (1999). Fluctuating olfactory sensitivity and distorted odor perception in allergic rhinitis. Archives of Otolaryngology–Head & Neck Surgery, 125, 1005-1010.

Bar-Hillel, M., & Wagenaar, W. A. (1993). The perception of randomness. In C. L. Gideon Keren (Ed.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 369-393). Hillsdale, NJ: Erlbaum.

Berkson, J. (1951). Why I prefer logits to probits. Biometrics, 7, 327-339.

Berkson, J. (1953). A statistically precise and relatively simple method of estimating the bio-assay with quantal response, based on the logistic function. Journal of the American Statistical Association, 48, 565-600.

Berkson, J. (1955). Maximum-likelihood and minimum chi-square estimates of the logistic function. Journal of the American Statistical Association, 50, 130-162.

Cain, W. S., Gent, J. F., Catalanotto, F. A., & Goodspeed, R. B. (1983). Clinical evaluation of olfaction. American Journal of Otolaryngology, 4, 252-256.

Cometto-Muñiz, J. E., & Cain, W. S. (1995). Relative sensitivity of the ocular trigeminal, nasal trigeminal, and olfactory systems to airborne chemicals. Chemical Senses, 20, 191-198.

Cornsweet, T. N. (1962). The staircase method in psychophysics. American Journal of Psychology, 75, 485-491.

Cowart, B. J. (1989). Relationships between taste and smell across the adult life span. In C. Murphy, W. S. Cain, & D. M. Hegsted (Eds.), Nutrition and the chemical senses in aging: Recent advances and current research needs (Annals of the New York Academy of Sciences, Vol. 561, pp. 39-55). New York: New York Academy of Sciences.

Cowart, B. J., Yokomukai, Y., & Beauchamp, G. K. (1994). Bitter taste in aging: Compound-specific decline in sensitivity. Physiology & Behavior, 56, 1237-1241.

Deems, D. A., Doty, R. L., Settle, R. G., Moore-Gillon, V., Shaman, P., Mester, A. F., Kimmelman, C. P., Brightman, V. J., & Snow, J. B., Jr. (1991). Smell and taste disorders, a study of 750 patients from the University of Pennsylvania Smell and Taste Center. Archives of Otolaryngology–Head & Neck Surgery, 117, 519-528.

Doty, R. L., McKeown, D. A., Lee, W. W., & Shaman, P. (1995). A study of the test–retest reliability of ten olfactory tests. Chemical Senses, 20, 645-656.

Doty, R. L., Snyder, P. J., Huggins, G. R., & Lowry, L. D. (1981). Endocrine, cardiovascular, and psychological correlates of olfactory sensitivity changes during the human menstrual cycle. Journal of Comparative & Physiological Psychology, 95, 45-60.

Drewnowski, A., Henderson, S. A., & Shore, A. B. (1997). Genetic sensitivity to 6-n-propylthiouracil (PROP) and hedonic responses to bitter and sweet tastes. Chemical Senses, 22, 27-37.

Duffy, V. B., Cain, W. S., & Ferris, A. M. (1999). Measurement of sensitivity to olfactory flavor: Application in a study of aging and dentures. Chemical Senses, 24, 671-677.

Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf und Härtel.

Gagnon, P., Mergler, D., & Lapare, S. (1994). Olfactory adaptation, threshold shift and recovery at low levels of exposure to methyl isobutyl ketone (MIBK). Neurotoxicology, 15, 637-642.

Garcia-Perez, M. A. (1998). Forced-choice staircases with fixed step sizes: Asymptotic and small-sample properties. Vision Research, 38, 1861-1881.

Green, D. M., & Swets, J. A. (1974). Signal detection theory and psychophysics. Huntington, NY: Robert E. Krieger. (Original work published 1966)

Grushka, M., Sessle, B. J., & Howley, T. P. (1986). Psychophysical evidence of taste dysfunction in burning mouth syndrome. Chemical Senses, 11, 485-498.

Guilford, J. P. (1936). Psychometric methods. New York: McGraw-Hill.

Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.

Hall, J. L. (1968). Maximum-likelihood sequential procedures for estimation of psychometric functions. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1981). Hybrid adaptive procedure for estimation of psychometric functions. Journal of the Acoustical Society of America, 69, 1763-1769.

Harvey, L. O., Jr. (1986). Efficient estimation of sensory thresholds. Behavior Research Methods, Instruments, & Computers, 18, 623-632.

Harvey, L. O., Jr. (1992). The critical operating characteristic and the evaluation of expert judgment. Organizational Behavior & Human Decision Processes, 53, 229-251.

Harvey, L. O., Jr. (1997). Efficient estimation of sensory thresholds with ML-PEST. Spatial Vision, 11, 121-128.

Hays, W. L. (1963). Statistics for psychologists (1st ed.). New York: Holt, Rinehart & Winston.

Krantz, D. H. (1969). Threshold theories of signal detection. Psychological Review, 76, 308-324.

Lehrner, J. P., Brücke, T., Dal-Bianco, P., Gatterer, G., & Kryspin-Exner, I. (1997). Olfactory functions in Parkinson's disease and Alzheimer's disease. Chemical Senses, 22, 105-110.

Lehrner, J. P., Kryspin-Exner, I., & Vetter, N. (1995). Higher olfactory threshold and decreased odor identification ability in HIV-infected persons. Chemical Senses, 20, 325-328.

Linn, R. L. (Ed.) (1989). Educational measurement (3rd ed.). New York: Macmillan.

Linschoten, M. R. I., & Kroeze, J. H. A. (1991). Spatial summation in taste: NaCl thresholds and stimulated area on the anterior human tongue. Chemical Senses, 16, 219-224.

Linschoten, M. R. I., & Kroeze, J. H. A. (1992). Bilateral taste stimulation: Spatial summation with weak NaCl stimuli. Chemical Senses, 17, 53-59.

Linschoten, M. R. I., & Kroeze, J. H. A. (1994). Ipsi- and bilateral interactions in taste. Perception & Psychophysics, 55, 387-393.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user's guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure vernier acuity of older infants. Vision Research, 25, 1245-1252.

Mattes, R. D. (1988). Reliability of psychophysical measures of gustatory function. Perception & Psychophysics, 43, 107-114.

Moll, B., Klimek, L., Eggers, G., & Mann, W. (1998). Comparison of olfactory function in patients with seasonal and perennial allergic rhinitis. Allergy, 53, 297-301.

Murphy, C., & Cain, W. S. (1986). Odor identification: The blind are better. Physiology & Behavior, 37, 177-180.

Murphy, C., Nordin, S., de Wijk, R. A., Cain, W. S., & Polich, J. (1994). Olfactory-evoked potentials: Assessment of young and elderly, and comparison to psychophysical threshold. Chemical Senses, 19, 47-56.

Nordin, S., Murphy, C., Davidson, T. M., Quinonez, C., Jalowayski, A. A., & Ellison, D. W. (1996). Prevalence and assessment of qualitative olfactory dysfunction in different age groups. Laryngoscope, 106, 739-744.

O'Mahony, M., Gardner, L., Long, D., Heintz, C., Thompson, B., & Davies, M. (1979). Salt taste detection: An R-index approach to signal-detection measurements. Perception, 8, 497-506.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Pierce, J. D., Jr., Doty, R. L., & Amoore, J. E. (1996). Analysis of position of trial sequence and type of diluent on the detection threshold for phenyl ethyl alcohol using a single staircase method. Perceptual & Motor Skills, 82, 451-458.

Rosenblatt, M. R., Olmstead, R. E., Iwamoto-Schaapp, P. N., & Jarvik, M. E. (1998). Olfactory thresholds for nicotine and menthol in smokers (abstinent and nonabstinent) and nonsmokers. Physiology & Behavior, 65, 575-579.


SAS Institute (1998a). StatView: StatView reference. Cary, NC: SAS Institute.

SAS Institute (1998b). StatView: Using StatView. Cary, NC: SAS Institute.

Schlauch, R. S., & Rose, R. M. (1990). Two-, three-, and four-interval forced-choice staircase procedures: Estimator bias and efficiency. Journal of the Acoustical Society of America, 88, 732-740.

Simpson, A. J., & Fitter, M. J. (1973). What is the best index of detectability? Psychological Bulletin, 80, 481-488.

Stevens, J. C. (1995). Detection of heteroquality taste mixtures. Perception & Psychophysics, 57, 18-26.

Swets, J. A. (1961). Is there a sensory threshold? Science, 134, 168-177.

Swets, J. A. (1986a). Form of empirical ROC's in discrimination and diagnostic tasks: Implications for theory and measurement of performance. Psychological Bulletin, 99, 181-198.

Swets, J. A. (1986b). Indices of discrimination or diagnostic accuracy: Their ROC's and implied models. Psychological Bulletin, 99, 100-117.

Swets, J. A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Mahwah, NJ: Erlbaum.

Swets, J. A., Tanner, W. P., Jr., & Birdsall, T. G. (1961). Decision processes in perception. Psychological Review, 68, 301-340.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Urban, F. M. (1908). The application of statistical methods to problems of psychophysics. Philadelphia: Psychological Clinic Press.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Weiffenbach, J. M., Baum, B. J., & Burghauser, R. (1982). Taste thresholds: Quality specific variation with human aging. Journal of Gerontology, 37, 372-377.

Weiffenbach, J. M., Schwartz, L. K., Atkinson, J. C., & Fox, P. C. (1995). Taste performance in Sjögren's syndrome. Physiology & Behavior, 57, 89-96.

Wetherill, G. B., & Levitt, H. (1965). Sequential estimation of points on a psychometric function. British Journal of Mathematical & Statistical Psychology, 18, 1-10.

(Manuscript received November 11, 1999; revision accepted for publication March 4, 2001.)

Page 16: Fast and accurate measurement of taste and smell ...psych-lharvey/Publications/(2001) Linschoten_et… · Since the publication of Elemente der Psychophysik (Fechner, 1860), considerable

FAST AND ACCURATE THRESHOLDS 1345

implementation of the maximum-likelihood method thecomputer routines maintain three hypotheses about theresponse strategy of the subject that the subject is re-sponding truthfully that the subject is lying on each trial(ie always choosing what heshe believes is the wrongalternative in the 2AFC procedure) and that the subjectis responding randomly (ie not making use of any sen-sory information) The ability to detect a malingerermdashone who has sensory sensitivity but tries to respond ran-domlymdashis based on the fact that it is difficult for humansto generate truly random sequences (Bar-Hillel amp Wa-genaar 1993) A subject who can taste or smell the stim-ulus and therefore knows in which interval it occurstends to respond more wrongly than they would bychance alone In this case the hypothesis that the subjectis lying will have a higher likelihood than the hypothesisthat the subject is telling the truth We are currentlyworking to refine this ability of the maximum-likelihoodmethod because it holds great promise The other twopsychophysical procedures do not easily distinguish be-tween a malingerer and a person with no sensory sensi-tivity and have no explicit way to detect malingerers

Finally in Experiment 2 we demonstrated that themaximum-likelihood method gives estimates of thresholds

that are stable over time This stability is actually basedon two aspects of our testing procedure The maximum-likelihood method gives the smallest confidence inter-vals of the three methods tested and thus contributes theleast amount of random or systematic error variance tothe measured thresholds The second source of reliabil-ity is the experimental paradigm The 2AFC procedureminimizes the effects of response bias (eg Green ampSwets 19661974) and thus minimizes error variance inthe threshold measurements that would be introduced byvariance in response bias Actually it would be desirableto use more than just two forced-choice alternatives be-cause the ML-PEST methods converges more rapidly onthe threshold value with more alternatives (Manny ampKlein 1985 Schlauch amp Rose 1990) The time requiredto present chemosensory stimuli however usually pre-cludes using more than two alternatives

Testndashretest reliability is not achieved at the expense ofprecision Our measured thresholds show an approxi-mately normal distribution with a standard deviation ofabout 04 log units The maximum-likelihood method isable to estimate thresholds that lie on a continuum eventhough a small number of discrete stimulus concentra-tions are available for testing Although a computer is re-

Figure 10 Distribution of differences between successive thresholds (left) and of the standard deviation of the four thresholdmeasurements for the 27 subjects (right) for NaCl (upper panel) and butyl alcohol (lower panel)

1346 LINSCHOTEN HARVEY ELLER AND JAFEK

quired to run ML-PEST software the widespread avail-ability of inexpensive desktop or laptop computersmakes this requirement much less of a barrier than in thepast

REFERENCES

An l iker J A Ba r t oshuk L Fer r is A N amp Hooks L D (1991)Childrenrsquos food preferences and genetic sensitivity to the bitter tastesof 6-n-propythiouracil (PROP) American Journal of Clinical Nutri-tion 54 316-320

Apt er A J Gent J F amp Fr an k M E (1999) Fluctuating olfactorysensitivity and distorted odor perception in allergic rhinitis Archivesof Otolaryngology Head amp Neck Surgery 125 1005-1010

Ba r -Hil l el M amp Wagena ar W A (1993) The perception of ran-domness In C L Gideon Keren (Ed) A handbook for data analysisin the behavioral sciences Methodological issues (pp 369-393) Hillsdale NJ Erlbaum

Ber kson J (1951) Why I prefer logits to probits Biometrics 7 327-339Ber kson J (1953) A statistically precise and relatively simple method

of estimating the bio-assay with quantal response based on the lo-gistic function Journal of the American Statistical Association 48565-600

Ber kson J (1955) Maximum-likelihood and minimum chi-square es-timates of the logistic function Journal of the American StatisticalAssociation 50 130-162

Ca in W S Gent J F Cat a l an ot t o F A amp Goodspeed R B(1983) Clinical evaluation of olfaction American Journal of Oto-laryngology 4 252-256

Comet t o-Mu ntildeiz J E amp Cain W S (1995) Relative sensitivity ofthe ocular trigeminal nasal trigeminal and olfactory systems to air-borne chemicals Chemical Senses 20 191-198

Cor n sweet T N (1962) The staircase method in psychophysics American Journal of Psychology 75 485-491

Cowar t B J (1989) Relationships between taste and smell across theadult life span In C Murphy W S Cain amp D M Hegsted (Eds)Nutrition and the chemical senses in aging Recent advances and cur-rent research needs (Annals of the New York Academy of SciencesVol 561 pp 39-55) New York New York Academy of Sciences

Cowa r t B J Yokomu ka i Y amp Beau ch a mp G K (1994) Bittertaste in aging Compound-specific decline in sensitivity Physiologyamp Behavior 56 1237-1241

Deems D A Dot y R L Set t l e R G Moor e-Gil l on VSha ma n P Mest er A F Kimmel ma n C P Br igh t man V J ampSnow J B Jr (1991) Smell and taste disorders a study of 750 pa-tients from the University of Pennsylvania Smell and Taste CenterArchives of Otolaryngology Head amp Neck Surgery 117 519-528

Dot y R L McKeown D A Lee W W amp Sha ma n P (1995) Astudy of the testndashretest reliability of ten olfactory tests ChemicalSenses 20 645-656

Dot y R L Sn yder P J Huggins G R amp Lowr y L D (1981) En-docrine cardiovascular and psychological correlates of olfactorysensitivity changes during the human menstrual cycle Journal ofComparative amp Physiological Psychology 95 45-60

Dr ewn owski A Hender son S A amp Shor e A B (1997) Geneticsensitivity to 6-n-propythiouracil (PROP) and hedonic responses tobitter and sweet tastes Chemical Senses 22 27-37

Du ffy V B Cain W S amp Fer r is A M (1999) Measurement ofsensitivity to olfactory flavor Application in a study of aging anddentures Chemical Senses 24 671-677

Fechn er G T (1860) Elemente der Psychophysik Leipzig Breitkopfund Haumlrtel

Gagnon P Mer gl er D amp Lapa r e S (1994) Olfactory adaptationthreshold shift and recovery at low levels of exposure to methylisobutyl ketone (MIBK) Neurotoxicology 15 637-642

Ga r cia -Per ez M A (1998) Forced-choice staircases with fixed stepsizes Asymptotic and small-sample properties Vision Research 381861-1881

Gr een D M amp Swet s J A (1974) Signal detection theory andpsychophysics Huntington NY Robert E Krieger (Original workpublished 1966)

Gr ushka M Sessl e B J amp Howl ey T P (1986) Psychophysica levidence of taste dysfunction in burning mouth syndrome ChemicalSenses 11 485-498

Guil for d J P (1936) Psychometric methods New York McGraw-HillGu il for d J P (1954) Psychometric methods (2nd ed) New York

McGraw-HillHa l l J L (1968) Maximum-likelihood sequential procedures for es-

timation of psychometric functions Journal of the Acoustical Soci-ety of America 44 370

Ha l l J L (1981) Hybrid adaptive procedure for estimation of psy-chometric functions Journal of the Acoustical Society of America69 1763-1769

Har vey L O Jr (1986) Efficient estimation of sensory thresholdsBehavior Research Methods Instruments amp Computers 18 623-632

Ha r vey L O Jr (1992) The critical operating characteristic and theevaluation of expert judgment Organizational Behavior amp HumanDecision Processes 53 229-251

Ha r vey L O Jr (1997) Efficient estimation of sensory thresholdswith ML-PEST Spatial Vision 11 121-128

Hays W L (1963) Statistics for psychologist s (1st ed) New YorkHolt Rinehart amp Winston

Kr an t z D H (1969) Threshold theories of signal detection Psycho-logical Review 76 308-324

Leh r n er J P Br uuml cke T Da l -Bia n co P Gat t er er G ampKr yspin -Exner I (1997) Olfactory functions in Parkinsonrsquos dis-ease and Alzheimerrsquos disease Chemical Senses 22 105-110

Lehr ner J P Kr yspin -Exner I amp Vet t er N (1995) Higher ol-factory threshold and decreased odor identif ication ability in HIV-infected persons Chemical Senses 20 325-328

Linn R L (Ed) (1989) Educational measurement (3rd ed) NewYork Macmillan

Linschot en M R I amp Kr oeze J H A (1991) Spatial summationin taste NaCl thresholds and stimulated area on the anterior humantongue Chemical Senses 16 219-224

Linschot en M R I amp Kr oez e J H A (1992) Bilateral taste stim-ulation Spatial summation with weak NaCl stimuli ChemicalSenses 17 53-59

Linschot en M R I amp Kr oeze J H A (1994) Ipsi- and bilateral in-teractions in taste Perception amp Psychophysics 55 387-393

Macmil l a n N A amp Cr eel man C D (1991) Detection theory Auserrsquos guide Cambridge Cambridge University Press

Man ny R E amp Kl ein S A (1985) A three-alternative tracking par-adigm to measure vernier acuity of older infants Vision Research25 1245-1252

Mat t es R D (1988) Reliability of psychophysical measures of gus-tatory function Perception amp Psychophysics 43 107-114

Mol l B Kl imek L Egger s G amp Man n W (1998) Comparisonof olfactory function in patients with seasonal and perennial allergicrhinitis Allergy 53 297-301

Mur ph y C amp Cain W S (1986) Odor identification The blind arebetter Physiology amp Behavior 37 177-180

Mur ph y C Nor din S de Wijk R A Ca in W S amp Pol ich J(1994) Olfactory-evoked potentials Assessment of young and el-derly and comparison to psychophysical threshold Chemical Senses19 47-56

Nor din S Mur ph y C Davidson T M Quin onez C Ja l oway-ski A A amp El l ison D W (1996) Prevalence and assessment ofqualitative olfactory dysfunction in different age groups Laryngo-scope 106 739-744

OrsquoMah on y M Gar dn er L Lon g D Heint z C Thompson Bamp Davies M (1979) Salt taste detection An R-index approach tosignal-detection measurements Perception 8 497-506

Pent l a nd A (1980) Maximum likelihood estimation The best PESTPerception amp Psychophysics 28 377-379

Pier ce J D Jr Dot y R L amp Amoor e J E (1996) Analysis of po-sition of trial sequence and type of diluent on the detection thresholdfor phenyl ethyl alcohol using a single staircase method Perceptualamp Motor Skills 82 451-458

Rosen bl at t M R Ol mst ead R E Iwa mot o-Sch aa pp P N ampJa r vik M E (1998) Olfactory thresholds for nicotine and mentholin smokers (abstinent and nonabstinent) and nonsmokers Physiol-ogy amp Behavior 65 575-579

FAST AND ACCURATE THRESHOLDS 1347

SAS Inst it u t e (1998a) StatView StatView reference Cary NC SASInstitute

SAS In st it u t e (1998b) StatView Using StatView Cary NC SAS In-stitute

Schl auch R S amp Rose R M (1990) Two- three- and four-intervalforced-choice staircase procedures Estimator bias and efficiencyJournal of the Acoustical Society of America 88 732-740

Simpson A J amp Fit t er M J (1973) What is the best index of de-tectability Psychological Bulletin 80 481-488

St even s J C (1995) Detection of heteroquality taste mixtures Per-ception amp Psychophysics 57 18-26

Swet s J A (1961) Is there a sensory threshold Science 134 168-177Swet s J A (1986a) Form of empirical ROCrsquos in discrimination and

diagnostic tasks Implications for theory and measurement of per-formance Psychological Bulletin 99 181-198

Swet s J A (1986b) Indices of discrimination or diagnostic accuracyTheir ROCrsquos and implied models Psychological Bulletin 99 100-117

Swet s J A (1996) Signal detection theory and ROC analysis in psy-chology and diagnostics Collected papers Mahwah NJ Erlbaum

Swet s J A Ta nn er W P Jr amp Bir dsal l T G (1961) Decisionprocesses in perception Psychological Review 68 301-340

Tayl or M M amp Cr eel ma n C D (1967) PEST Efficient estimateson probability functions Journal of the Acoustical Society of Amer-ica 41 782-787

Tr eut wein B amp St r asbur ger H (1999) Fitting the psychometricfunction Perception amp Psychophysics 61 87-106

Ur ban F M (1908) The application of statistical methods to prob-lems of psychophysics Philadelphia Psychological Clinic Press

Wat son A B amp Pel l i D G (1983) QUEST A Bayesian adaptivepsychometric method Perception amp Psychophysic s 33 113-120

Weif f enbach J M Baum B J amp Bu r gh auser R (1982) Tastethresholds Quality specific variation with human aging Journal ofGerontology 37 372-377

Weif fenbach J M Sch wa r t z L K At kinson J C amp Fox P C(1995) Taste performance in Sjoumlgrenrsquos syndrome Physiology amp Be-havior 57 89-96

Wet h er il l G B amp Levit t H (1965) Sequential estimation ofpoints on a psychometric function British Journal of Mathematicalamp Statistical Psychology 18 1-10

(Manuscript received November 11 1999revision accepted for publication March 4 2001)

Page 17: Fast and accurate measurement of taste and smell ...psych-lharvey/Publications/(2001) Linschoten_et… · Since the publication of Elemente der Psychophysik (Fechner, 1860), considerable

1346 LINSCHOTEN HARVEY ELLER AND JAFEK

quired to run ML-PEST software the widespread avail-ability of inexpensive desktop or laptop computersmakes this requirement much less of a barrier than in thepast

REFERENCES

An l iker J A Ba r t oshuk L Fer r is A N amp Hooks L D (1991)Childrenrsquos food preferences and genetic sensitivity to the bitter tastesof 6-n-propythiouracil (PROP) American Journal of Clinical Nutri-tion 54 316-320

Apt er A J Gent J F amp Fr an k M E (1999) Fluctuating olfactorysensitivity and distorted odor perception in allergic rhinitis Archivesof Otolaryngology Head amp Neck Surgery 125 1005-1010

Ba r -Hil l el M amp Wagena ar W A (1993) The perception of ran-domness In C L Gideon Keren (Ed) A handbook for data analysisin the behavioral sciences Methodological issues (pp 369-393) Hillsdale NJ Erlbaum

Ber kson J (1951) Why I prefer logits to probits Biometrics 7 327-339Ber kson J (1953) A statistically precise and relatively simple method

of estimating the bio-assay with quantal response based on the lo-gistic function Journal of the American Statistical Association 48565-600

Ber kson J (1955) Maximum-likelihood and minimum chi-square es-timates of the logistic function Journal of the American StatisticalAssociation 50 130-162

Ca in W S Gent J F Cat a l an ot t o F A amp Goodspeed R B(1983) Clinical evaluation of olfaction American Journal of Oto-laryngology 4 252-256

Comet t o-Mu ntildeiz J E amp Cain W S (1995) Relative sensitivity ofthe ocular trigeminal nasal trigeminal and olfactory systems to air-borne chemicals Chemical Senses 20 191-198

Cor n sweet T N (1962) The staircase method in psychophysics American Journal of Psychology 75 485-491

Cowar t B J (1989) Relationships between taste and smell across theadult life span In C Murphy W S Cain amp D M Hegsted (Eds)Nutrition and the chemical senses in aging Recent advances and cur-rent research needs (Annals of the New York Academy of SciencesVol 561 pp 39-55) New York New York Academy of Sciences

Cowa r t B J Yokomu ka i Y amp Beau ch a mp G K (1994) Bittertaste in aging Compound-specific decline in sensitivity Physiologyamp Behavior 56 1237-1241

Deems D A Dot y R L Set t l e R G Moor e-Gil l on VSha ma n P Mest er A F Kimmel ma n C P Br igh t man V J ampSnow J B Jr (1991) Smell and taste disorders a study of 750 pa-tients from the University of Pennsylvania Smell and Taste CenterArchives of Otolaryngology Head amp Neck Surgery 117 519-528

Dot y R L McKeown D A Lee W W amp Sha ma n P (1995) Astudy of the testndashretest reliability of ten olfactory tests ChemicalSenses 20 645-656

Dot y R L Sn yder P J Huggins G R amp Lowr y L D (1981) En-docrine cardiovascular and psychological correlates of olfactorysensitivity changes during the human menstrual cycle Journal ofComparative amp Physiological Psychology 95 45-60

Dr ewn owski A Hender son S A amp Shor e A B (1997) Geneticsensitivity to 6-n-propythiouracil (PROP) and hedonic responses tobitter and sweet tastes Chemical Senses 22 27-37

Du ffy V B Cain W S amp Fer r is A M (1999) Measurement ofsensitivity to olfactory flavor Application in a study of aging anddentures Chemical Senses 24 671-677

Fechn er G T (1860) Elemente der Psychophysik Leipzig Breitkopfund Haumlrtel

Gagnon P Mer gl er D amp Lapa r e S (1994) Olfactory adaptationthreshold shift and recovery at low levels of exposure to methylisobutyl ketone (MIBK) Neurotoxicology 15 637-642

Garcia-Perez, M. A. (1998). Forced-choice staircases with fixed step sizes: Asymptotic and small-sample properties. Vision Research, 38, 1861-1881.

Green, D. M., & Swets, J. A. (1974). Signal detection theory and psychophysics. Huntington, NY: Robert E. Krieger. (Original work published 1966)

Grushka, M., Sessle, B. J., & Howley, T. P. (1986). Psychophysical evidence of taste dysfunction in burning mouth syndrome. Chemical Senses, 11, 485-498.

Guilford, J. P. (1936). Psychometric methods. New York: McGraw-Hill.

Guilford, J. P. (1954). Psychometric methods (2nd ed.). New York: McGraw-Hill.

Hall, J. L. (1968). Maximum-likelihood sequential procedures for estimation of psychometric functions. Journal of the Acoustical Society of America, 44, 370.

Hall, J. L. (1981). Hybrid adaptive procedure for estimation of psychometric functions. Journal of the Acoustical Society of America, 69, 1763-1769.

Harvey, L. O., Jr. (1986). Efficient estimation of sensory thresholds. Behavior Research Methods, Instruments, & Computers, 18, 623-632.

Harvey, L. O., Jr. (1992). The critical operating characteristic and the evaluation of expert judgment. Organizational Behavior & Human Decision Processes, 53, 229-251.

Harvey, L. O., Jr. (1997). Efficient estimation of sensory thresholds with ML-PEST. Spatial Vision, 11, 121-128.

Hays, W. L. (1963). Statistics for psychologists (1st ed.). New York: Holt, Rinehart & Winston.

Krantz, D. H. (1969). Threshold theories of signal detection. Psychological Review, 76, 308-324.

Lehrner, J. P., Brücke, T., Dal-Bianco, P., Gatterer, G., & Kryspin-Exner, I. (1997). Olfactory functions in Parkinson’s disease and Alzheimer’s disease. Chemical Senses, 22, 105-110.

Lehrner, J. P., Kryspin-Exner, I., & Vetter, N. (1995). Higher olfactory threshold and decreased odor identification ability in HIV-infected persons. Chemical Senses, 20, 325-328.

Linn, R. L. (Ed.) (1989). Educational measurement (3rd ed.). New York: Macmillan.

Linschoten, M. R. I., & Kroeze, J. H. A. (1991). Spatial summation in taste: NaCl thresholds and stimulated area on the anterior human tongue. Chemical Senses, 16, 219-224.

Linschoten, M. R. I., & Kroeze, J. H. A. (1992). Bilateral taste stimulation: Spatial summation with weak NaCl stimuli. Chemical Senses, 17, 53-59.

Linschoten, M. R. I., & Kroeze, J. H. A. (1994). Ipsi- and bilateral interactions in taste. Perception & Psychophysics, 55, 387-393.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user’s guide. Cambridge: Cambridge University Press.

Manny, R. E., & Klein, S. A. (1985). A three-alternative tracking paradigm to measure vernier acuity of older infants. Vision Research, 25, 1245-1252.

Mattes, R. D. (1988). Reliability of psychophysical measures of gustatory function. Perception & Psychophysics, 43, 107-114.

Moll, B., Klimek, L., Eggers, G., & Mann, W. (1998). Comparison of olfactory function in patients with seasonal and perennial allergic rhinitis. Allergy, 53, 297-301.

Murphy, C., & Cain, W. S. (1986). Odor identification: The blind are better. Physiology & Behavior, 37, 177-180.

Murphy, C., Nordin, S., de Wijk, R. A., Cain, W. S., & Polich, J. (1994). Olfactory-evoked potentials: Assessment of young and elderly, and comparison to psychophysical threshold. Chemical Senses, 19, 47-56.

Nordin, S., Murphy, C., Davidson, T. M., Quinonez, C., Jalowayski, A. A., & Ellison, D. W. (1996). Prevalence and assessment of qualitative olfactory dysfunction in different age groups. Laryngoscope, 106, 739-744.

O’Mahony, M., Gardner, L., Long, D., Heintz, C., Thompson, B., & Davies, M. (1979). Salt taste detection: An R-index approach to signal-detection measurements. Perception, 8, 497-506.

Pentland, A. (1980). Maximum likelihood estimation: The best PEST. Perception & Psychophysics, 28, 377-379.

Pierce, J. D., Jr., Doty, R. L., & Amoore, J. E. (1996). Analysis of position of trial sequence and type of diluent on the detection threshold for phenyl ethyl alcohol using a single staircase method. Perceptual & Motor Skills, 82, 451-458.

Rosenblatt, M. R., Olmstead, R. E., Iwamoto-Schaapp, P. N., & Jarvik, M. E. (1998). Olfactory thresholds for nicotine and menthol in smokers (abstinent and nonabstinent) and nonsmokers. Physiology & Behavior, 65, 575-579.

SAS Institute (1998a). StatView: StatView reference. Cary, NC: SAS Institute.

SAS Institute (1998b). StatView: Using StatView. Cary, NC: SAS Institute.

Schlauch, R. S., & Rose, R. M. (1990). Two-, three-, and four-interval forced-choice staircase procedures: Estimator bias and efficiency. Journal of the Acoustical Society of America, 88, 732-740.

Simpson, A. J., & Fitter, M. J. (1973). What is the best index of detectability? Psychological Bulletin, 80, 481-488.

Stevens, J. C. (1995). Detection of heteroquality taste mixtures. Perception & Psychophysics, 57, 18-26.

Swets, J. A. (1961). Is there a sensory threshold? Science, 134, 168-177.

Swets, J. A. (1986a). Form of empirical ROC’s in discrimination and diagnostic tasks: Implications for theory and measurement of performance. Psychological Bulletin, 99, 181-198.

Swets, J. A. (1986b). Indices of discrimination or diagnostic accuracy: Their ROC’s and implied models. Psychological Bulletin, 99, 100-117.

Swets, J. A. (1996). Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Mahwah, NJ: Erlbaum.

Swets, J. A., Tanner, W. P., Jr., & Birdsall, T. G. (1961). Decision processes in perception. Psychological Review, 68, 301-340.

Taylor, M. M., & Creelman, C. D. (1967). PEST: Efficient estimates on probability functions. Journal of the Acoustical Society of America, 41, 782-787.

Treutwein, B., & Strasburger, H. (1999). Fitting the psychometric function. Perception & Psychophysics, 61, 87-106.

Urban, F. M. (1908). The application of statistical methods to problems of psychophysics. Philadelphia: Psychological Clinic Press.

Watson, A. B., & Pelli, D. G. (1983). QUEST: A Bayesian adaptive psychometric method. Perception & Psychophysics, 33, 113-120.

Weiffenbach, J. M., Baum, B. J., & Burghauser, R. (1982). Taste thresholds: Quality specific variation with human aging. Journal of Gerontology, 37, 372-377.

Weiffenbach, J. M., Schwartz, L. K., Atkinson, J. C., & Fox, P. C. (1995). Taste performance in Sjögren’s syndrome. Physiology & Behavior, 57, 89-96.

Wetherill, G. B., & Levitt, H. (1965). Sequential estimation of points on a psychometric function. British Journal of Mathematical & Statistical Psychology, 18, 1-10.

(Manuscript received November 11, 1999; revision accepted for publication March 4, 2001.)
