
Whatever Happened to Information Theory in Psychology?

R. Duncan Luce
University of California, Irvine

Although Shannon’s information theory is alive and well in a number of fields, after an initial fad in psychology during the 1950s and 1960s it no longer is much of a factor, beyond the word bit, in psychological theory. The author discusses what seems to him (and others) to be the root causes of an actual incompatibility between information theory and the psychological phenomena to which it has been applied.

Claude Shannon, the creator of information theory, or communication theory as he preferred to call it, died on February 24, 2001, at age 84. So, I would like to dedicate this brief piece to his memory and in particular to recall his seminal contribution “A Mathematical Theory of Communication,” which was published in two parts in the Bell System Technical Journal in 1948 and rendered more accessible in the short monograph by Shannon and Weaver (1949).

Let me begin by saying that information theory is alive and well in biology, engineering, physics, and statistics, although my conclusion is that for quite good reasons it has had little long-range impact in psychology. One rarely sees Shannon’s information theory in contemporary psychology articles except to the extent of the late John W. Tukey’s term bit, which is now a permanent word of our vocabulary. If we look at the table of contents of J. Skilling’s (1989) Maximum Entropy and Bayesian Methods, we find the pattern of chapters on applications shown in Table 1. You will note none are in psychology.

Because it is doubtful that many young psychologists are learning the subject, a few words are necessary to set the stage. As mathematical expositor extraordinaire Keith Devlin (2001, p. 21) stated: “Shannon’s theory does not deal with ‘information’ as that word is generally understood. Instead, it deals with data—the raw material out of which information is obtained.” Now, that makes it sound akin to what we normally think to be the role of statistics, which is correct. It begins with an abstract, finite set of elements and a probability distribution over it. Let the elements of the set be identified with the first m integers and let p(i), where i = 1, . . . , m, be the probabilities that are assumed to cover all possibilities, that is, ∑_{i=1}^{m} p(i) = 1. Uncertainty is a summary number that is a function of the probabilities; that is, U({p(i)}) = U(p(1), . . . , p(m)). What function?

Shannon’s Measure of Information

To get at that, Shannon imposed a number of plausible properties that he believed such a number should satisfy. Others subsequently proposed alternatives, among them Aczél and Daróczy (1975); Aczél, Forte, and Ng (1974); and Luce (1960). Probably the best alternative was that offered by Aczél et al. (1974), which, with less than total precision, was as follows:

1. The labeling of the elements is totally immaterial; all that counts is the set of m probabilities.

2. For m = 2, U(1/2, 1/2) = 1. This defines the unit of uncertainty, the bit.

3. lim_{p→0} U(p, 1 − p) = 0.

4. U(p(1), . . . , p(m), 0) = U(p(1), . . . , p(m)).

5. If P and Q are two distributions and P ∗ Q denotes their joint distribution, then U(P ∗ Q) ≤ U(P) + U(Q), with equality holding if P and Q are independent.

This article was prepared at the instigation of Alan Boneau as a contribution to a Division 1 session at the 2001 Annual Meeting of the American Psychological Association in San Francisco. I have benefited from interchanges on the topic with Peter Killeen and Donald Laming.

Correspondence concerning this article should be addressed to R. Duncan Luce, School of Social Sciences, Social Science Plaza 2133, University of California, Irvine, California 92697-5100. E-mail: [email protected]

The mathematical conclusion is that




U({p(i)}) = −∑_{i=1}^{m} p(i) log₂ p(i).    (1)

Maximal uncertainty about what will be selected occurs when all of the probabilities are equal, that is, p(i) = 1/m, in which case U({1/m}) = log₂ m.
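A minimal computational sketch of Equation 1 may make this concrete. The function name and the example distributions below are illustrative choices, not part of the original development; the sketch checks the unit convention of property 2, the log₂ m maximum for a uniform distribution, and the additivity in property 5 for independent distributions.

import math

def uncertainty(p):
    """Shannon uncertainty (Equation 1) of a finite distribution, in bits.
    Zero-probability elements contribute nothing (cf. property 4)."""
    assert abs(sum(p) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -sum(q * math.log2(q) for q in p if q > 0)

# Property 2: the fair coin defines the unit, 1 bit.
print(uncertainty([0.5, 0.5]))                          # 1.0

# Maximal uncertainty: the uniform distribution over m elements gives log2(m).
m = 8
print(uncertainty([1 / m] * m), math.log2(m))           # 3.0  3.0

# Property 5 with equality: uncertainties of independent distributions add.
p, q = [0.7, 0.3], [0.2, 0.5, 0.3]
joint = [a * b for a in p for b in q]                   # independent joint distribution
print(uncertainty(joint), uncertainty(p) + uncertainty(q))   # equal up to rounding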

Uncertainty measured in this way is thus a single number associated with a finite probability distribution. In some physical contexts, U is identifiable with thermodynamic entropy, and some authors call it that and use the physical notation H rather than U. Other authors use I for information.

This measure is to be distinguished sharply from familiar statistics such as the mean and variance, which are associated not with a probability distribution over a set of elements but with a random variable that maps the set into numbers and, in the process, introduces an ordering of the elements not available in Shannon’s context.

The role of information transmission is to reduce uncertainty. Suppose {p(i)} is the distribution before information is transmitted and p(i, j) denotes the probability of the joint state of i being transmitted and j received. The conditional probability is defined in the usual way as

p(j|i) = p(i, j)/p(i)    (p(i) ≠ 0).

The posterior uncertainty U({p(j|i)}) can be shown to be the joint uncertainty less the uncertainty of what was transmitted; that is, U({p(j|i)}) = U({p(i, j)}) − U({p(i)}). The information transmitted is then the reduction from the original uncertainty to this posterior uncertainty.
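The same kind of sketch, now with an arbitrary joint distribution p(i, j) chosen purely for illustration, computes the posterior uncertainty via the relation above and the information transmitted as the drop from the receiver's prior uncertainty.

import math

def U(p):
    """Shannon uncertainty of a finite probability distribution, in bits."""
    return -sum(q * math.log2(q) for q in p if q > 0)

# Hypothetical joint distribution p(i, j): rows i are transmitted signals,
# columns j are received signals (values chosen purely for illustration).
p_ij = [[0.30, 0.10],
        [0.05, 0.55]]

p_i = [sum(row) for row in p_ij]              # marginal p(i) over what is sent
p_j = [sum(col) for col in zip(*p_ij)]        # marginal p(j) over what is received
joint = [x for row in p_ij for x in row]      # flattened joint distribution

# Posterior (conditional) uncertainty: U({p(j|i)}) = U({p(i, j)}) - U({p(i)}).
posterior = U(joint) - U(p_i)

# Information transmitted: the prior uncertainty about j less the posterior
# uncertainty that remains once i is known (symmetric in i and j).
transmitted = U(p_j) - posterior
print(round(posterior, 3), round(transmitted, 3))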

There clearly is at least a conceptual relation to Bayes’s theorem (which was really worked out in the form we know it by P. S. LaPlace [1812/1820]). In fact, the relation is far more than just conceptual and has been well developed for a variety of statistical concepts in many papers and books. I return to this linkage later.

Shannon’s theory then went on to consider the limitations of channels to transmit information, for which he defined a concept of channel capacity and, given a noisy transmission line, the (often quite elaborate) coding necessary to achieve near-to-perfect transmission. As noted earlier, Shannon strongly preferred the term communication theory to information theory, but in psychology at least, information became the standard term.

Introduction of Information Theory in Psychology

During graduate school at the Massachusetts Institute of Technology and during a brief stint (1950–1953) at the Research Laboratory of Electronics (RLE), I was surrounded by a flurry of ideas that had matured in research laboratories during World War II and shortly thereafter and that have, in fact, all played a significant scientific role since then. They were as follows: information theory, feedback, and cybernetics; networks of various sorts and automata theory; the beginnings of artificial intelligence and of both analogue and digital computers; Bayesian statistics and the theory of signal detectability; Chomsky’s developing theory of linguistics; and game and utility theory. Some psychologists had been close to one or another of these developments, and so different ones latched onto different ideas.

A substantial group of psychologists and engineers at RLE focused on information theory and held a very well attended weekly seminar on the topic, which, among its incidental consequences, provided me with an education on the topic. Later, in 1954, some of these and related developments were reported at the “Conference on the Estimation of Information Flow,” organized by Henry Quastler and held in Monticello, Illinois. The conference was summarized in Quastler’s 1955 edited volume Information Theory in Psychology.

Table 1
Pattern of Application Topics in Skilling’s (1989) Table of Contents

Topic                                      No. of articles
Thermodynamics and quantum mechanics                     5
Physical measurement                                     6
Crystallography                                          5
Time series, power spectrum                              6
Astronomical techniques                                  3
Neural networks                                          2
Statistical fundamentals                                17



Perhaps for psychologists, the two defining articles to come out of those meetings were the late William J. McGill’s (1954) “Multivariate Information Transmission” and George Miller’s (1956) “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information.” The latter article addressed several phenomena that seem to exhibit information capacity limitations, including absolute judgments of unidimensional and multidimensional stimuli and short-term memory. Comprehensive, but quite different, summaries of the ideas and experiments were given by Attneave (1959) and Garner (1962). Later articles expanded the interest to the relation between mean response times and the uncertainty of the stimuli to which participants were responding. In early experiments, mean response time appeared to grow linearly with uncertainty, but glitches soon became evident. The deepest push in the response time direction was Donald Laming’s (1968) subtle Information Theory of Choice-Reaction Times, although he later stated: “This idea does not work. While my own data (1968) might suggest otherwise, there are further unpublished results that show it to be hopeless” (Laming, 2001, p. 642).

The enthusiasm—nay, faddishness—of the times is hard to capture now. Many felt that a very deep truth of the mind had been uncovered. Yet, Shannon was skeptical. He is quoted as saying “Information theory has perhaps ballooned to an importance beyond its actual accomplishments” (as cited in Johnson, 2001). And Myron Tribus (1979, p. 1) wrote: “In 1961 Professor Shannon, in a private conversation, made it quite clear to me that he considered applications of his work to problems outside of communication theory to be suspect and he did not attach fundamental significance to them.” These skeptical views strike me as appropriate.

Why Limited Application in Psychology?

The question remains: Why is information theory not very applicable to psychological problems despite apparent similarities of concepts? Laming (2001) provided a very detailed critique—far more specific than this one—of a variety of attempts to use information, in particular the appealing concept of channel capacity. He pointed out (p. 639), however, that Shannon’s way of defining the concept requires that not individual signals be transmitted but rather very long strings of them so as to be rid of redundancies. That is rarely possible within psychological experiments. This is definitely part of the answer.

But in my opinion, the most important answer lies in the following incompatibility between psychology and information theory. The elements of choice in information theory are absolutely neutral and lack any internal structure; the probabilities are on a pure, unstructured set whose elements are functionally interchangeable. That is fine for a communication engineer who is totally unconcerned with the signals communicated over a transmission link; interchanging the encoding matters not at all. By and large, however, the stimuli of psychological experiments are to some degree structured, and so, in a fundamental way, they are not in any sense interchangeable. If one is doing an absolute judgment experiment of pure tones that vary in intensity or frequency, the stimuli have a powerful and relevant metric structure, namely, differences or ratios of intensity and frequency measures between pairs of stimuli. And that structure has been shown to matter greatly in the following sense. Substantial sequential effects exist between a stimulus and at least the immediately preceding stimulus–response pair, but with the magnitude of the correlation dropping from close to one for small signal separation in either decibels or frequency to about zero for large separations (Green, Luce, & Duncan, 1977; Luce, Green, & Weber, 1976). Similarly, if one does a memory test, one has to go to very great pains to avoid associations among the stimuli. Stimulus similarity, although still ill understood and under active investigation, is a powerful structural aspect of psychology.

Gradually, as the importance of this reality began to set in, one saw fewer—although still a few—attempts to understand global psychological phenomena in simple information theory terms. Of course, the word information has been almost seamlessly transformed into the concept of “information-processing models” in which information theory per se plays no role. The idea of the mind being an information-processing network with capacity limitations has stayed with us, but in far more complex ways than pure information theory. Much theorizing in cognitive psychology is of this type, now being more or less well augmented by brain imaging techniques.

Before going on, let me note that this incompatibility between either a summary measure of a probability distribution or the distribution itself and the psychological structure of sets of related stimuli is an issue not only for information theory. The unstructured elements of probability theory and the structure of psychological stimuli have made it very difficult indeed to add, in a principled fashion, probabilistic aspects to models of behavior of any complexity at all. The melding, for example, of utility theory and randomness has proved to be as elusive as it is important.

Information Theory and Statistics

As noted earlier, there is a close linkage between Shannon’s information measures and statistics that has been very well explored (Diamond, 1959; Jaynes, 1979, 1986, 1988; Justice, 1986; Levine & Tribus, 1979; Mathai, 1975; Skilling, 1989; Zellner, 1988). As Jaynes (1988, p. 281) remarked, commenting on Zellner’s linking the information measure directly to Bayes’s theorem, “entropy has been a recognized part of probability theory since the work of Shannon 40 years ago. . . . But now we see that there is, after all, a close connection between entropy and Bayes’ theorem.” Laming (2001) also emphasized the tight connection.

But in contrast to their wide use in economics, Bayesian statistics have not yet been a roaring success in psychology despite many years of being promoted, notably by Ward Edwards. For more than 40 years he has run an annual (February) Bayesian conference that, in recent years, has been held at the Sportsman Lodge in Studio City, California. The major place where Bayesian ideas have flowered quite well in psychology is in decision theory, in which there is quite a natural melding of utility theory ideas with Bayesian updating of the probabilities underlying subjective expected utility. But as an inferential engine in psychology proper, which continues to be haunted by the hypothesis testing paradigm of agriculture and clinical trials, Bayesian methods have so far played very little role. This may well be unfortunate given their successes in other fields.

There is one notable exception to what I have just stated about the incompatibility of psychological structure and information theory. This is when the number of stimuli m in information theory and the number of hypotheses being considered in a Bayesian analysis is two. In that case, the role of stimulus structure largely disappears, and one can sometimes get away with treating the two stimuli as without structure beyond probabilities of choice. Here Bayesian ideas have played a major role in the widely used theory of signal detectability (Egan, 1975; Green & Swets, 1966; Macmillan & Creelman, 1991; Swets, 1964). Recall that one has among the variables a probability distribution over signal presentations (i.e., p and 1 − p); one has two conditional probabilities of response, one for each of the possible stimuli; and one has payoffs that can be modeled by the simplest of utility theories. The main role of the theory is to provide a decomposition of the process into two parts, one attributable to inherent sensory properties and the other to decision-making criteria. This is similar in some ways to the information theory decomposition. So, it comes as no great surprise that people have approached the problem from the Bayesian perspective, in the form of classical signal detectability theory, and from that of information theory, and have attempted to relate the two.
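To make the decomposition concrete, the following minimal sketch of the standard equal-variance Gaussian form of signal detection theory (with hit and false-alarm rates chosen for illustration, not data from any of the studies cited here) separates a sensitivity index from a decision criterion.

from statistics import NormalDist

def sdt_decomposition(hit_rate, false_alarm_rate):
    """Equal-variance Gaussian signal-detection decomposition:
    d' indexes the separation of the two signal distributions (sensory part);
    c indexes where the observer places the response criterion (decision part)."""
    z = NormalDist().inv_cdf          # inverse standard normal CDF
    d_prime = z(hit_rate) - z(false_alarm_rate)
    criterion = -0.5 * (z(hit_rate) + z(false_alarm_rate))
    return d_prime, criterion

# Illustrative rates only: a fairly sensitive, roughly unbiased observer.
print(sdt_decomposition(0.84, 0.16))   # approximately (1.99, 0.0)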

Perhaps the most elaborate effort in this direction has been that of Kenneth H. Norwich (1993), who summarized his approach in Information, Sensation, and Perception. Although he states his hypothesis quite generally, namely that subjective sensation is basically the measure of uncertainty reduction that occurs when the stimulus is presented, his detailed explorations are largely confined to binary situations, wherein his style closely resembles that of classical 19th-century physical thermodynamics. A more recent comparison of signal detection and information theories is a manuscript authored by Peter R. Killeen and Thomas J. Taylor (2001) titled Bits of the ROC: Signal Detection as Information Transmission. And as I suggested earlier, most decision theory in which Bayesian ideas play a role involves binary decisions based on binary data with distinct sources of information being dealt with independently. Laming (2001, pp. 462–463) pointed out, however, that careful analyses of the criterion in signal detection show systematic shifts depending on previous outcomes, which rules out any simple use of information or Bayesian theory in this context; indeed, simple probability matching of responses to presentations is far more predictive.

Conclusions

Laming (2001), despite his detailed critique of almost all attempts to apply information theory in psychology, ended on a surprisingly optimistic note. He wrote: “Information theory provides, as it were, a ‘non-parametric’ technique for the investigation of all kinds of systems without the need to understand the machinery, to model the brain without modelling the neural responses” (Laming, 2001, p. 645).

I am not so optimistic. The fact is that generalizations of either Bayesian or information theory ideas to situations with more than two hypotheses or stimuli remain unfulfilled, and my conjecture is that this is not likely to change soon. This is my brief answer to the question “Whatever happened to information theory in psychology?”

References

Aczél, J., & Daróczy, Z. (1975). On measures of information and their characterizations. New York: Academic Press.

Aczél, J., Forte, B., & Ng, C. T. (1974). Why the Shannon and Hartley entropies are “natural.” Advances in Applied Probability, 6, 131–146.

Attneave, F. (1959). Applications of information theory to psychology. New York: Henry Holt.

Devlin, K. (2001). Claude Shannon, 1916–2001. Focus: The Newsletter of the Mathematical Association of America, 21, 20–21.

Diamond, S. (1959). Information and error: An introduction to statistical analysis. New York: Basic Books.

Egan, J. P. (1975). Signal detection theory and ROC analysis. New York: Academic Press.

Garner, W. R. (1962). Uncertainty and structure as psychological concepts. New York: Wiley.

Green, D. M., Luce, R. D., & Duncan, J. E. (1977). Variability and sequential effects in magnitude production and estimation of auditory intensity. Perception & Psychophysics, 22, 450–456.

Green, D. M., & Swets, J. (1966). Signal detection theory and psychophysics. New York: Wiley.

Jaynes, E. T. (1979). Where do we stand on maximum entropy? In R. D. Levine & M. Tribus (Eds.), The maximum entropy formalism (pp. 15–118). Cambridge, MA: MIT Press.

Jaynes, E. T. (1986). Bayesian methods: An introductory tutorial. In J. H. Justice (Ed.), Maximum entropy and Bayesian methods in applied statistics (pp. 1–25). Cambridge, England: Cambridge University Press.

Jaynes, E. T. (1988). Discussion. American Statistician, 42, 280–281.

Johnson, G. (2001, February 27). Claude Shannon, mathematician, dies at 84 [obituary]. New York Times, p. B7.

Justice, J. H. (Ed.). (1986). Maximum entropy and Bayesian methods in applied statistics. Cambridge, England: Cambridge University Press.

Killeen, P. R., & Taylor, T. J. (2001). Bits of the ROC: Signal detection as information transmission. Unpublished manuscript.

Laming, D. R. J. (1968). Information theory of choice-reaction times. New York: Academic Press.

Laming, D. (2001). Statistical information, uncertainty, and Bayes’ theorem: Some applications in experimental psychology. In S. Benferhat & P. Besnard (Eds.), Symbolic and quantitative approaches to reasoning with uncertainty (pp. 635–646). Berlin: Springer-Verlag.

LaPlace, P. S. (1820). Théorie analytique des probabilités [Analytic theory of probability] (2 vols.; 3rd ed., with supplements). Paris: Courcier. (Original published 1812; reprints available from Editions Culture et Civilisation, 115 Avenue Gabriel Lebron, 1160 Brussels, Belgium)

Levine, R. D., & Tribus, M. (Eds.). (1979). The maximum entropy formalism. Cambridge, MA: MIT Press.

Luce, R. D. (1960). The theory of selective information and some of its behavioral application. In R. D. Luce (Ed.), Developments in mathematical psychology (pp. 5–119). Glencoe, IL: Free Press.

Luce, R. D., Green, D. M., & Weber, D. L. (1976). Attention bands in absolute identification. Perception & Psychophysics, 20, 49–54.

Macmillan, N. A., & Creelman, C. D. (1991). Detection theory: A user’s guide. Cambridge, England: Cambridge University Press.

Mathai, A. M. (1975). Basic concepts in information theory and statistics: Axiomatic foundations and applications. New York: Wiley.

McGill, W. J. (1954). Multivariate information transmission. Psychometrika, 19, 97–116.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97.



Norwich, K. H. (1993). Information, sensation, and perception. San Diego, CA: Academic Press.

Quastler, H. (Ed.). (1955). Information theory in psychology. Glencoe, IL: Free Press.

Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423, 623–656.

Shannon, C. E., & Weaver, W. (1949). The mathematical theory of communication. Urbana: University of Illinois Press.

Skilling, J. (Ed.). (1989). Maximum entropy and Bayesian methods: Cambridge, England, 1988. Dordrecht, the Netherlands: Kluwer.

Swets, J. (Ed.). (1964). Signal detection and recognition by human observers. New York: Wiley.

Tribus, M. (1979). Thirty years of information theory. In R. D. Levine & M. Tribus (Eds.), The maximum entropy formalism (pp. 1–14). Cambridge, MA: MIT Press.

Zellner, A. (1988). Optimal information processing and Bayes’ theorem. American Statistician, 42, 278–284.

Received February 5, 2002
Accepted March 15, 2002
