
John Russ is the author of The Image Processing Handbook, Computer Assisted Microscopy, Practical Stereology, Forensic Uses of Digital Imaging, Image Analysis of Food Structure, as well as many other books and papers. He has been involved in the use of a wide variety of microscopy techniques and the computerized analysis of microstructural images for nearly 50 years. One of the original founders of Edax International (manufacturer of X-ray analytical systems), and the past Research Director of Rank Taylor Hobson (manufacturer of precision metrology instruments), he has been since 1979 a professor in the Materials Science department at North Carolina State University. Now retired, he continues to write and lecture on topics related to image analysis.


Vol. 39/2, Proceedings RMS, June 2004

Seeing the Scientific Image
John C. Russ

Materials Science and Engineering Department, North Carolina State University, Raleigh, NC

What we see and why

Human beings are intensely visual creatures. Most of the information we acquire comes through our eyes (and the related circuitry in our brains), rather than through touch, smell, hearing or taste. For better or worse, that is also the way scientists acquire information from their experiments. But the skills in interpreting images developed by millions of years of evolution don't deal as well with scientific images as they do with "real world" experiences. Understanding the differences in the types of information to be extracted, and the biases introduced by our vision systems, is a necessary requirement for the scientist who would trust his or her results. It is the purpose of this article to acquaint or remind readers of the needs and remedies.

The percentage of information that flows through visual pathways has been estimated at 90-95% for a typical human without any sensory impairment. Indeed, our dependence on vision can be judged from the availability of corrective means - ranging from eyeglasses to laser eye surgery - for those whose vision isn't perfect or deteriorates with age. Hearing aids and cochlear implants are available (but under-utilized) for those with severe hearing loss, but there are no palliatives for the other senses. As taste becomes less sensitive, the only solution is to sprinkle on more chili powder.

Not all animals, even all mammals, depend on or use sight to the extent that we do (Figure 1). Bats and dolphins use echolocation or sonar to probe the world about them. Pit vipers sense infrared radiation. Moles, living underground, trade sight for sensitive touch organs around their nose. Bloodhounds follow scents and butterflies have taste organs so sensitive they can detect single molecules. Some eels generate and sense electric fields that interact with their surroundings. Fish and alligators have pressure sensors that detect very slight motions in their watery environment. Birds and bees both have the ability to detect the polarization of light, as an aid to locating the sun position on a cloudy day. Birds and some bacteria seem to be able to sense the orientation of the earth's magnetic field, another aid to navigation. And many birds and insects have vision systems that detect infrared or ultraviolet colors beyond our range of vision.

It is not easy for humans to imagine what the world looks like to a bat, eel or mole. Indeed, even the word "imagine" demonstrates the problem. The root word "image" implies a picture, or scene, constructed inside the mind. Dependent as we are on images, that is the only organization of world data that is comprehensible to most of us, and our language reflects (sic!) or illustrates (sic!) that bias.

With two forward-facing eyes capable of detecting light over a wavelength range of about 400-700 nm (blue to red), we are descended from arboreal primates who depended on vision and stereoscopy for navigation and hunting. Many animals and insects instead sacrifice stereoscopy for coverage, with eyes spaced wide to detect motion. A few, like the chameleon, can move their eyes independently to track different objects.



But even in the category of hunters with stereo vision there are many birds with much better sight than humans. Eagles have resolution that can distinguish a mouse at a range of nearly a mile. In fact, most birds devote a much larger portion of their head space to eyes than we do. In some birds, the eyes are so large that it affects other functions, such as using blinking to force the eyes down onto the throat to swallow food.

An oft-quoted proverb states that "a picture is worth a thousand words," and is used as an illustration of the importance of images and their apparent rich information content. But the proverb is wrong in many ways. First, because a typical image, digitized and stored in a computer, occupies the space of several million words of text (and even then, the resolution of modern digital cameras is far less than that of the human eye, which has about 160 million rods and cones).

Figure 1. Eyes come in many forms, optimized for different purposes. Insect eyes consist of many individual lenses and sensors, producing comparatively low resolution. The chameleon can swivel its eyes independently to track different objects in left and right visual fields. The horse has little stereo vision but a broad field of view. The eagle's acuity and resolution is extremely high. Primates are well adapted for stereo vision and also have greater sensitivity to red colors than most other animals. The eye of the octopus apparently evolved independently and has its neural circuitry on the opposite side of the retina, but provides very good acuity and color sensitivity.


Second, because as a means of communicating information from one person to another the image is very inefficient. There is little reason to expect another person to derive the same information from a picture as we did without some supporting information to bring it to their attention and create a context for interpreting it. Arlo Guthrie describes this in "Alice's Restaurant" as "Twenty-seven 8×10 color glossy pictures with circles and arrows and a paragraph on the back of each one." And that is not a bad description of many typical scientific papers!

Human vision can extract several different kinds of information from images, and much of the processing that takes place has been optimized by evolution and experience to perform very efficiently. But at the same time, other types of information are either ignored or suppressed and are not normally observed. Sherlock Holmes often criticized Watson for "seeing but not observing," which is as good a distinction as any between having photons fall upon the retina and the conscious mind becoming aware. We will examine some of the processes by which information is extracted and the conscious levels of the mind alerted, and note that the extraction process overlooks some kinds of information or makes them very difficult to detect.

Recognition

The goal of much of human vision is recognition. Whether searching for food, avoiding predators, or welcoming a mate, the first thing that catches our attention in an image is something familiar. To be recognized, an object or feature must have a name - some label that our consciousness can assign. Behind that label is a mental model of the object, which may be expressed either in words, images or other forms. This model captures the important (to us) characteristics of the object. It is unfortunate in many scientific experiments that the task assigned to human vision is not the recognition of familiar objects but the detection and description of unfamiliar ones, which is far more difficult.

The basic technique that lies at the root of human vision is comparison. Nothing in images is measured by the eye and mind; we have no rulers and protractors in our heads. Features that can be viewed next to each other with similar orientation, surroundings and lighting can be compared most easily. Ones that must be mentally flipped or rotated are more difficult. Figure 2 shows an example in which the length of time required to mentally turn each object over in the mind to match alignments and determine which features are the same, and which are mirror images, is proportional to the angular differences between them. Comparisons to memory work the same way, and take time. If the remembered object is a very familiar one, then the underlying model consists of a set of characteristics that can be compared. That is, after all, how recognition works.

If the remembered object is not familiar, and has no label and model, then comparison depends on just which characteristics of the original view were remembered. How well the memory process worked, and which features and characteristics were selected for recall, are themselves subject to comparisons to still other models. As an example, eyewitness accounts of crime and accident scenes are notoriously unreliable. Different observers select different attributes of the scene or suspect as being notable based on their similarity or difference from other objects in memory, so of course each person's results vary. Police sketches of suspects rarely match well with actual photographs taken after capture. In some respects they are caricatures, emphasizing some aspect (often trivial) that seemed familiar or unusual to an individual observer.

A threshold logic unit implements the process that can signal recognition based on the weighted sum of many inputs. This process may not duplicate the exact functions of a real neuron, but is based on the McCulloch and Pitts "perceptron" model which successfully describes the overall process (Figure 3).

Recognition is frequently described in terms of a "grandmother cell." This is a theoretical construct, not a single physical cell someplace in the brain, but it provides a useful framework to describe some of the significant features of the recognition process. The idea of the grandmother cell is that it patiently examines every image for the appearance of grandmother, and then signals the conscious mind that she is present. Processing of the raw image that reaches the retina proceeds in several places, including the retina and visual cortex, and in a very parallel fashion. In the process, several characteristics that may roughly be described as color, size, position and shape are extracted.

Figure 2. Some of these objects are identical and some are mirror images. The length of time required to turn each one over in the mind for comparison is proportional to the angular difference.


Some of these can be matched with those in the stored model for grandmother (such as short stature, white hair, a smile, perhaps even a familiar dress). Clearly, some characteristics are more important than others, so there must be weighting of the inputs. If enough positive matches exist, and in the absence of negative characteristics (such as a flaming red mustache), then the "grandmother" signal is sent.

This simple model for a "threshold logic unit" evolved into the modern neural net, in which several layers of these individual decision-making units are connected. Their inputs combine data from various sensors and the output from other units, and the final output is a decision, basically recognition that the inputs match some recognized circumstance or object. The characteristics of neural net decisions, whether performed on images in the human mind or other types of data in a computer circuit, are very high speed (due to the extremely parallel way that all of the small logic decisions happen at the same time), the ability to learn (by adjusting the weights given to the various inputs), and the tendency to make mistakes.
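To make the mechanism concrete, here is a minimal sketch in Python of such a threshold logic unit, with learning as the classic perceptron weight update. The feature list, weights and threshold are invented for illustration; this is not code from the article.

import numpy as np

def threshold_logic_unit(inputs, weights, threshold):
    # Fire (return 1) only if the weighted sum of the inputs exceeds the threshold.
    return 1 if np.dot(inputs, weights) > threshold else 0

# Hypothetical "grandmother" cues: short stature, white hair, smile, flaming red mustache.
weights = np.array([0.5, 0.8, 0.4, -1.5])    # a negative weight acts as an exclusion
threshold = 1.0
observation = np.array([1, 1, 1, 0])         # three positive cues present, no mustache
print(threshold_logic_unit(observation, weights, threshold))   # -> 1, "grandmother"

def train_step(inputs, target, weights, threshold, rate=0.1):
    # Learning = adjusting the weights: nudge them toward inputs that should have
    # fired and away from those that should not (the perceptron rule).
    error = target - threshold_logic_unit(inputs, weights, threshold)
    return weights + rate * error * inputs

weights = train_step(np.array([1, 0, 1, 0]), 1, weights, threshold)

Stacking layers of such units, and feeding each one a mix of raw inputs and the outputs of other units, gives the neural net described above.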

Everyone has had the experience of thinking they recognized someone ("grandmother") and then on closer inspection realized that it isn't actually the right person at all. There were enough positive clues, and the absence of negative ones, to trigger the recognition process. Perhaps in a different situation, or with a different point of view, we wouldn't have made that mistake. But setting the threshold value on the weighted sum of positive inputs too high, while it would reduce false positives, would be inefficient, requiring too much time to collect more data. The penalty for making a wrong identification is a little minor embarrassment. The benefit of the fast and efficient procedure is the ability to perform recognitions based on incomplete data.

In some implementations of this logic, it is possible to assign a probability or a degree of confidence to an identification, but the utility of this value depends to a high degree upon the quality of the underlying model. This may be represented as the weights in a neural net, or the rules in a fuzzy logic system, or in some other form. In human recognition, the list of factors in the model is not so explicit. Writing down all of the characteristics that help to identify grandmother (and the negative exclusions) is very difficult. In most scientific experiments, we try to enumerate the important factors, but there is always a background level of underlying assumptions that may or may not be shared by those who read the results.

It is common in scientific papers that involve imaging to present a picture, usually with the caption "typical appearance" or "representative view." Editors of technical journals understand that these pictures are intended to show a few of the factors in the model list that the author considers particularly significant. But of course no one picture can be truly "typical." For one thing, most naturally occurring structures have some variability, and the chance of finding the mean value of all characteristics in one individual is small, and in any case would not demonstrate the range of variation.

But even in the case of an individual like grandmother, no one picture will suffice. Perhaps you have a photo that shows her face, white hair, rimless glasses, and she is even wearing a favorite apron. So you present that as the typical picture, but the viewer notices instead that she is wearing an arm sling, because on the day you took the picture she happened to have strained her shoulder. To you, that is a trivial detail, not the important thing in the picture. But the viewer can't know that, so the wrong information is transmitted by the illustration.

Editors know that the real meaning of "typical picture" is "this is the prettiest image we have." Picture selection often includes an aesthetic judgment that biases many uses of images.

Figure 3. Comparison of a physical neuron, the McCulloch and Pitts simplified model of a neuron, and its implementation as a threshold logic unit. If the weighted sum of many inputs exceeds a threshold, then the output (which may go to another logic unit) is turned on. Learning consists of adjusting the weights, which may be either positive or negative.


Technical specs

The human eye is a pretty good optical device (Figure 4). Based on the size of the lens aperture (5 × 10⁻³ m) and the wavelength of light (about 5 × 10⁻⁷ m for green), the theoretical resolution should be about 10⁻⁴ radians, or 1/3 arc minute.

The lens focuses light onto the retina, and it is only in the fovea, the tiny portion of the retina in which the cones are most densely packed, that the highest resolution is retained in the sensed image. One arc minute is a reasonable estimate for the overall performance of the eye, a handy number that can be used to estimate distances. Estimate the size of the smallest objects you can resolve, multiply by 3000, and that's how far away you are in the same units. For example, a car (about 13 feet long) can be seen from an airplane at 40,000 feet, and so on.
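As a quick check of those two estimates, a few lines of Python (my own back-of-the-envelope arithmetic, not code from the article) reproduce the numbers:

import math

wavelength = 5e-7                 # m, green light
aperture = 5e-3                   # m, pupil diameter
theta = wavelength / aperture     # diffraction-limited angle, ~1e-4 radians
print(math.degrees(theta) * 60)   # ~0.34 arc minutes, i.e. about 1/3

# One arc minute of practical resolution means distance ~ size / tan(1 arc minute),
# which is roughly the "multiply by 3000" rule of thumb.
one_arc_minute = math.radians(1 / 60)
car_length_ft = 13
print(car_length_ft / math.tan(one_arc_minute))   # ~45,000 ft, consistent with ~40,000 ft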

The number of 160 million rods and cones in the retina does not determine the actual resolution of images. When we "look at" something, we rotate our head and/or our eyeballs in their sockets so that the image of that point falls onto the fovea, where the cone density is highest. The periphery of our vision has relatively fewer cones (which respond to color) as compared to rods (which sense only brightness), and is important primarily for sensing motion and for judging scene illumination so we can correct for color balance and shading. To produce a digital camera that captured entire scenes (as large as the area we see) with resolution that would match human foveal vision, so we could later look anywhere in the stored image and see the finest detail, would require several billion sensors.

Human vision achieves something quite miraculous by rapidly shifting the eye to look at many different locations in a scene and, without any conscious effort, combining those bits and pieces into a single perceived image. There is a blind spot in the retina without sensors, where the optic nerve connects. We don't notice that blind spot because the brain fills it in with pieces interpolated from the surroundings or stored from previous glances. Tests in which objects appear or disappear from the blind spot prove that we don't actually get any information from there - our minds make something up for us.

The eye can capture images over a very wide range of illumination levels, covering about 9 or 10 orders of magnitude from nearly single-photon performance on a starlit night to a bright sunny day on the ski slopes. Some of that adaptation comes from changing the aperture with the iris, but most of it depends on processing in the retina. Adaptation to changing levels of illumination takes some time, up to several minutes depending on the amount of change. In the darkest few orders of magnitude we lose color sensitivity and use only the rods. Since the fovea is rich in cones but has few rods, looking just "next to" what we want to see ("averted vision") is a good strategy in the dark. It shifts the image over to an area with more rods to capture the dim image, albeit with less resolution.

Rods are not very sensitive to light at the red end of the visible spectrum, which is why red light illumination is used by astronomers, submariners, and others who wish to be able to turn off the red light and immediately have full sensitivity in the dark-adapted rod vision. The cones come in three kinds, which each respond over slightly different wavelength ranges (Figure 5). They are typically called long, medium and short wavelength receptors, or, more succinctly, red, green and blue-sensitive. By comparing the response of each type of cone, the eye characterizes color. Yellow is a combination of red and green, magenta is the relative absence of green, and so on.

Because of the different densities of red, green and blue sensitive cones, the overall sensitivity of the eye is greatest for green light and poorest for blue light (Figure 6).

Figure 4. Simplified diagram of the eye, showing the lens, retina, fovea, optic nerve, etc.

Figure 5. Sensitivity of the rods (shown in grey) and three kinds of cones (shown in red, green and blue) as a function of wavelength. Human vision detects roughly the range from about 400 nm (blue) to 700 nm (red).


But this sensitivity comes at a price: it is in this same range of wavelengths that our ability to distinguish one color from another is poorest. A common technique in microscopy uses filters to select just the green light because of the eye's sensitivity, but if detection of color changes is important this is not a good strategy.

Like most of the things that the eye does, the perception of color is determined in a comparative rather than an absolute way. It is only by comparing something to a known color reference that we can really estimate color at all. The usual color reference is a white object, since that (by definition) has all colors. If the scene we are looking at contains something known to be a neutral grey in color, then any variation in the color of the illumination can be compensated for. This is not so simple as it might seem, because many objects do not reflect light of all colors equally and appear to change color with illumination (this "metamerism" is often a problem with ink jet printers).

If the illumination is deficient in some portion of the color spectrum compared to daytime sunlight (which our vision evolved under), then the missing colors can't be reflected or detected, and it is impossible to accurately judge the object's true colors. Under monochromatic yellow sodium lights, color is completely confused with albedo (total reflectivity), and we can't distinguish color from brightness. Even with typical indoor lighting, colors appear different (which is why the salesperson suggests you might want to carry that shirt and tie to the window to see how they look in sunlight!).

When Newton first split white sunlight into its component parts using a prism, he was able to show that there were invisible components beyond both ends of the visible spectrum by measuring the rise in temperature that was caused by the absorption of the light. Since sunlight extends well beyond the narrow 400-700 nm range of human vision, it is not surprising that some animals and insects have evolved vision systems that can detect it. Plants, in particular, have developed signals that are visible only in these extended colors in order to attract birds and insects to pollinate them. Extending our human vision into these ranges and even beyond is possible with instrumentation. UV microscopy, radio astronomy, and X-ray diffraction all use portions of the electromagnetic spectrum beyond the visible, and all produce data that is typically presented as images, with colors shifted into the narrow portion of the spectrum we can detect.

Being able to detect brightness or color is not the same thing as being able to measure it or detect small variations in either brightness or color. While human vision functions over some 9-10 orders of magnitude, we cannot view a single image that covers such a wide range, nor detect variations of one part in 10⁹. A change in brightness of about 2-3% over a lateral distance of a few arc minutes is the limit of detectability under typical viewing conditions. It is important to note that the required variation is a percentage, so that a greater absolute change in brightness is required in bright areas than in dark ones. Anyone with darkroom experience is aware that different details are typically seen in a negative than in a positive image, as shown in Figure 7.

Overall, the eye can detect only about 20-30 shades of grey in an image, and in many cases fewer will produce a visually satisfactory result. In an image dominated by large areas of different brightness, it is difficult to pick out the fine detail with small local contrast within each area. One of the common methods for improving the visibility of local detail is computer enhancement that reduces the global (long-range) variation in brightness while increasing the local contrast (Figure 8). This is typically done by comparing a pixel to its local neighborhood. If the pixel is slightly brighter than its neighbors, it is made brighter still, and vice versa.
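One common way to implement that neighborhood comparison is sketched below in Python (a minimal illustration with my own choice of neighborhood size and gain, not necessarily the routine used for Figure 8): each pixel is pushed further away from its local mean, which boosts local contrast while leaving the long-range variation relatively suppressed.

import numpy as np
from scipy.ndimage import uniform_filter

def local_contrast_enhance(image, size=15, gain=2.0):
    # Compare each pixel to the average of its neighborhood and amplify the difference.
    image = image.astype(float)
    local_mean = uniform_filter(image, size=size)
    enhanced = local_mean + gain * (image - local_mean)
    return np.clip(enhanced, 0, 255)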

Local and abrupt changes in brightness (or color) are the most readily noticed details in images. Variations in texture often represent structural variations that are important, but these are more subtle.

Figure 6. Overall visual sensitivity is lowest for blue light, highest for green.

Figure 7. Positive and negative representations of an X-ray. Somewhat different details are visually evident in these images because human vision is not linear, but detects proportional changes in brightness.


As shown in Figure 9, in many cases variations that are classified as textural actually represent changes in the average brightness level. When only a change in texture is present, visual detection is difficult. If the boundaries are not straight lines (and especially vertical or horizontal in orientation), they are much more difficult to see.

Thirty shades of brightness in each of the red, green and blue cones would suggest that 30³ = 27,000 colors might be distinguished, but that is not so. Sensitivity to color changes at the ends of the spectrum is much better than in the middle (in other words, greens are hard to distinguish from each other). Only about a thousand different colors can be distinguished. Since computer displays offer 256 shades of brightness for the R, G and B phosphors, or 256³ = 16 million colors, we might expect that they could produce any color we can see. However, this is not the case. Both computer displays and printed images suffer severe limitations in gamut - the total range of colors that can be produced - as compared to what we can see. This is another reason that the "typical image" may not actually be representative of the object or class of objects.

Figure 8. Local contrast enhancement allows visual inspection of low-contrast detail in both the bright and dark regions of this image of a fingerprint on a high-contrast magazine cover.

Figure 9. Examples of textural differences: a) regions are also different in average brightness; b) no brightness difference but simple linear boundaries; c) irregular boundaries.


Acuity

Many animals, particularly birds, have vision that produces much higher spatial resolution than humans. Human vision achieves its highest spatial resolution in just a small area at the center of the field of view (the fovea), where the density of light-sensing cones is highest. At a 50 centimeter viewing distance, details with a width of 1 mm represent an angle of 0.11 degrees. Acuity (spatial resolution) is normally specified in units of cycles per degree. The upper limit (finest detail) visible with the human eye is about 50 cycles per degree, which would correspond to a grating in which the brightness varied from minimum to maximum about 5 times over that same 1 mm. At that fine spacing, 100% contrast would be needed, in other words black lines and white spaces. This is where the common specification arises that the finest lines distinguishable without optical aid are about 100 µm.
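For readers who want the arithmetic behind the 100 µm figure, a few lines of Python (my own restatement of the numbers in the text) make the chain explicit:

import math

viewing_distance_mm = 500
angle_deg = math.degrees(math.atan(1 / viewing_distance_mm))   # 1 mm subtends ~0.11 degrees
cycles_per_degree = 50                                         # upper limit of acuity
cycles_per_mm = cycles_per_degree * angle_deg                  # ~5-6 cycles within that 1 mm
line_width_um = 1000 / (2 * cycles_per_mm)                     # half a cycle: roughly 90-100 µm
print(angle_deg, cycles_per_mm, line_width_um)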

Less contrast is needed between the light and dark locations to detect them when the features are larger. Brightness variations about 1 mm wide represent a spatial frequency of about 9 cycles per degree, and under ideal viewing conditions can be resolved with a contrast of less than 1%, although this assumes the absence of any noise in the image and a very bright image (acuity drops significantly in dark images or ones with superimposed random variations, and is much poorer at detecting color differences than brightness variations).

At a normal viewing distance of about 50 cm, 1 mm on the image is about the optimum size for detecting the presence of detail. On a typical computer monitor that corresponds to about 4 pixels. As the spatial frequency drops (features become larger) the required contrast increases, so that when the distance over which the brightness varies from minimum to maximum is about 1 cm, the required contrast is about 10 times greater. The variation of spatial resolution ("acuity") with contrast is called the modulation transfer function (Figure 10).

Enlarging images does not improve the ability to distinguish small detail, and in fact degrades it. The common mistake made by microscopists is to work at very high magnification expecting to see the finest details. That may be needed for details that are small in dimension, but it will make it more difficult to see larger features that have less contrast.

Because the eye does not "measure" brightness, but simply makes comparisons, it is very difficult to distinguish brightness differences unless the regions are immediately adjacent. Figure 11a shows four grey squares, two of which are 5% darker than the others. Because they are separated, the ability to compare them is limited. Even if the regions are adjacent, as in Figure 11b, if the change from one region to another is gradual it cannot be detected. Only when the step is abrupt, as in Figure 11c, can the eye easily determine which regions are different.

Even when the features have sizes and contrast that should be visible, the presence of variations in the background intensity (or color) can prevent the visual system from detecting them.

Figure 10. Illustration of the modulation transfer function for human vision, showing that the greatest ability to resolve low-contrast details occurs at an intermediate spatial frequency, and becomes poorer for both smaller and larger details.

Figure 11. Comparison of regions with a 5% brightness difference: a) separated; b) adjacent but with a gradual change; c) adjacent with an abrupt boundary.


In the example of Figure 12, the text has a local contrast of about 1% but is superimposed on a ramp that varies from white to black. Application of an image processing operation reveals the message. The "Unsharp Mask" routine subtracts a blurred (smoothed) copy of the image from the original, suppressing large-scale variations in order to show local details.
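In code, the unsharp-mask idea can be sketched in a few lines of Python (an illustrative version with arbitrary parameter choices, not the specific routine used for Figure 12):

import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(image, sigma=20, amount=5.0):
    image = image.astype(float)
    blurred = gaussian_filter(image, sigma=sigma)   # the large-scale (ramp) component
    detail = image - blurred                        # what remains is the local detail
    return np.clip(128 + amount * detail, 0, 255)   # re-center and amplify the detail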

It is easy to confuse resolution with visibility. A star in the sky is essentially a point; there is no angular size, and even in a telescope it does not appear as a disk. It is visible because of its contrast, appearing bright against the dark sky. Faint stars are not visible to the naked eye because there isn't enough contrast. Telescopes make them visible by collecting more light into a larger aperture. A better way to think about resolution is the ability to distinguish as separate two stars that are close together. The classic test for this has long been the star Mizar in the handle of the Big Dipper. In Van Gogh's "Starry Night over the Rhone" each of the stars in this familiar constellation is shown as a single entity (Figure 13). But a proper star chart shows that the second star in the handle is actually double. Alcor and Mizar are an optical double - two stars that appear close together but in fact are at different distances and have no gravitational relationship to each other. They are separated by about 11.8 minutes of arc, and being able to detect the two as separate has been considered by many cultures, from the American Indians to the desert dwellers of the Near East, as a test of good eyesight (as well as of a dark sky with little turbulence or water vapor, and without a moon or other light pollution).

But there is more to Mizar than meets the eye, literally. With the advent of the Galilean telescope, observers were surprised to find that Mizar itself is a double star. Giovanni Battista Riccioli (1598-1671), the Jesuit astronomer and geographer of Bologna, is generally supposed to have split Mizar, the first double star ever discovered, around 1650.

Figure 12. Intensity differences superimposed on a varying background are visually undetectable: a) original; b) processed with an "Unsharp Mask" filter to suppress the gradual changes and reveal the detail.

Figure 13. The Big Dipper, in Van Gogh's "Starry Night Over the Rhone", and as shown in a star chart.


The two stars, Mizar A and Mizar B, are a gravitational double 14.42 arc seconds apart, and any good modern telescope can separate them (Figure 14). But they turn out to be even more complicated - each star is itself a double as well, so close together that only spectroscopy can detect them.

There are many double stars in the sky that can be resolved with a backyard telescope, and many of them are familiar viewing targets for amateur astronomers who enjoy the color differences between some of the star pairs, or the challenge of resolving them. This depends on more than just the angular separation. If one star is significantly brighter than the other, it is much more difficult to see the weaker star close to the brighter one.

What the eye tells the brain

Human vision is a lot more than rods and cones in the retina. An enormous amount of processing takes place, some of it immediately in the retina and some in the visual cortex at the rear of the brain, before an "image" is available to the conscious mind. The neural connections within the retina were first seen about a hundred years ago by Ramón y Cajal, and have been studied ever since. Figure 15 shows a simplified diagram of the human retina. The light-sensing rods and cones are actually at the back, and light must pass through several layers of processing cells to reach them. In many nocturnal animals the pigmented layer behind the rods and cones reflects light back so that the photons are twice as likely to be captured and detected (and some comes back out through the lens to produce the reflective eyes we see watching us at night). Incidentally, the eye of the octopus does not have this backwards arrangement; evolution in that case put the light-sensing cells on top where they can most efficiently catch the light.

The first layer of processing cells, called horizontal cells, connects light-sensing cells in various size neighborhoods. The next layer, the amacrine cells, combines and compares the outputs from the horizontal cells. Finally the ganglion cells collect the outputs for transmission to the visual cortex. This physical organization corresponds directly to the logical processes of inhibition, discussed below.

In many respects, the retina of the eye is actually part of the brain. Work on understanding the early processing of image data and the extraction of the information transmitted from the eye to the visual cortex has been recognized with a Nobel prize (awarded in 1981 to David H. Hubel and Torsten N. Wiesel for their discoveries concerning information processing in the visual system).

Without elaborating the anatomical details of the retina or the visual cortex discovered by Hubel, Wiesel, and others (particularly a seminal paper, "What the frog's eye tells the frog's brain," published in 1959 by Jerome Lettvin), it is still possible to sum up the logical and practical implications. Within the retina, outputs from the individual light sensors are combined and compared by layers of neurons. Comparing the output from one sensor or region to that from the surrounding sensors, so that excitation of the center is tested against the inhibition from the surroundings, is a basic step that enables the retina to ignore regions that are uniform or only gradually varying in brightness, and to efficiently detect locations where a change in brightness occurs. Testing over different size regions locates points and features of varying sizes. Comparison of output over time is carried out in the same way to detect changes.

In the frog's eye, the retina processes images to find just a few highly specific stimuli. These include small dark moving objects (food) and the location of the largest, darkest region (safety in the pond). Like the fly's and the human's, the frog's eye is hard-wired to detect "looming": something that grows rapidly but does not shift in the visual field.

Figure 14. Photograph showing the relative separation of Alcor from Mizar A and B.

Figure 15. The principal layers in the retina. Light passes through several layers of processing neurons to reach the light-sensitive rod and cone cells. The horizontal, bipolar and amacrine cells combine the signals from various size regions, compare them to locate interesting features, and pass that information on to higher levels in the visual cortex.


That represents something coming toward the eye, and causes the fly to avoid the swatter or the frog to jump into the water. In a human, the eyelid blinks for protection without our ever becoming consciously aware that an object was even seen.

The outputs from these primitive detection circuits are then further combined in the cortex to locate lines and edges. There are specific regions in the cortex that are sensitive to different orientations of lines and to their motion. The "wiring" of these regions is not built in, but must be developed after birth; cats raised in an environment devoid of lines in a specific orientation do not subsequently have the ability to see such lines. Detection of the location of brightness changes (feature edges) creates a kind of mental sketch of the scene, which is dominated by the presence of lines, edges, corners and other simple structures. These in turn are linked together in a hierarchy to produce the understanding of the scene in our minds.

The extraction of changes in brightness or color with position or with time explains a great deal about what we see in scenes, and about what we miss. Changes that occur gradually with position, such as shading of light on a wall, are ignored. We have to exert a really conscious effort to notice such shading, and even then we have no quantitative tools with which to estimate its magnitude. But even small changes in brightness of a few percent are visible when they occur abruptly, producing a definite edge. And when that edge forms a straight line (and particularly when it is vertical) it is noticed. Similarly, any part of a scene that is static over time tends to be ignored, but when something moves it attracts our attention.

These techniques for extracting information from scenes are efficient because they are highly parallel. For every one of the 160 million light sensors in the eye, there are as many as 50,000 neurons involved in processing and comparing. One of Hubel and Wiesel's contributions was showing how the connections in the network are formed shortly after birth, and the dependence of that formation on providing imagery to the eyes during that critical period.

Mapping of the specific circuitry in the brain is accomplished by placing electrodes in various locations in the cortex, and observing the output of neurons as various images and stimuli are presented to the eyes. At a higher level of scale and processing, functional MRI and PET scans can identify regions of the brain that respond to various activities and stimuli. But there is another source of important knowledge about processing of images in the mind: identifying the mistakes that are made in image interpretation. One important, but limited, resource is studying the responses of persons who have known specific damage, either congenital or as the result of illness or accident. A second approach studies the errors in interpretation of images resulting from visual illusions. Since everyone tends to make the same mistakes, those errors must be a direct indication of how the processing is accomplished. Several of the more revealing cases will be presented in the sections that follow.

Spatial comparisons

The basic idea behind center-surround or excitation-inhibition logic is comparing the signals from a central region (which may be a single detector, or progressively larger scales formed by averaging detectors together) to the output from a surrounding annular region. That is the basic analysis unit in the retina, and by combining the outputs from many such primitive units, the detection of light or dark lines, corners, edges and other structures can be achieved. In the frog's eye, it was determined that a dark object of a certain size (corresponding to an insect close enough to be captured) generated a powerful recognition signal. In the cat's visual cortex there are regions that respond only to dark lines of a certain length and angle, moving in a particular direction. Furthermore, those regions are intercalated with regions that perform the same analysis on the image data from the other eye, which is presumed to be important in stereopsis or fusion of stereo images.
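One simple way to emulate this center-surround comparison on a digital image is sketched below in Python (my own illustration of the general idea, not a model of the actual retinal circuitry): the response at each point is the excitation from a small central region minus the inhibition from a larger surrounding one.

import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround(image, center_sigma=1.0, surround_sigma=4.0):
    image = image.astype(float)
    center = gaussian_filter(image, center_sigma)      # small "excitatory" center
    surround = gaussian_filter(image, surround_sigma)  # larger "inhibitory" surround
    return center - surround   # near zero in uniform areas, large at edges and spots

Running the same comparison with several different center and surround sizes corresponds to the testing over different size regions described above, locating features of varying sizes.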

This fundamental center-surround behavior explains several very common illusions (Figure 16). A set of uniform grey steps (Mach bands) is not perceived as being uniform.

Figure 16. Two common illusions based on inhibition: Mach bands (top) demonstrate that the visual system increases the perceived change in brightness at steps; the Craik-Cornsweet-O'Brien step (bottom) shows that providing the visual system with a step influences the judgment of values farther away.


The side of each step next to a lighter region is perceived as being darker, and vice versa, because it is the step that is noticed. In the case of the Mach bands, the visual response to the step aids us in determining which region is darker, although it makes it very difficult to judge the amount of the difference. But the same effect can be used in the absence of any difference to cause one to be perceived. In the Craik-Cornsweet-O'Brien illusion, two adjacent regions have the same brightness but the values are raised and lowered on either side of the boundary. The eye interprets this in the same way as for the Mach bands, and judges that one region is, in fact, lighter and one darker.

Similar excitation-inhibition comparisons are made for color values. Boundaries between blocks of color are detected and emphasized, while the absolute differences between the blocks are minimized. Blocks of identical color placed on a gradient of brightness or some other color will appear to be different. The human response to color is not the same as a camera's (either digital or film). Although the cones in the retina respond to short (blue), medium (green) and long (red) wavelengths, combinations and comparisons performed early in the visual process convert this to a different "space" for color interpretation.

Whether described in terms of brightness, hue and saturation, or the artist's tint, shade and tone, or various mathematical spaces such as YCC (used in color video) or Lab (used in many image processing software routines), three parameters are needed (Figure 17). Brightness is a measure of how light or dark the light is, without regard to any color information. For the artist, this is the shade, achieved by adding black to the pigment. Hue distinguishes the various colors, progressing from red to orange, yellow, green, blue, and magenta around the color wheel we probably encountered in kindergarten (or the ROY G BIV mnemonic for the order of colors in the rainbow). Hue corresponds to the artist's pure pigment. Saturation is the amount of color present, such as the difference between pink and red, or sky blue and royal blue. A fully saturated color corresponds to the artist's pure pigment, which can be tinted with white pigment to reduce the saturation (tone describes adding both white and black pigments to the pure color).

Even without formal training in color science, this is how people describe color to themselves. Combination and comparison of the signals from the red, green and blue sensitive cones is used to determine these parameters. Simplistically, we can think of the sum of all three being the brightness, ratios of one to another being interpreted as hue, and the ratio of the greatest to the least giving the saturation. One important thing to remember about hue is that it does not correspond to the linear wavelength range from red to blue. We interpret a color with reduced green but strong red and blue as being magenta, which is not a color in the wavelength sense but certainly is in terms of perception.
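That simplistic description can be written down directly, as in the short Python sketch below (a literal transcription of the idea, not a standard HSI conversion formula; the example at the end echoes the point about reduced green reading as magenta):

import colorsys

def naive_color_parameters(r, g, b):
    brightness = r + g + b                                          # sum of the three signals
    largest, smallest = max(r, g, b), min(r, g, b)
    saturation = 0.0 if largest == 0 else 1 - smallest / largest    # spread between channels
    hue = colorsys.rgb_to_hsv(r, g, b)[0]                           # 0..1 around the color wheel
    return brightness, hue, saturation

print(naive_color_parameters(0.9, 0.2, 0.8))   # strong red and blue, weak green -> hue near magenta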

Relying on comparisons to detect changes in brightness or color rather than absolute values simplifies many tasks of image interpretation. For example, the various walls of a building, all painted the same color, will have very different brightness values because of shading, their angle to the sun, etc., but what we observe is a building of uniform color with well-defined corners and detail, because it is only the local changes that count. The "white" walls on the shadowed side of the building in Figure 18 are actually darker than the "black" shingles on the sunlit side, but our perception understands that the walls are white and the shingles black on all sides of the building.

The same principle of local comparisons applies to many other situations. Many artists have used something like the Craik-Cornsweet-O'Brien illusion to create the impression of a great range of light and shadow within the rather limited range of absolute brightness values that can be achieved with paint on canvas (Figure 19).

Figure 17. Diagrams of color spaces: a) the RGB color space is a cube that is mathematically simple and describes the way cameras and displays work, but not the way people think about color; b) HSI space has a central intensity (also called brightness, value or luminance) axis that is pure grey scale, while the distance out from the axis (saturation) measures the amount of color and the angle (hue) identifies what the color is (the Hue-Saturation plane is the color wheel familiar to school children); c) Lab space is a sphere with a vertical axis for Luminance (brightness) and two perpendicular color axes, one for red-green and the other for blue-yellow (this is mathematically simpler than HSI and still separates intensity from color information).

The effect can be heightened by adding colors, and there are many cases in which shadows on skin are tinted with colors such as green or magenta to increase the perception of a brightness difference.

Edwin Land (of Polaroid fame) studied color vision extensively and proposed a somewhat different interpretation than the tri-stimulus model described above (which is usually credited to Helmholtz). Also relying heavily on the excitation-inhibition model for processing signals, Land's opponent-color "retinex" theory predicts or explains several interesting visual phenomena. It is important to note that in Land's retinex theory the composition of light from one region in an image considered by itself does not specify the perceived color of that area; rather, the color of an area is determined by a trio of numbers, each computed on a single waveband (roughly the long, medium and short wavelengths usually described as red, green and blue) to give the relationship on that waveband between that area and the rest of the areas in the scene.
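The flavor of that idea, though not Land's actual retinex algorithm, can be sketched in a few lines of Python: the value assigned to each point on each waveband is its relationship to the rest of the scene on that same band, so a uniform change in the illuminant on that band cancels out.

import numpy as np

def per_band_relationship(image_rgb):
    # image_rgb: float array of shape (height, width, 3)
    image_rgb = image_rgb.astype(float) + 1e-6             # avoid log of zero
    band_means = image_rgb.mean(axis=(0, 1), keepdims=True)
    return np.log(image_rgb / band_means)                  # illumination common to a band divides out

Doubling the red illumination, for example, scales both a pixel's red value and the scene-average red value, so the trio of numbers returned for that pixel is unchanged.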


One of the consequences of Land's theory is that the spectral composition of the illumination becomes very much less important, and the color of a particular region can become relatively independent of the light that illuminates it. Our eyes have relatively few cones to sense the colors in the periphery of our vision, but that information is apparently very important in judging colors in the center by correcting for the color of incident illumination. Land demonstrated that if a scene is photographed through red, green and blue filters, and then projectors are used to shine colored and/or white light through just two of the negatives, the scene may be perceived as having full color. Another interesting phenomenon is that a spinning disk with black and white bars may be perceived as having color, depending on the spacing of the bars. Many of the questions about exactly how color information is processed in the human visual system have not yet been answered.

The discussion of center-surround comparisons above primarily focused on the processes in the retina, which compare each point to its surroundings. But as the extracted information moves up the processing chain, through the visual cortex, there is much other evidence of local comparisons. It was mentioned that in the visual cortex there are regions that respond to lines of a particular angle. They are located adjacent to regions that respond to lines of a slightly different angle. Comparison of the output from one region to its neighbor can thus be used to detect small changes in angle, and indeed that comparison is carried out.

We don't measure angles with mental protractors, but we do compare them and notice differences. Like the case for brightness variations across a step, a difference in angle is amplified to detect boundaries and increase the perceived change. In the absence of any markings on the dial of a wristwatch, telling time to the nearest minute is about the best we can do. One minute corresponds to a six degree motion of the minute hand.

Figure 18. Comparison of the dark and light areas on the shadowed and sunlit sides of the building is performed locally, as discussed in the text.

Figure 19. Edward Hopper painted many renderings of light and shadow. His "Sunlight in an Empty Room" illustrates shading along uniform surfaces and steps at edges.

Figure 20. Zöllner lines. The cross-hatching of the diagonal lines with short vertical and horizontal ones causes our visual perception of them to rotate in the opposite direction. In fact, the diagonal lines are exactly parallel.


But if there are two sets of lines that vary by only a few degrees, we notice that difference (and usually judge it to be much larger).

Inhibition with regard to angle means that cross-hatching diagonal lines with short marks that are vertical or horizontal will alter our perception of the main line orientation. As shown in Figure 20, this makes it difficult or impossible to correctly compare the angle of the principal lines (which are in fact parallel). Human vision does not treat all angles the same, because we did not evolve in a world where all directions are the same. Very small deviations from vertical, and somewhat larger ones from horizontal, are readily detected, while much larger changes are required for detection at arbitrary diagonal orientations.

Local to global hierarchies

Interpretation of elements in a scene relies heavily on grouping them together to form features and objects. A very simple example (Figure 21) using just dots shows that the eye connects nearest neighbors to construct alignments.

Similar grouping of points and lines is used to connect together parts of feature boundaries that are otherwise poorly defined, to create the outlines of objects that we visually interpret (Figure 22).

It is a simple extension of this grouping to include points nearest over time that produces the familiar "Star Trek" impression of motion through a starfield (Figure 23). Temporal comparisons are discussed more fully below.

Our natural world does not consist of lines and points, but of the objects that they connect together to represent. Our vision systems perform this grouping naturally, and it is usually difficult to deconstruct a scene or structure into its component parts. Again, illusions are very useful to illustrate the way this grouping works (Figure 24). The lengths of two simple parallel lines in isolation are easily compared. But if additional lines are connected to these, they become visually part of the same structure and the perceived lengths of the original lines are altered. In the arrow illusion, the different orientation of the added lines does not separate them from the main shaft. But if they are very different in color, the illusion weakens or fails because they are not grouped together.

Grouping is necessary for inhibition to work effectively. The cross-hatching in Figure 20 shifts the angle of the line because it is seen as being part of it.

Figure 21. Connecting each point to its nearest neighbor produces radial or circumferential lines.

Figure 23. Star Trek introduced the "moving starfield" graphic (this Quicktime movie can be downloaded from http://DrJohnRuss.com/images/Seeing/Fig_23.mov).

Figure 22. The cell boundaries in this tissue section are visually constructed by grouping the lines and points.

Figure 24. Example of grouping. The slanted lines added to the ends of the (identical) horizontal lines affect our visual comparison of their length. This effect is reduced when they are different in color or thickness.


The common illusion of brightness alteration due to a surrounding, contrasting frame depends on the frame being grouped with the central region. In the example shown in Figure 25, the insertion of a black line separating the frame from the center alters the appearance of the image and reduces or eliminates the perceived difference in brightness of the grey central regions. Without the line, the brighter and darker frames are grouped with the center and inhibition alters the apparent brightness, making the region with the dark frame appear lighter and vice versa.

The combination of grouping with inhibition gives rise to the illusion shown in Figure 26. The angled shading causes the lines to appear slightly turned in the opposite direction, and the various pieces of line are connected by grouping, so that the entire figure is perceived as a spiral. But in fact the lines are circles, as can be verified by tracing around one of them. This is an example of spatial grouping, but temporal grouping works in much the same way. A rotating spiral is commonly seen as producing an endless motion, whether it is a barber pole or the spinning disk used in every cheap science fiction movie ever made (Figure 27).

Grouping together features in an image that are the same in color lies at the very heart of the common tests for color blindness. In the Ishihara tests, a set of colored circles is arranged so that similar colors can be grouped to form recognizable numbers (Figure 28). For the person who cannot differentiate the colors, the same grouping does not occur and the interpretation of the numbers changes. The example shown is one of a set that diagnoses red-green deficiency, the most common form of color blindness, which affects perhaps 1 in 10 men. This deficiency can be classed as either protanopia or deuteranopia. In protanopia, the visible range of the spectrum is shorter at the red end compared with that of the normal, and the part of the spectrum that appears blue-green to the normal appears to those with protanopia as grey. In deuteranopia the part of the spectrum that appears to the normal as green appears as grey. Purple-red (the complementary color of green) also appears as grey. In the example, those with normal color vision should read the number 74.

Figure 25. In the top image the lighter and darker surrounds affect our visual comparison of the (identical) central grey regions. This effect is reduced in the bottom image by the separating border.

Figure 27. Moving helical and spiral patterns such as the barber pole and spinning disk (this Quicktime movie can be downloaded from http://DrJohnRuss.com/images/Seeing/Fig_27.mov) produce an illusion of motion of the background perpendicular to the lines.

Figure 26. Fraser's spiral. The circular rings are visually "tilted" and perceived to form a spiral.

Figure 28. One of the Ishihara color blindness test images (see text).

Red-green color deficiency will cause the number to be read as 21. Someone with total color blindness will not be able to read any numeral.

By interfering with our visual ability to group parts of the image together, camouflage is used to hide objects. The military discovered this before the First World War, when the traditional white of US Navy ships (Teddy Roosevelt's "Great White Fleet") was replaced by irregular patterns of greys and blues to reduce their visibility. But nature figured it out long ago, and examples are plentiful. Predators wanting to hide from their prey, and vice versa, typically use camouflage as a first line of defense. There are other visual possibilities of course - butterflies whose spots look like the huge eyes of a larger animal (mimicry), or frogs whose bright colors warn of their poison - but usually (as in Figure 29) the goal is simply to disappear. Breaking the image up so that the brain does not group the parts together very effectively prevents recognition.

Motion can destroy the illusion produced by camouflage, because moving objects (or portions of objects) attract notice, and if several image segments are observed to move in coordinated ways they are grouped, and the hidden object emerges. Human vision attempts to deal with moving features or points as rigid bodies, and easily connects separated pieces that move in a coordinated fashion. Viewing scenes or images with altered illumination, or a colored filter, often reveals the objects as well. But in nature, the ability to keep still and rely on camouflage is a well-developed strategy.

Grouping operates on many spatial (and temporal) scales. In a typical scene there may be lines, edges, and other features that group together to be perceived as an object, but then that object will be grouped with others to form a higher level of organization, and so on. Violations that occur in the grouping hierarchy give rise to conflicts that the mind must resolve. Sometimes this is done by seeing only one interpretation and ignoring another, and sometimes the mind switches back and forth between interpretations. Figure 30 shows two examples. If the bright object is perceived as the foreground, it is seen as a vase with an irregular shape. If the background around the vase becomes the foreground, it emerges as two facing human profiles. An even simpler example is the drawing of a cube. The highlighted corner can be seen either as the point closest to the viewer or farthest away.

Figure 31 shows two more familiar examples of rivalrous interpretations. Is the first drawing the head of a bunny rabbit or a duck? Is the second one a young girl or an old woman? For all of these images in which multiple interpretations or foreground-background reversal produce different perceptions, some people initially see only one or the other possibility, and may have difficulty in recognizing the alternative.

Figure 29. Natural camouflage (a pygmy rattlesnake hides very well on a bed of leaves).

Figure 30. Illusions with two alternative interpretations: a) either two facing profiles or a vase; b) the Necker cube - is the corner marked in red closest to or farthest from the viewer?

But once you manage to "see" both interpretations, it is impossible to see both at once, and for most people the two possibilities alternate every few seconds as the image is viewed.

As a trick that may provide some insight into how images are processed in the mind, these examples have some interest. But they also provide a warning for anyone who relies on visual inspection of images to obtain information about the subject. In order to see past camouflage to connect the disparate parts of an object and recognize its presence, you must have a good stored model of what the object is. But when you approach an image with a strong model and try to connect the parts of the image to it, you are forced to ignore other interpretations. Anything in the image that does not conform to that model is likely to be ignored.

In one of Tony Hillerman's detective stories, his Navajo detective Joe Leaphorn explains how he looks for tracks. The FBI man asks "What were you looking for?" and Leaphorn replies "Nothing in particular. You're not really looking for anything in particular. If you do that, you don't see things you're not looking for." But learning how to look for everything and nothing is a hard skill to master. Most of us see only the things we expect to see.

The artist M. C. Escher created many drawings in which the grouping hierarchy was consistent locally but not globally, to create conflicts and rivalries that make the art very interesting. Figure 32 shows diagrams and examples of some of his simpler conundrums. In one, lines that represent one kind of edge at the tip of the fork become something different at its base, exchanging inside for outside along the way. In the second, the front-to-rear order of edges changes.

Figure 31. Illusions with alternative interpretations: a) a duck or a rabbit; b) a young girl or an old woman.

Figure 32. Drawings (after Escher) of a "three-pronged" fork and an impossible cube.

Edges that occlude others, and are therefore perceived as being in front, are grouped by angles that clearly place them at the back. In both of these cases, the local interpretation of the information is consistent, but no global resolution of the inconsistencies is possible.

Figure 33 shows another Escher drawing, called "climbing and descending." Locally the stairs have a direction that is everywhere clear and obvious. However, by clever use of perspective distortion, the steps form a closed path without top or bottom, so the path is endless and always in one direction. Several other geometric rivalries are also present in this drawing.

Figure 33. Escher's "climbing and descending" and a drawing of the endless stair.

“The first section of this three-part paper has emphasized the dependence of vision on local comparisons of brightness, color, orientation, and feature relationships. In the next part, comparisons over time are included to interpret motion, and comparisons over longer ranges are shown to influence judgments of distance. In addition, the tendency of people to see only a few things in a scene, and to see what they expect to see in a given context, is illustrated.

The third and concluding part deals with object recognition”

Seeing the Scientific Image, Part 2
John C. Russ

Materials Science and Engineering Department, North Carolina State University, Raleigh, NC

It’s about time

Comparison and inhibition operate temporally as well as spatially. The periphery of our vision is particularly well wired to detect motion. Comparison of the response at one point to that a short time previously is accomplished by a short time delay. Slightly more complicated connections detect motion of edges, with the ability to distinguish edges at different orientations. Gradual changes of brightness and motion, like gradual spatial variations, are ignored and very difficult to detect.

Temporal inhibition is not the same thing as adaptation or depletion. It takes a little while for our eyes to adapt to changes in scene brightness. Part of this is the response of the iris to open or close the pupil, letting more or less light into the eye. Part of the adaptation response is chemical, creating more amplification of dim light signals. The former requires many seconds and the latter many minutes to operate fully.

It is well established that staring at a fixed pattern or color target for a brief time will chemically deplete the rods or cones. Then looking at a blank page will produce an image of the negative or inverse of the original. Figure 34 shows a simple example. Stare fixedly at the center of the circle for about 60 seconds, and then look away. Because the color-sensitive cones have been depleted, the afterimage of a circle composed of the opposing colors will appear (green for red, yellow for blue, and so on).

Figure 34. A target to demonstrate adaptation. Stare at the central cross for about a minute and then look away toward a blank sheet of paper. The complementary colors will be seen.

Motion sensing is obviously important. It alerts us to changes in our environment that may represent threats or opportunities. And the ability to extrapolate motion lies at the heart of the tracking capabilities that enable us to perform actions such as catching a thrown ball. Tracking and extrapolation present a second-order opportunity to notice discontinuities of motion. If a moving feature suddenly changes speed or direction, that is also noticed. It is because of this ability to track motion and notice even subtle changes that the presentation of data as graphs is so useful, and why plots of the derivatives of raw data often reveal information visually.

Sequences of images are interpreted very differently if they occur at a sufficiently slow rate that they are seen as a sequence of individual pictures, or faster so that they present the illusion of continuous motion. Much of the blame for motion pictures and television can be assigned to Eadweard Muybridge, a photographer who was asked to settle a bet for Leland Stanford as to whether a galloping horse ever had all four feet off the ground. Muybridge set up a row of cameras with trip wires to photograph a horse as it galloped past, producing a sequence of pictures. Viewing them individually was enough to settle the bet, but Muybridge discovered that flipping through them rapidly gave the visual illusion of continuous motion. The movie industry began almost immediately. Figure 35 shows a sequence of twelve images of a trotting horse, taken by Muybridge.

In viewing a series of images, an important phenomenon called aliasing can occur. Generally we assume that a feature in one frame will correspond to a feature in the next if they are nearly the same in color and shape, and if they are close together. The familiar reversal of direction of the spokes on a turning wheel is an example of the visual error that results when a feature in one frame is matched to a different one in the next (Figure 36). From observing this phenomenon, it is a short step to stroboscopic imaging, in which a series of pictures is taken at time intervals that match, or nearly match, the repetition rate of some phenomenon. This is commonly used to study turning or vibrating objects, falling water drops, and so on. Each image shows almost the same thing, albeit with a different tooth on the gear or a different water drop as the subject. We visually assume they are the same and perceive no change. By slightly varying the timing of the images (typically by controlling the rate at which the light source flashes) it is possible to extract subtle patterns and motions that occur within the faster repetitions.
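
The arithmetic behind the wagon-wheel illusion and stroboscopic sampling is simple enough to show directly. The following sketch uses made-up numbers (the wheel speed, frame rate, and spoke count are assumptions, not values from the text): when the rotation between frames is close to a whole spoke spacing, the nearest-spoke match corresponds to a small backward step, and the wheel appears to turn slowly in reverse.

    import numpy as np

    def apparent_step(rotation_hz, frame_hz, spokes=8):
        """Apparent per-frame rotation (degrees) of a spoked wheel, wrapped
        into the symmetric range of one spoke spacing (the nearest-spoke match)."""
        spacing = 360.0 / spokes                      # 45 degrees for an 8-spoke wheel
        true_step = 360.0 * rotation_hz / frame_hz    # true rotation between frames
        return (true_step + spacing / 2) % spacing - spacing / 2

    # Illustrative numbers only: a wheel turning 2.8 times per second filmed at
    # 24 frames per second advances 42 degrees per frame, but the nearest-spoke
    # match is -3 degrees per frame, so the wheel seems to creep backwards.
    print(apparent_step(2.8, 24.0))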

If an image sequence is slower than about 15 frames per second, we don't perceive it as continuous. In fact, flickering images at 10-12 times a second can even induce epileptic fits in those with that affliction. But at higher rates, the temporal response of the eye and vision system sees continuous action. Movies are typically recorded at 24 frames per second, and television broadcasts either 25 (in Europe) or 30 (in the US) frames per second. In all of these cases, we interpret the sequential images as continuous. Incidentally, that is one of the problems with most video surveillance systems in use now. In order to record a full day's activity on a single VHS tape, the images are not stored at 30 frames per second; instead single frames (actually, single fields with only half the vertical resolution) are stored about every 4.5 seconds. The result is a series of still images from which some important clues are missing. We recognize people not only by still images but also by how they move, and the trajectories of motion are missing from the recording. We might detect posture, but not motion.

Just as human vision is best able to detect features or other causes of brightness or color variation over a relatively narrow range of sizes, so it deals best with events that occur over a relatively narrow range of times.

Figure 35. Muybridge's horse sequence (this Quicktime movie can be downloaded from http://DrJohnRuss.com/images/Seeing/Fig_35.mov).

Figure 36. The wagon wheel effect (this Quicktime movie can be downloaded from http://DrJohnRuss.com/images/Seeing/Fig_36.mov).

Very short duration events are invisible to our eyes without devices such as high speed photography to capture them (Figure 37). Both high speed imaging and stroboscopic imaging techniques were pioneered by Doc Edgerton at MIT.

Likewise, events that take a long time (minutes) to happen are not easy to examine visually to detect changes. Even side-by-side images of the clock face in Figure 38 don't reveal all of the differences, and when one of the images must be recalled from memory the results are even poorer. Capturing the images in a computer and calculating the difference shows clearly the motion of the minute hand and even the very slight shift of the hour hand.
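
Computing such a difference image is a one-line operation in any image-processing environment. A minimal Python sketch is shown below; the file names are hypothetical, and any two photographs taken from a fixed camera will do.

    import numpy as np
    from PIL import Image

    # Hypothetical file names for two pictures of the same scene a minute apart.
    a = np.asarray(Image.open("clock_1.jpg").convert("L"), dtype=np.int16)
    b = np.asarray(Image.open("clock_2.jpg").convert("L"), dtype=np.int16)

    # The absolute difference is near zero wherever the scene did not change,
    # so even a slight movement of the hands stands out clearly.
    diff = np.abs(a - b).astype(np.uint8)
    Image.fromarray(diff).save("clock_difference.png")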

Considering the sensitivity of vision to motion, it is astonishing that our eyes are actually in nearly constant motion. It is only in the fovea that high resolution viewing is possible, and we rotate our eyes in their sockets so that this small high resolution region can view many individual locations in a scene, one after another. Flicking through the points in a scene that seem "interesting" gathers the information that our minds require for interpretation and judgment. Most of the scene is never actually examined, unless the presence of edges, lines, colors, or abrupt changes in color, brightness, texture or orientation make locations interesting.

Somehow, as our eyes move and different images fall onto the retina every few hundred milliseconds, our minds sort it all out and plug the information into the perceptual scene that is constructed in our head. Although the entire image shifts on the retina, it is only relative motion within the scene that is noticed. This motion of the eye also serves to fill in the blind spot, the location on the retina where the connection of the optic nerve occurs, and where there are no light sensors. Experiments that slowly move a light through the visual field while the eyes remain fixed on a single point easily demonstrate that this blind spot exists, but it is never noticed in real viewing because eye motion picks up enough information to fill the perceptual model of the scene that we actually interpret. But there may be a great deal of information in the actual scene that we do not notice. Clearly there is a lot of interpretation going on.

The perceived image of a scene has very little in common with a photographic recording of the same scene. The latter exists as a permanent record and each part of it can be examined in detail at a later time. The mental image of the scene is transitory, and much of it was filled with low resolution information from our visual periphery, or was simply assumed from memory of other similar scenes. Scientists need to record images rather than just view them, because it is often not obvious until much later which are the really important features present (or absent).

The evolution of the structure and function of the eye, and of the brain connections that process the raw data, has been a response to the environment and the challenges of survival. Different animals clearly have different needs and this has resulted in different types of eyes, as noted above, and also different kinds of processing. Apparently the very few but extremely specific bits of information that the fly's eye sends to the fly's brain are enough to trigger appropriate and successful responses (for example, a looming surface triggers a landing reflex in which the fly re-orients its body so the legs touch first). But for most of the "higher" animals the types of information gathered by the visual system and the interpretations that are automatically applied are much more diverse.

Figure 37. Doc Edgerton's high speed photograph of a milk drop creating a splash.

Figure 38. Pictures taken slightly over a minute apart of the clock on my office wall, and the difference between them.

For example, when people see a pencil placed in a glass of water, the pencil appears to be bent at an angle (Figure 39). Intellectually we know this is due to the difference in the refractive indices of water and air, but the view presented to our brain still has the bend. South Pacific islanders who have spent their lives fishing with spears in shallow water learn how to compensate for the bend, but the image that their eyes present to their brains still includes the optical effect. There is evidence that fishing birds like the heron have a correction for this offset built in to their visual system, and that from the point of view of the perceived image they strike directly toward the fish in order to catch it. Conversely, there are species of fish that see bugs above the water and spit water at them to knock them down and catch them for food. This requires a similar correction for the optics.

It may seem strange that computation in the brain could distort an image in exactly the proper way to correct for a purely optical effect such as the index of refraction of light in water. But this is different only in degree from the adjustments that we have already described, in which human vision corrects for shading of surfaces and illumination that can vary in intensity and color. The process combines the raw visual data with a lot of "what we know about the world" in order to create a coherent and usually accurate, or at least accurate enough, depiction of the scene.

Humans are very good at tracking moving objects and predicting their path, taking into account air resistance and gravity (for instance, an outfielder catching a fly ball). Most animals do not have this ability but a few have learned it. My dog chases a thrown ball by always running toward its present position, which produces a path that is mathematically a tractrix. It works, and it doesn't take much computation, but it isn't optimum. My cat, on the other hand, runs in a straight line toward where the ball is going to be. She has solved the math of a parabola, but having pounced on the ball, she won't bring it back to me, as the dog will.

What we typically describe in humans as "eye-hand" coordination involves an enormous amount of subtle computation about what is happening in the visual scene. A professional baseball or tennis player who can track a ball moving at over 100 miles per hour well enough to connect with it and hit it in a controlled fashion has great reflexes, but it starts with great visual acuity and processing. It was mentioned earlier that the human eye has some 150+ million light sensors, and for each of them some 25-50,000 processing neurons are at work extracting lots of information that evolution has decided we can use to better survive.

The third dimension

Most people have two functioning eyes, and have at least a rudimentary idea that somehow two eyes allow stereoscopic vision, which provides information on the distance of objects that we see. Lots of animals have eyes located on the sides of their heads, where the overlap in the two fields of view is minimal, and don't rely on stereo vision very much. It is mostly predators, and particularly animals that live by leaping about in trees, that have their eyes forward facing, indicating that stereoscopic vision is important. But it is by no means the only way by which distances are determined, and it isn't a particularly quantitative tool.

Humans use stereo vision by rotating the eyes in their sockets to bring the same feature to the fovea in each eye (as judged by matching that takes place in the visual cortex). It is the feedback from the muscles to the brain that tells us whether one feature is closer than another, depending on whether the eyes had to rotate in or out as we directed our attention from the first feature to the second. Notice that this is not a measurement of how much farther or closer the feature is, and that it works only for comparing one point to another. It is only by glancing around the scene and building up a large number of two-point comparisons that our brain constructs a map of the relative distances of many locations.

Figure 39. The apparent bend in the pencil and its magnification are familiar optical effects.

Stereoscopy can become confused if there are several similar features in the image, so that multiple matches (corresponding to different apparent distances) can occur. This happens rarely in natural scenes, but can create problems in microscopy of repetitive structures.

One of the fascinating discoveries about stereopsis, the ability to fuse stereo pair images by matching details from the left and right eye views, is the random-dot stereogram (Figure 40). Bela Julesz showed that the visual system was able to match patterns of dots that to a single eye appeared chaotic and without structure, to form stereo images. Slight lateral displacements of the dots are interpreted as parallax and produce depth information.
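
A random-dot stereogram of the kind Julesz described is easy to generate. The following sketch is a simplified construction with assumed field size and disparity (a complete version would also fill the strip of dots uncovered by the shift): the dots inside a square region of the right-eye field are displaced sideways by a few pixels, which stereopsis interprets as height above the background.

    import numpy as np

    rng = np.random.default_rng(1)
    h, w, disparity = 256, 256, 4          # field size and pixel shift (assumed values)

    left = rng.integers(0, 2, (h, w), dtype=np.uint8) * 255   # random dot field
    right = left.copy()

    # A hypothetical raised square in the middle of the field.
    mask = np.zeros((h, w), dtype=bool)
    mask[80:176, 80:176] = True

    # In the right-eye view, dots inside the square are copied from `disparity`
    # pixels further right in the left-eye view, giving them a lateral offset.
    rows, cols = np.nonzero(mask & (np.arange(w) + disparity < w))
    right[rows, cols] = left[rows, cols + disparity]

To a single eye each field is just noise; viewed as a stereo pair and fused, the central square appears to float above the plane.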

Sequential images produced by moving a single eye can also produce stereoscopic depth information. The relative sideways motion of features as we shift our head from side to side is proportional to distance. One theory holds that snakes, whose eyes are not well positioned for stereo vision, move their heads from side to side to better triangulate the distance to strike.

Stereoscopy only works for things that are fairly close. At distances beyond about 100 feet, the angular differences become too small to notice. Furthermore, there are plenty of people who, for one reason or another, do not have stereo vision (for example, this is a typical consequence of childhood amblyopia, or lazy eye), who still function quite well in a three-dimensional world, drive cars, play golf, and so on. There are several other cues in images that are used to judge distance.

If one object obscures part of another, it must be closer to our eye. Precedence seems like an absolute way to decide the distance order of objects, at least those that lie along the same line of sight. But there are built-in assumptions of recognition and simple shape of objects, as shown in the example in Figure 41, whose violation creates an incorrect interpretation.

Relative size also plays an important role in judging distance. Many of the things we recognize in familiar scenes have sizes - most typically heights - that fall into narrow ranges. Closely related is our understanding of the rules of perspective - parallel lines appear to converge as they recede (Figure 42). In fact, the presence of such lines is interpreted visually as corresponding to a surface that extends away from the viewer, and irregularities in the straightness of the lines are interpreted as representing bumps or dips on the perceived surface.

Figure 40. Random dot stereogram showing a torus on a plane (stare at the image and the eyes will pick out the matching patterns and fuse them into an image).

Driving through the countryside looking at plowed fields provides a simple example.

By comparing the apparent size of features in our visual field we judge the relative distance of the objects and of things that appear to be close to them. But again, the underlying assumptions are vital to success, and violations of the straightness of alignments of features or the constancy of sizes produce illusory interpretations (Figure 43).

A simple extension of the assumption of size constancy for major features uses the sizes of marks or features on surfaces to estimate distance and angle. It is logical, and indeed often correct, to assume that the marks or texture present on a surface are random and isotropic, and that visual changes in apparent size or aspect ratio indicate differences in distance or orientation (Figure 44). Once again, violations of the underlying assumptions lead us to the wrong conclusions.

There are other clues that may be present in real scenes, although they are less relevant to the viewing of images from microscopes or in the laboratory. For instance, atmospheric haze makes distant features appear more blue (or brown, if the haze is smog) and less sharp. Renaissance painters, who mastered all of these clues, represented atmospheric haze in scenes along with correct geometric perspective.

Figure 41. Left: Obviously the blue square is in front of the red circle. Right: But it may not be a circle, and viewed from another angle we see that the red feature is actually in front of the blue one.

Figure 42. Converging lines are interpreted as parallel lines that converge according to the rules of perspective, and so the surface is perceived as receding from the viewer. Straight lines imply a flat surface (a), while irregularities are interpreted as bumps or dips in the perceived surface (b).

Working from a known geometry to a realistic representation is a very different task than trying to extract geometric information from a scene whose components are only partially known, however.

How versus What

Several very plausible models have been put forward for the algorithms functioning in the eye and other portions of the visual pathway. The first few layers of neurons in the retina are connected in ways that can account for the mechanisms behind local and temporal inhibition, and the interleaving of information from the right and left eyes in the visual cortex is consistent with the fusion of two images for stereopsis. Such models serve several purposes - they can be tested by physiological probes and external stimuli, and they form the basis for computer techniques that attempt to extract the same information from images. In fact, they serve the latter purpose even if they turn out to be failures as actual descriptions of the functioning of the neurons. But while they may be effective at describing HOW at least some parts of the visual system work, because they work at the level of bits and pieces of the image and not at the Gestalt or information level, they don't tell us much about WHAT we see.

Several years ago I was retained as an expert witness in a major criminal trial. The issue at hand was whether a surveillance video tape from the scene of a murder was useful for the identification of the suspects. The images from the tape had been extensively computer-enhanced and were shown to the jury, who were invited to conclude that the current appearance of the defendants couldn't be distinguished from those images. In fact, the images were so poor in both spatial and tonal resolution that they couldn't be distinguished from a significant percentage of the population of the city in which the murders took place, and it was the job of the defense to remind the jury that the proper question was whether the pictures contained enough matching information to identify the defendants "beyond a reasonable doubt." It was very interesting in this case that none of the numerous witnesses to the crime were able to pick any of the defendants out from a lineup. The human eye is indisputably a higher resolution, more sensitive imaging device than a cheap black and white surveillance video camera. But for a variety of reasons the humans present could not identify the perpetrators.

In that trial I was accepted by the court as an expert both in the field of computer-based image processing (to comment on the procedures that had been applied to the camera images) and on human perception (to comment on what the people present might have been able to see, or not). The point was raised that my degrees and background are not in the field of physiology, but rather physics and engineering.

Figure 43. In these illustrations, the expectation of distance is established by assuming the pink posts are constant in size and arranged in straight parallel lines, whose apparent convergence is a function of perspective. In the bottom illustration, the appearance of the green boxes is consistent with this interpretation. In the top illustration it is not, and we must either conclude that the boxes differ in size or that the posts do not conform to our expectation.

Figure 44. Rocks on a beach. Assuming that the rocks are similar in size and round on the average informs us of the viewing angle and the distance to farther locations.

How could I comment as an expert on the processes of human vision? The point was made (and accepted by the court) that it was not an issue of how the human visual system worked, at the level of rhodopsin or neurons, that mattered, but rather of what information human vision is capable of extracting from a scene. I do understand what can be seen in images, because I've spent a lifetime trying to find ways for computers to extract some of the same information (using what are almost certainly very different algorithms). Accomplishing that goal, by the way, will probably require a few more lifetimes of effort.

In fact, there has often been confusion over the difference between the How and the What of human vision, often further complicated by questions of Why. In describing a computer algorithm for some aspect of image analysis, the explanation of the steps by which information is extracted (the How) is intimately bound up in the computed result (the What). But in fact, the algorithm may not be (and usually is not) the only way that information can be obtained. Many of the important steps in image analysis have several more-or-less equivalent ways of extracting the same result, and moreover each of them can typically be programmed in quite a few different ways to take advantage of the peculiarities of different computer architectures. And, of course, no one claims that any of those implementations is identical to the processing carried out by the neurons in the brain.

David Marr, in his final book "Vision" (Freeman, 1982), has pointed out very forcefully and eloquently that confusing the How and the What had led many researchers, including himself, into some swampy terrain and dead ends in the quest for an understanding of vision (both human and animal). Mapping the tiny electrical signals in various parts of the visual cortex as a function of stimuli presented to the eye, or measuring the spectral response of individual rod or cone cells, is certainly an important part of eventually understanding the How of the visual system. And it is an experiment that is performed at least in part because it can be done, but it isn't clear that it tells us very much about the What. On the other hand, tests that measure the response of the frog's eye to a small dark moving target invite speculation about the Why (to detect a moving insect - food).

Researchers have performed many experiments to determine what people see, usually involving the presentation of artificial stimuli in a carefully controlled setting and comparing the responses as small changes are introduced. This has produced some useful and interesting results, but falls short of addressing the problem of visual interpretation of scenes. The key is not just that people can detect ("see") a certain stimulus, but that they can interpret its meaning in a complex scene. It might be better to substitute the word "interpret" for "see" to emphasize that the individual cues in images are only important for understanding the world when they are combined and processed to become a semantic representation. In other words, we do have to turn that picture into its "thousand word" equivalent. For that purpose, it is often more revealing to study the errors that humans make in looking at whole images. This includes, but is not limited to, various kinds of visual illusions.

There are also important clues in what artists have portrayed (or left out) of representational paintings and even cartoons. By exaggerating a few selected features into a caricature, for example, editorial cartoonists create very distorted but extremely recognizable representations of familiar political figures. For many people, such cartoons may represent a greater truth about the person than an actual photograph (Figure 45).

Figure 45. Richard Nixon's ski nose, dark eyebrows and shady eyes, receding hairline and 5 o'clock-shadowed jowls were used by cartoonists to create an instantly recognizable caricature.

Seeing what isn’t there, and vice versa

Figure 46. It takes very few cues to trigger the recognition of a face: a) the ubiquitous happy face; b) the "face on Mars," which appears only if the viewing angle and lighting are correct.

Figure 47. Altering the ratios of dimensions (such as the horizontal distance between the eyes, ears, width of mouth, etc., or vertical dimensions such as length of nose, distance from mouth to chin, height of forehead, etc.) strongly affects our ability to recognize faces.

One problem that plagues eyewitness testimony and identification is that we tend to see (i.e., pick out from a scene) things that are familiar (i.e., already have mental labels). One facility that is hard-wired into our brains, just like the ability of the frog to spot a bug, is finding faces. Babies find and track faces from birth. We are so good at it that even with just a few clues, like two eyes and a mouth, we see a face, whether it is real or not. The ubiquitous "smiley face" cartoon has enough information to be recognized as a face. So does a mountain on Mars, when illuminated and viewed a certain way (Figure 46).

But to recognize a particular face, for instance as grandmother, we need a lot more clues. Computer programs that perform facial recognition use ratios of dimensions, for instance the ratio of the distance between the eyes to that between the tips of the ears, or the distance between the mouth and chin to the distance from the tip of the nose to the chin. The advantage of ratios is that they are insensitive to the size of the image, and to a considerable extent to orientation or point of view. But that is an algorithm, and so addresses the How rather than the What. It seems likely that human facial recognition uses more or different clues, but certainly altering those proportions by even a few percent changes a face so that it becomes unrecognizable (Figure 47).
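
A minimal sketch of the ratio idea is shown below. The landmark names and coordinates are invented for illustration and are not those of any particular face-recognition system; the point is only that ratios of distances are unchanged when the same face is photographed larger or smaller.

    import numpy as np

    def ratios(landmarks):
        """Scale-invariant ratios of distances between facial landmark points.
        `landmarks` maps illustrative names to (x, y) pixel positions."""
        d = lambda a, b: float(np.hypot(*np.subtract(landmarks[a], landmarks[b])))
        return {
            "eyes/ears": d("left_eye", "right_eye") / d("left_ear", "right_ear"),
            "mouth-chin/nose-chin": d("mouth", "chin") / d("nose_tip", "chin"),
        }

    # Hypothetical landmark positions for one face.
    face = {"left_eye": (120, 140), "right_eye": (180, 140),
            "left_ear": (90, 150), "right_ear": (210, 150),
            "mouth": (150, 210), "nose_tip": (150, 180), "chin": (150, 240)}
    print(ratios(face))   # the same face imaged at any magnification gives the same ratios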

Police artists routinely produce sketches from eyewitness descriptions. Comparing these pictures to actual photographs of perpetrators after capture suggests that only a few characteristics of the face are likely to be noticed, and turned into a mental caricature rather than an actual representation (see the examples in Figure 48). And differences in race between witness and perpetrator make it especially difficult to pick out those characteristics that are likely to identify the person. We learn to pick out the particular kinds of details that are most useful in identifying those familiar to us, and these are not very helpful for other races ("they all look alike").

Finally, when something or someone is recognized (rightly or wrongly) in an image, our minds mentally endow the semantic representation of that person or object with the full set of characteristics that we remember from past encounters. A typical example would be seeing a friend's face from the side, but "knowing" that there is a mole on the other cheek and believing we had seen that this time as well. That leads to a lot of eyewitness problems. If a witness thinks they have recognized someone or something, they will often testify with confidence and honesty that they have seen things that were actually not present. This can include even highly specific items like articles of clothing, tattoos, etc. One witness was sure that a car in a hit and run case had a particular bumper sticker on the back, when in fact she had not been in a position to see the back of the car, because she was familiar with a car of the same model and color that did have such a bumper sticker.

We all do this, unconsciously. When you pass a neighbor's house, if you glimpse someone mowing the lawn and you "expect" it to be the teenage son, you are likely to "see" details of his appearance, haircut, clothing, etc., that may not be visible, or may not even be there - it might not even be the right person. Usually this process is helpful because it gives us a sense of place and situation that is most often correct, without requiring additional time or effort. A few mistakes will be made, but fortunately they aren't usually serious ones and the consequences are rarely more than momentary embarrassment. Many decades ago, when my eldest son was a preschooler, I shaved off a mustache that I had worn since before his birth. He did not notice for two days, until it was pointed out to him (and then he was upset at the change).

Seeing what we "know" is present, or at least expect to be present, is common. A colleague of mine, who for years helped in teaching courses on image analysis, has a favorite picture of herself holding a dearly loved pet, now deceased. Unfortunately the dog is black, she is wearing a black sweater, and the photographic print is very dark (and the negative, which would have a greater dynamic range, is not available). She has challenged us for years to process that image to show the dog that she sees when she looks at the picture, but we've failed because there is just nothing there in terms of the pixel values - they are all the same shade of near black.

Figure 48. Examples of police artist sketches and photos of the actual persons.

One of my students took a copy of the scanned print and painstakingly drew in a plausible outline of the correct breed of dog. Her immediate response was "That's not the right dog!" She has a stored image in her mind that contains information not available to anyone else who looks at the image, and she believes she can see that information when she looks at the picture. Certainly her mind sees it, if not her eyes.

The extension of this process to scientific image analysis is obvious and should be of great concern. We see what we expect to see (things for which we have existing mental labels), fail to see or recognize things that are unfamiliar, misjudge things for which we do not have an appropriate set of stored clues, and truly believe that we have seen characteristics in one image that have been seen in other instances that are remembered as being similar. That's what it means to be human, and those are the tendencies that a careful scientific observer must combat in analyzing images.

Image compression

There are some important lessons about human vision to be found in the rapid acceptance of digital still and video cameras. All consumer cameras and many higher-end cameras store images in a compressed format, because memory is expensive and also smaller files can be saved more quickly (more than enough improvement to make up for the time needed to carry out the compression). People seem willing to pay for high resolution multi-megapixel cameras and then try to compress the image by a factor of 10, 20 or more to produce a file that can be transmitted efficiently over the internet.

Compression techniques such as MPEG for video and JPEG for still pictures are widely used and little questioned. In addition to MPEG (Moving Picture Experts Group) compression, a variety of codecs (compressor-decompressors) are available for Apple's Quicktime and Macromedia's Flash software. The original JPEG (Joint Photographic Experts Group) technique using a discrete cosine transform has been joined by wavelet and fractal methods.

All of these methods achieve compression by leaving out some of the information in the original image; technically they are "lossy" compression techniques. The intent of the compression is to preserve enough information to enable people to recognize familiar objects. Most of the techniques depend to some extent on the characteristics of human vision to decide what should be kept and what can be modified or left out. Some, like fractal compression, replace the actual details with other detail "borrowed" from elsewhere in the image, on the theory that any fine detail will fool the eye.

Compression discards what people don't easily see in images. Human vision is sensitive to abrupt local changes in brightness, which correspond to edges. These are kept, although they may shift slightly in location and in magnitude. On the other hand, absolute brightness is not visually perceived, so it is not preserved. Since changes in brightness of less than a few percent are practically invisible, and even larger variations cannot be seen if they occur gradually over a distance in the image, compression can eliminate such details.

Color information is reduced in resolution because boundaries are primarily defined by changes in brightness. The first step in most compression schemes is to reduce the amount of color information, either by averaging it over several neighboring pixels or by reducing the number of colors used in the image, or both. Furthermore, color perception is not the same in all parts of the visible spectrum. We cannot discriminate small changes in the green range as well as we can other colors. Also, gradual changes in color, like those in brightness, are largely invisible; only sharp steps are noticed. So the reduction of color values in the image can be quite significant without being noticeable.
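
The chroma-reduction step can be sketched in a few lines. The code below (Python; the file name is hypothetical, and the 2x2 block averaging is a generic stand-in for the subsampling used by real codecs, not the exact JPEG pipeline) converts an image to a luminance channel plus two color channels, then averages the color channels over 2x2 blocks, discarding three quarters of the color information while leaving brightness untouched.

    import numpy as np
    from PIL import Image

    img = Image.open("photo.jpg").convert("YCbCr")        # hypothetical input file
    y, cb, cr = [np.asarray(c, dtype=np.float32) for c in img.split()]

    def downsample_2x2(chan):
        """Average each 2x2 block, then repeat the block value - a crude 4:2:0."""
        h, w = chan.shape[0] // 2 * 2, chan.shape[1] // 2 * 2
        c = chan[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        return np.kron(c, np.ones((2, 2), dtype=np.float32))

    cb2, cr2 = downsample_2x2(cb), downsample_2x2(cr)
    out = Image.merge("YCbCr", [Image.fromarray(ch.astype(np.uint8))
                                for ch in (y[:cb2.shape[0], :cb2.shape[1]], cb2, cr2)])
    out.convert("RGB").save("photo_chroma_reduced.jpg")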

It is also possible to reduce the size of video or movie files by finding regions in the image that do not change, or do not change rapidly or very much. In some cases, the background behind a moving object can be simplified, even blurred, while the foreground feature can be compressed because we don't expect to see fine detail on a moving object. Prediction of the locations in an image that will attract the eye (sometimes called "interesting points" and usually associated with high local contrast, or familiar subjects - where do your eyes linger when looking at a picture of a movie star? what parts of the image don't you notice?) and cause it to linger in just a few areas allows other areas to be even further compressed.

Certainly it can be argued that this type of compression works below the threshold of visual discrimination most of the time, and does not prevent people from recognizing familiar objects. But that is exactly the problem: compression works because enough information remains to apply labels to features in the image, and those labels in turn cause our memories to supply the details that are no longer present in the picture. The reason for recording images in scientific studies is not to keep remembrances of familiar objects and scenes, but to record the unfamiliar. If it is not possible to know beforehand what details may turn out to be important, it is not wise to discard them.

And if measurement of features is contemplated (to measure size, shape, position or color information), then lossy compression, which alters all of those values, must be avoided.

It is not the point of this section simply to make the rather obvious case that compression of digital images is extremely unwise and should be avoided in scientific imagery. Rather, it is to shed light on the fact that compression is acceptable for snapshots only because human vision does not notice or depend upon very much of the actual contents of an image. Recognition requires only a few clues, and ignores much of the fine detail.

A world of light

Our eyes are only one part of the overall system involved in producing the sensory input to the visual cortex. It is easy to overlook the importance of the light source and its color, location, and brightness. A few concerns are obvious. When shopping for clothes, furniture, or other items, it is best not to rely on their appearance under the artificial lighting in the store, but to see how the colors appear in sunlight. The difference in color temperature of the light source (sunlight, incandescent lighting, fluorescent lighting) can produce enormous differences in the visual appearance of colors. And don't even think about trying to guess at colors using the illumination from a sodium street light, which is essentially monochromatic and provides no clues to color at all.

In a laboratory setting, such as the use of a light microscope, color judgments can be similarly affected by small variations in the color temperature of the bulb, which depends very sensitively on the voltage applied to it (and also tends to change significantly over the first few and last few hours of use as the filament and its surface undergo physical alterations). Simply reducing the illumination (e.g., to take a photo) by turning down the voltage will change the colors in the image. This happens whether we are imaging the light that is transmitted (not reflected or absorbed) through a thin sample, as in the transmission light microscope, or the light reflected from a surface, as in macroscopic imaging. But generally it is the latter case that our brains are prepared to interpret.

For real world scenes, the light source may be a single point (e.g., sunlight or a single bulb), or there may be multiple sources. The source may be highly localized or it may be extended. Lighting may be direct or indirect, meaning that it may have been reflected or scattered from other surfaces between the time it leaves the source and reaches the object. All of these variables affect the way the object will appear in the final image.

The surface of the object also matters, of course. Most of the light that is not transmitted through or absorbed within the object is scattered from an extremely thin layer just at the surface of the object. For a perfect metal, this happens exactly at the surface, but most materials allow the light to penetrate at least a short distance beneath the surface. It is the variation of absorption within this thin layer for different light wavelengths, and the variation of penetration with wavelength, that gives an object color. For instance, preferential absorption of green light will cause an object to appear purple. Ideal metals, for which there is no light penetration, have no color (the colors of gold, copper and silver result from a very complex electronic structure that actually allows slight penetration).

The fraction of the incident light that is reflected or scattered is measured by the surface albedo. A very dark object may absorb as much as 90% of the incident light, whereas a very bright one may absorb only a few percent. The interaction of the light with the object typically includes a mixture of diffuse and specular reflection. The diffuse component sends light in all directions, more or less following a cosine pattern as shown in Figure 49. The specular component sends light in the particular direction of mirror reflection, with an angle to the local surface normal equal to the incident angle. The specularity of the surface is defined by the fraction of the light that reflects at the mirror angle and the narrowness of the reflected beam.

Computer programs that generate rendered surface images from measurements and shape information use models that correspond to the behavior of typical materials. As shown in Figure 50, it is possible to change the appearance of the surface, and our judgment of its composition, by altering the specularity.
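
A generic diffuse-plus-specular model of this kind is easy to write down. The sketch below is a Phong-style approximation with assumed parameter values, not the renderer used for Figure 50; raising the specular weight and shininess exponent makes the same geometry look glossy or metallic, while lowering them makes it look matte.

    import numpy as np

    def shade(normal, light_dir, view_dir, albedo=0.6, specular=0.3, shininess=20):
        """Brightness of a surface patch under a simple diffuse-plus-specular model."""
        n = normal / np.linalg.norm(normal)
        l = light_dir / np.linalg.norm(light_dir)
        v = view_dir / np.linalg.norm(view_dir)
        diffuse = albedo * max(np.dot(n, l), 0.0)              # cosine (Lambertian) falloff
        r = 2 * np.dot(n, l) * n - l                           # mirror direction about the normal
        spec = specular * max(np.dot(r, v), 0.0) ** shininess  # narrow lobe around the mirror angle
        return diffuse + spec

    # Illustrative geometry: flat patch, oblique light, viewer looking straight down.
    print(shade(np.array([0, 0, 1.0]), np.array([1, 1, 2.0]), np.array([0, 0, 1.0])))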

Figure 49. The relative amount of diffuse scattering and specular reflection, the angular breadth of the specular reflection, and the total amount of light that is scattered rather than transmitted or absorbed, all of which may vary with wavelength, determine the appearance of surfaces.

From a series of images with different light source locations it is possible to interpret the geometry of the object from its appearance. We do this automatically, because our brains have evolved in a world that provides many opportunities to learn about the effect of tilting objects on their appearance, and the effect of coating a surface with different materials.

Changing the appearance of a surface from one material to another, or altering the light source color or position, can help us to notice important details on an object. Partly this is due to enhancement of the reflections from particular features, and partly to violating the expectations we normally have when viewing a surface, thereby forcing our attention to all of the details in the image. There is a powerful demonstration of this at <http://www.hpl.hp.com/news/2000/oct-dec/3dimaging_files/tablet_specular_VR.html>, where a surface imaging technique developed by Tom Malzbender at Hewlett Packard Labs is shown. A series of images taken with a single, stationary camera but with lighting from many different (known) orientations is used to compute the orientation and albedo of the surface at each location on an object. This data set is then used to render an image of the surface with any characteristics, including those of an ideal metal, while the viewer interactively moves the light source position. The example in Figure 51 shows the recovery of fine details from an ancient clay tablet.

The technique that underlies this calculation is called "shape from shading" or "photometric stereo." Instead of taking two or more pictures from different viewpoints, as in stereoscopy, photometric stereo uses multiple images from the same viewpoint but with different illuminations.

Figure 50. Range image of a coin (produced by a scanned stylus microscope), and three renderings using Phong shading with different specularity and incident light positions.

Shape from shading uses the known distributions of diffuse and specular reflections for a particular type of surface to estimate changes in the local slope of the surface with respect to the lines of sight from the light source and the viewpoint. The weakness of the shape-from-shading approach is that it deals only with differences in intensity (and hence in slope). Determining the actual surface elevation at each point requires integrating these slope values, with an unknown constant of integration. Nevertheless, the method has numerous applications, and also serves to illustrate a computation that our minds have been trained by years of practical observations to make automatically.
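
Under a Lambertian assumption (diffuse reflection only, no shadows), the per-pixel calculation of photometric stereo reduces to a small least-squares problem. The sketch below uses invented light directions and intensities purely for illustration.

    import numpy as np

    def photometric_stereo(intensities, light_dirs):
        """Estimate the surface normal and albedo at one pixel from several images
        taken with known light directions (Lambertian surface, no shadows assumed).
        `intensities`: length-k vector of brightness values at that pixel.
        `light_dirs`:  k x 3 array of unit vectors toward each light source."""
        L = np.asarray(light_dirs, dtype=float)
        I = np.asarray(intensities, dtype=float)
        g, *_ = np.linalg.lstsq(L, I, rcond=None)   # solves I ~= L @ (albedo * normal)
        albedo = np.linalg.norm(g)
        normal = g / albedo if albedo > 0 else g
        return normal, albedo

    # Illustrative values only: three lights and the brightnesses they produce.
    lights = [[0, 0, 1], [0.5, 0, 0.866], [0, 0.5, 0.866]]
    normal, albedo = photometric_stereo([0.80, 0.75, 0.60], lights)
    print(normal, albedo)

Recovering absolute heights from the estimated normals still requires integrating the slopes, with the unknown constant of integration mentioned above.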

The mental shortcuts that enable us to interpret brightness variations as shape are convenient and often correct (or at least correct enough for purposes such as recognition and range-finding). But they are easily fooled. Surface brightness can change for reasons other than geometry, such as the effect of intentional or unintentional coatings on the surface (e.g., oxidation, stains). There may also be nonuniformities in the illumination, such as shadows on the surface. If these are not recognized and compensated for, they will influence our judgment about the surface geometry.

One thing that we are conditioned to expect from real-world viewing is that lighting comes from above, whether it is the sun in the sky or the lights above our desk. If that expectation is violated, our built-in shape-from-shading calculation reaches the wrong conclusion and interprets peaks as pits, and vice versa (Figure 52). Such illusions are amusing when we recognize them, but sometimes we may remain fooled.

Images that come from novel modalities, such as the scanning electron microscope, appear to be familiar and readily interpretable because the brightness of surfaces varies with slope, just as in the true shape-from-shading situation. But different physical processes are involved, the mathematical relationships between brightness and geometry are not quite the same, and misinterpretations can occur.

Figure 51. Images of an ancient clay tablet with different illumination positions (top), and rendered as a metallic surface to reveal subtle detail (bottom).

For one thing, the appearance of edges and fine protrusions is very bright in the SEM, which does not occur in normal light scattering from surfaces (Figure 53).

Many other types of images, such as the surface maps produced by the AFM based on various tip-sample interactions, are commonly presented to the viewer as rendered surface representations. These are typically generated using the strength of a measured signal as a measure of actual surface geometry, which it may not be. Electronic or chemical effects become "visible" as though they were physical elevations or depressions of the surface. This is an aid to "visualization" of the effects, taking advantage of our ability to interpret surface images, but it is important (and sometimes difficult) to remember that it isn't really geometry that is represented but some other, more abstract property.

Figure 52. Rotating the same image (of cuneiform indentations in a clay tablet) by 180 degrees makes the pits appear to be peaks.

Figure 53. SEM image of particulates on a surface. Steeply inclined surfaces and edges appear bright.

Article

Vol. 39/4 Proceedings RMS December 2004 1

Seeing the Scientific Image, Part 3
John C. Russ

Materials Science and Engineering Department, North Carolina State University, Raleigh, NC

Size matters

The size of features is determined by the location of the feature boundaries. The only problem with that rather obvious statement is deciding where the boundary lies. Human vision works by finding many types of lines in images, which include edges of features, based on locating places where brightness or color changes abruptly. Those lines are treated as a sketch (called the “primal sketch”). Cartoons work because the drawn lines substitute directly for the edges that would be extracted from a real scene.

Using a computer program to extract edge lines (Figure 54) illustrates the idea of the sketch. The computer program finds the location (to the nearest pixel) where the maximum change in brightness occurs. The sketch extracted by the retina isn’t quite the same. For one thing, gradual changes in brightness are not as visible as abrupt changes, and the change must be at least several percent to be noticed at all. Changes in color are not as precisely located, and some color changes are much more noticeable than others (variations in the green part of the spectrum are noticed least).
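As a purely illustrative sketch of the kind of edge extraction described above, the following Python fragment builds a crude edge map from a grey-scale image using a Sobel gradient; the gradient operator and the relative threshold are assumptions made for illustration, not the particular program used to produce Figure 54.

```python
# Illustrative sketch: build a crude "primal sketch" edge map by finding where
# the local change in brightness is largest. The Sobel operator and the
# relative threshold are arbitrary illustrative choices.
import numpy as np
from scipy import ndimage

def edge_sketch(image, threshold=0.1):
    """Return a boolean edge map for a 2-D grey-scale image."""
    gx = ndimage.sobel(image.astype(float), axis=1)  # horizontal brightness change
    gy = ndimage.sobel(image.astype(float), axis=0)  # vertical brightness change
    magnitude = np.hypot(gx, gy)                     # rate of change at each pixel
    peak = magnitude.max()
    if peak > 0:
        magnitude /= peak                            # normalise so the threshold is relative
    return magnitude > threshold                     # keep only the strongest changes
```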

Furthermore, people do not interpret the edge line in a consistent way. A simple demonstration can be found in the way we cut out patterns, and there is some indication that there is a sex-linked behavior involved. Girls cutting cloth to make a garment tend to cut outside the line (better seams too wide than too narrow); boys cutting out model airplane parts tend to cut inside the line (so parts will fit together). The same habits carry over to tracing features for computer measurement. A trivial difference, perhaps, but it raises the interesting question “Is the edge a part of the feature or a part of the surroundings?” In many cases, that depends on whether the feature is dark on a light background (in which case the edge is likely to be seen as part of the feature) or the converse.


Figure 54. Extraction of the edges from an image produces the primal sketch of the scene.


In many real images, the boundaries of features are not uniform. Variations in brightness and contrast cause variation in judgment of the location of the feature edge, and hence in its size and shape. In some cases, the boundary disappears in places (Figure 55). Human vision is not bothered by such gaps (although computer measurement certainly is). We fill in the gaps with simple, smooth curves that may or may not correspond to the actual shape of the feature.
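Since gaps like these defeat automatic measurement, a common preliminary step is a morphological closing that bridges small breaks before measuring; the sketch below shows one such approach, with the bridgeable gap size as an assumed parameter rather than a recommended value.

```python
# Illustrative sketch: bridge small gaps in a binary (thresholded) image before
# measurement, using a morphological closing (dilation followed by erosion).
# The structuring-element size, which sets how large a gap can be bridged,
# is an assumed parameter.
import numpy as np
from scipy import ndimage

def close_small_gaps(binary_image, gap_radius=2):
    """Return a boolean image with gaps of up to roughly gap_radius pixels closed."""
    size = 2 * gap_radius + 1
    structure = np.ones((size, size), dtype=bool)
    return ndimage.binary_closing(binary_image, structure=structure)
```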

Boundaries are certainly important, but there is evidence that features are represented conceptually not as a collection of boundaries but as a simplified midline. The pipe-cleaner animals shown in Figure 56 are recognizable because we fill out the bodies from the “skeleton” shown. This is not the actual skeleton of bones, of course, but the one used in computer-based image analysis, a set of midlines sometimes called the medial axis of the object. The topology of the skeleton (the number of branches, ends and loops) provides critical information for feature recognition by humans and machines.

Whether the boundaries or the skeletons of features are used for representation, comparisons of the size of features are strongly affected by their shape, position and brightness. A map of the continental United States illustrates this well (Figure 57). In order to compare the sizes of two states we literally drag the image of one onto the other, in our minds. Since the shapes are different, the fit is imperfect. How the parts that “stick out” are treated depends on their perceived importance. For example, comparing Oklahoma to Missouri is tricky because the panhandle of Oklahoma is pretty skinny and easily overlooked (but Oklahoma is larger than Missouri).

Florida is about the same size as Wisconsin, but they are different colors and far apart, and comparison is very difficult. Colorado has a simple rectangular shape, difficult to compare to Nevada or Oregon, which are not so regular and tend to appear smaller. The greater vertical extent of North Dakota is visually important and leads to the erroneous conclusion that the state is larger than South Dakota. Vertical extent is important in comparing Illinois to Iowa and New York, as well, as are the differences in shape and color (and the fact that New York is far away).

Figure 55. TEM image of stained tissue. The membrane boundaries of the organelles are not visible in some locations (arrows) but human vision “knows” they continue and completes them with simple smooth curves.

Figure 56. Pipe-cleaner animals (elephant, kangaroo and dachshund) represent solid objects by their skeletons.



Visual judgments of size are very error prone under the best of circumstances, and easily swayed by seemingly minor factors, several of which have been illustrated in these examples. Another very common mistaken judgment of size involves the moon. Most people report that it appears to be larger by one-third to one-half when near the horizon than when high in the sky (Figure 58), probably because at the horizon there are other structures which the eye can use for comparison. Vertical extent is generally considered more important than horizontal extent (Figure 59).

Figure 57. The continental United States.

Figure 58. Example of the increase in the visual impression of size of the moon when viewed near the horizon, as compared to overhead.


Features that contrast more with their surroundings are generally considered to be larger than ones with less contrast. Departures from geometrically simple shapes tend to be ignored in judging size.

The context of the scene is also very important. In the discussion of stereoscopic vision and interpretation of the third dimension it was noted that expectation of constant size is one of the cues used to judge distance. It works the other way, as well. We expect the rules of perspective to apply, so if one feature is higher in the scene than another, and is expected to be resting on the ground, then it is probably farther away, and thus it should appear to be smaller. If it is actually the same size in the image, we would tend to judge it as being larger in actuality. Unfortunately, when viewing images for which the rules of perspective do not apply this “correction” can lead to the wrong conclusions.

Shape (whatever that means)

Shape is extremely important in visual recognition of objects. Even if the size or color of something is changed radically (a miniature pink elephant, perhaps), the shape provides the important information that triggers identification. But what is shape? There are very few common adjectives in English or any other language that really describe shape. We have plenty for size and color, but few for shape. Instead, we describe shape by saying that something is shaped “like an elephant” - in other words we don’t describe the shape at all but simply refer to a representative object and hope that the listener has the same mental image or model that we do, and identifies the same important shape features that we do. The few apparent exceptions to this - adjectives like “round” - actually fall into this same category. Round means “like a circle” and probably everyone knows what a circle looks like.

But what does it mean to depart from being round like a circle? Unfortunately there are lots of ways to become less like a circle. Figure 60 shows just two of them: one feature has been stretched horizontally but is still smooth, while the other has remained equiaxed but with a rippled edge. Many other variations with jagged edges, stretching in more or other directions, and so forth are possible. Which of these features should be considered “rounder?”

When computers measure features in images, there are a lot of mathematical ways that they can describe shape. The most commonly used are simple dimensionless ratios of size measurements. There are a lot of ways to measure size (e.g., area - with or without holes, perimeter - total or exterior only, maximum and minimum caliper dimension, equivalent circular diameter, radius of the smallest circumscribed or largest inscribed circle, area and perimeter of the convex hull or taut-string boundary, to list only the most common), and most of these are easily understood by humans because they use familiar words for familiar concepts. But these size parameters can be combined in an almost limitless number of ways to produce formally dimensionless shape parameters. Two of the more widely used are

4π × Area / Perimeter²

4 × Area / (π × Length²)

The first of these two shape parameters uses area and perimeter and is generally sensitive to the departure from roundness exhibited by the feature on the right, while the second one uses area and maximum caliper dimension, or length, and is generally insensitive to changes in the smoothness of the perimeter but does vary with elongation.
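As an illustration only, the two ratios can be computed for a single binary feature along the following lines; the perimeter and the maximum caliper "length" are estimated very crudely here (boundary-pixel count and largest pairwise distance between boundary pixels), an assumption made for brevity rather than the estimator any particular measurement package uses.

```python
# Illustrative sketch: the two dimensionless ratios for one binary feature
# (True = feature pixels). Perimeter and maximum caliper length are estimated
# crudely, purely for illustration.
import numpy as np
from scipy import ndimage

def shape_ratios(feature):
    feature = np.asarray(feature, dtype=bool)
    area = feature.sum()
    boundary = feature & ~ndimage.binary_erosion(feature)
    perimeter = boundary.sum()                 # boundary-pixel count: a rough perimeter
    ys, xs = np.nonzero(boundary)
    pts = np.stack([ys, xs], axis=1).astype(float)
    # maximum caliper dimension ("length"): largest distance between boundary pixels
    # (an O(n^2) computation, acceptable for a modest feature in a sketch)
    diffs = pts[:, None, :] - pts[None, :, :]
    length = np.sqrt((diffs ** 2).sum(axis=-1)).max()
    ratio_1 = 4 * np.pi * area / perimeter ** 2    # sensitive to boundary roughness
    ratio_2 = 4 * area / (np.pi * length ** 2)     # sensitive to elongation
    return ratio_1, ratio_2
```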

The names assigned to these are completely arbitrary and have no familiar intrinsic meaning. Furthermore, the various writers of software for image measurement use the names differently. Some call the first one (or its inverse) roundness; some call it formfactor and use roundness for the second one, or use the word circularity. Some use other formulas and other names. Without an accepted human meaning for these concepts no consistency should be expected, and indeed none is found.

Figure 59. The “top hat” illusion: In this exaggerated drawing of a top hat, the height appears to be much greater than the width, but in fact they are exactly the same.

Figure 60. The two lower shapes have the same area as the circle but are not circular.



An important additional problem with these dimensionless ratios is that they are not at all unique. It is possible to construct an infinite number of objects whose shapes appear very different to a human but which have identical values of formfactor, as shown in Figure 61.

Still, these shape factors can be very useful for computer recognition purposes. Combined with one other type of shape descriptor (the number of holes, a fundamental topological property that will be discussed below), they often allow expert systems to be established to perform machine vision chores. As a trivial example, consider the task of recognizing the letters A through E as shown in Figure 62. Although the letters use different fonts (serif and sans-serif), and are printed in different sizes and arbitrary orientations, humans have no difficulty in identifying them. Machines can do this too, but in a different way. Figure 62 shows a flow chart for an expert system that uses several shape descriptors to accomplish it.

This kind of procedure is very brittle. Adding more characters, or extreme fonts that are still easily recognized by a human, breaks the procedure and an entirely new one may need to be devised. It is interesting that our familiarity with the Roman alphabet allows us to identify the characters very readily even if they are presented in an unfamiliar way (e.g., rotated). When presented with less familiar characters (unless you happen to read Hebrew), a human takes much more time to make the comparisons and determine how many distinct characters are present, and which is which, as shown in Figure 63. This indicates that for unfamiliar shapes the comparison is performed in the usual way, by mentally dragging and rotating the characters so they can be compared directly, while in the case of the familiar shapes the labels are applied immediately and it is only the labels that are matched, not the actual shapes.

One consequence of this behavior for scientific observations is that the researcher who has (by dint of long hours of looking, or based on other knowledge) become sufficiently familiar with a class of objects to recognize them, even if he or she does not formally have a list of identifying characteristics, and even if that recognition is flawed, will immediately apply labels to the features in an image, whereas someone without that level of familiarity may struggle to make the same identification or comparison. If the labeling criteria used are imperfect, false positive or negative identifications will occur for the experienced viewer but not for the novice, who is forced to make a more detailed comparison.

These dimensionless ratios are useful shape factors for computer identification because there are so many of them available and they require so little computational effort, but they do not correspond to what humans call shape, and their use does not mimic the way that people perform feature recognition.

There are other computer-based ways to describe shape. One of the most computationally intensive is to “unroll” the feature periphery as a plot of radius versus angle, or of slope versus distance along the boundary, and then perform a Fourier transform on it. The first few dozen coefficients in the Fourier expansion of the plot are then used in statistical classification procedures that have demonstrated great power in a few areas of application such as sedimentology and palynology. The method often accomplishes robust identification in cases where humans are confused by the apparent wide variations in perceived shape, by finding some underlying fundamental characteristics that can be statistically extracted from the clutter. But people clearly do not “see” the same characteristics that allow this technique to identify the sediment deposited by two glacial rivers based on the 5th and 8th Fourier coefficients. It is the lack of correspondence between what these admittedly powerful statistical methods can accomplish and the ability of a human observer to extract similar information from the images that has limited its application.
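A minimal sketch of the "unrolled boundary" idea is shown below: the radius from the centroid is sampled as a function of angle and the magnitudes of the low-order Fourier coefficients are kept as descriptors. The uniform angular resampling, the number of coefficients retained, and the normalisation are illustrative assumptions, and the approach as written applies only to boundaries that are single-valued in radius about the centroid.

```python
# Illustrative sketch: "unroll" a feature boundary as radius versus angle about
# its centroid, then keep low-order Fourier coefficient magnitudes as shape
# descriptors. Works only for boundaries that are single-valued in radius.
import numpy as np

def fourier_descriptors(boundary_points, n_coeffs=16):
    """boundary_points: (N, 2) array of (y, x) boundary coordinates."""
    pts = np.asarray(boundary_points, dtype=float)
    rel = pts - pts.mean(axis=0)                      # coordinates relative to the centroid
    angles = np.arctan2(rel[:, 0], rel[:, 1])
    radii = np.hypot(rel[:, 0], rel[:, 1])
    order = np.argsort(angles)
    # resample radius onto a uniform angle grid so the FFT bins are comparable
    grid = np.linspace(-np.pi, np.pi, 256, endpoint=False)
    uniform = np.interp(grid, angles[order], radii[order])
    spectrum = np.abs(np.fft.rfft(uniform))
    # normalise by the zero-frequency (mean radius) term so size cancels out
    return spectrum[1:n_coeffs + 1] / spectrum[0]
```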

Figure 61. Twelve visually different shapes with identical values of the shape parameter 4π × Area / Perimeter².

Figure 62. Some recognizable shapes and a flow chart that can identify them.



Topology and Fractal Dimension

It seems that instead of these numerical properties of shape, people rely primarily on two specific kinds of information for shape recognition. Fractal dimension will be discussed below. The other principal kind of information is topological and is best illustrated by using a computer processing operation to reduce a shape to its skeleton. The skeleton, mentioned previously, is the midline of the feature, produced by an iterative removal of pixels from the boundary until the only ones left cannot be removed without breaking the feature into pieces. As shown in Figure 64, the end points and branch points of this skeleton correspond to the basic topological features of the original shape, and they can be very easily located, because end points are pixels that have only one neighbor and branch points (or nodes) have more than two. Euler figured out long ago that these topological features obey the relationship

Number of Holes (or Loops) = Number of Branches - Number of Ends - Number of Nodes + 1
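A sketch of how these counts might be obtained from a pixel skeleton is given below; it assumes scikit-image and SciPy are available, counts neighbours to find ends and nodes, and counts loops as enclosed background regions. It is a simplified illustration rather than a production skeleton analyser.

```python
# Illustrative sketch: end points, nodes and loops of a feature's skeleton.
# End points have exactly one skeleton neighbour, nodes (branch points) have
# more than two, and loops are counted as background regions fully enclosed
# by the skeleton.
import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize

def skeleton_topology(feature):
    """feature: 2-D boolean array (True = object). Returns (ends, nodes, loops)."""
    skel = skeletonize(np.asarray(feature, dtype=bool))
    kernel = np.ones((3, 3), dtype=int)
    # number of 8-connected skeleton neighbours of each skeleton pixel
    neighbours = ndimage.convolve(skel.astype(int), kernel, mode="constant") - skel
    ends = int(np.sum(skel & (neighbours == 1)))
    nodes = int(np.sum(skel & (neighbours > 2)))
    # loops: connected background regions that do not reach the image border
    background, n_regions = ndimage.label(~skel)
    border = set(background[0, :]) | set(background[-1, :]) \
        | set(background[:, 0]) | set(background[:, -1])
    loops = n_regions - len(border - {0})
    return ends, nodes, loops
```

With these counts, the number of branches then follows directly from the relationship above.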

Figure 63. Flow chart for identification of some Hebrew letters. Symmetry is defined here as one minus the distance from the feature centroid to the geometric center (center of a circumscribed circle) divided by the radius of that circle; it is 1.0 for a perfectly symmetrical feature.




Figure 64. An arbitrary shape (grey) with its superimposed skeleton (black). The 4 end points (red), 6 branch points (green) and 2 loops (blue) are marked.


The efficiency of using the skeleton for measurement of these topological features is illustrated in Figure 65. Recognition of the basic shape of the different stars (according to whether they have 3, 4, 5 or 6 points) is immediate for a human. By counting the number of end points the computer can label each one with that same topological property.

Instant recognition of the number of end points becomes more difficult for humans when the number is large or when the shapes are more complex and have other variables (such as the length or thickness of the arms) as well. Figure 66 shows an example of the latter case, in which the arms must be counted, one by one, for some of the features. People are not good at counting, and often make errors.

When the number of end points becomes large, gestalt counting no longer works (for various individuals this usually happens somewhere in the range of 6 to 12). People don’t count things very well. Visually counting the number of teeth on the gear in Figure 67 is slow and error-prone, but the computer method using the skeleton end points is still fast and robust.

Human vision apparently uses topological information and something like the skeleton for shape characterization. The key topological features - end points, branch points, corners and loops - are extracted along with their pattern of connection. But, as shown in Figure 68, the distances between those connections are not so important. For the letter “A” as an example, the two branch points, two downward-pointing end points, and the sharp corner at the top can be considered to be connected by springs. Any distortion of the figure that stretches the springs but keeps the basic order of connections correct will be recognized as an “A” and in fact is sufficient to distinguish it from any other letter in the alphabet.

Figure 65. Star shapes with superimposed skeletons and labels with the number of end points.

Figure 67. Image of a gear with superimposed skeleton whose end points mark the 47 teeth.

Figure 66. More complex star shapes with five, six and seven arms.

Figure 68. Key topological features of the letter “A.”



To perform the same task described above of identifying the letters A through E, the key topological features of each are identified as shown in Figure 69. Note that this method is very robust to changes in the size of the letters and can accommodate much greater changes in fonts or even handwriting than the method using dimensionless ratios shown above.

It is the same ability to use just a few key features (corners, ends and branch points) to characterize feature shape that accounts for one of the common illusions. Kanizsa’s triangle (Figure 70) is constructed in the mind by linking together the three well-defined corner points. The linking is always done with smooth, although not necessarily straight, lines. As for the case of the end points in a star, the human ability to recognize the gestalt of polygons works best with a small number of sides. Once the shape has been formed in the mind, most people report that the interior of the triangle (which is of course the illusion) appears to be brighter than the background upon which it “rests.”

Once the basic topological form of the object shape has been established, the second property of shape that people seem to instinctively recognize is a measure of the smoothness or irregularity of the boundary. Because so much of the natural world has a geometry that is fractal rather than Euclidean, this takes the form of an ability to detect differences in boundary fractal dimension.

So what is a boundary fractal dimension? For about the last hundred years mathematicians have been intrigued by curves that have the unusual, even disturbing property of an undefined length. Examining the line at ever finer detail simply reveals more irregularities and more length. In terms of examining scientific objects, whether it is the margin of a leaf or the fracture of a metal, increasing the image magnification reveals more detail in a hierarchy that extends over many decades of scale. Not everything has this “self-similar” property. Fluids with surface tension, cells with elastic membranes, and most man-made surfaces have a well-defined Euclidean boundary whose length can be defined and does not change with magnification. Such boundaries for objects on two-dimensional images have the topological dimension of a line, exactly one.

But irregular boundaries that are fractal have a length that increases regularly with magnification. The classic illustration is Richardson’s “How Long is the Coast of England?” example. Measuring that length on a map using dividers set at 10 km will produce one result. Walking along the coast with a 100 meter chain would follow more of the irregularities and produce a larger measurement. Crawling along the shore with a meter stick would result in a further increase, and so on. Plotting the length values versus the length of the measuring tool, on log-log axes, gives a straight line whose slope reflects the fractal dimension D of the boundary: the measured length varies as the measuring-tool length raised to the power 1 - D, where D is a number between 1.0 and 1.999. The higher the number, the more irregular, or rough, the boundary is perceived to be. In the Hawaiian islands, for example, the older (and more weathered) islands have a dimension greater than the younger (and smoother) ones.

Many natural structures have been shown to have this fractal nature, with typical values in the range up to about 1.35, and apparently people, while they certainly do not measure the dimension, have a very robust ability to compare features and put them into the correct order in terms of boundary “roughness.” As usual, human vision compares things best when they can be placed side by side in the same field of view; it performs the comparison between two features at a time, and only painstakingly constructs a ranking for a field of objects. It also suffers when comparing a currently viewed object to one from memory, because the fine details needed to represent boundary irregularity are typically either not remembered very well or in some cases recalled with too much emphasis. Computer measurement of fractal dimension can also be performed; the most accurate method constructs a plot of the number of pixels as a function of their distance from the boundary, whose slope gives the dimension. As shown by the examples in Figure 71, the feature’s fractal dimension relates only to this boundary irregularity and not to other aspects of shape such as the topology or to ratios such as length / breadth.
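One plausible implementation of the pixel-count-versus-distance measurement is sketched below: the count of pixels lying within a distance r of the boundary is expected to grow roughly as r raised to the power (2 - D), so D is taken from the slope of a log-log fit. The choice of radii and the neglect of image-edge effects are simplifying assumptions.

```python
# Illustrative sketch: estimate a boundary fractal dimension from the number of
# pixels within distance r of the boundary. That count is expected to grow
# roughly as r**(2 - D), so D is read from the slope of a log-log fit.
# The radii used and the neglect of image-edge effects are simplifications.
import numpy as np
from scipy import ndimage

def boundary_fractal_dimension(feature, radii=(1, 2, 4, 8, 16)):
    """feature: 2-D boolean array (True = object)."""
    feature = np.asarray(feature, dtype=bool)
    boundary = feature & ~ndimage.binary_erosion(feature)
    distance = ndimage.distance_transform_edt(~boundary)  # distance of every pixel from the boundary
    counts = [np.sum(distance <= r) for r in radii]
    slope, _ = np.polyfit(np.log(radii), np.log(counts), 1)
    return 2.0 - slope
```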

Figure 69. Key topological features of the letters A through E.

Figure 70. Kanizsa’s triangle is an illusory region visually formed by linking corner markers with straight lines or gentle curves.


Figure 71. Several shapes with their fractal dimensions as measured by the computer.


Since the visual recognition of shape is primarily based on simple topological form and boundary irregularity, it might be useful to employ those parameters to communicate a description of shape from one person who is familiar with a class of objects to another, who is not. The mathematical descriptions of the parameters are perhaps unfamiliar but not really difficult, and the ideas at least are comfortable. Furthermore, computers can be easily programmed to extract the same information. In fact they are: it is the topological shape of printed characters that is used in most optical character recognition (OCR) software that converts pages of printed text back into an editable computer file.

So far, however, that approach has been taken only rarely. The most widely accepted method for communicating the information about object shape characteristics that are used for recognition remains showing a picture of the object to another person. For features that are exactly alike, or so nearly alike that the only variations are essentially invisible at the scale of normal viewing, that works fine. Unfortunately, in most of the sciences the objects of interest, whether they are defects in a material, cancerous cells on a slide, or a new species of bug, are not identical. The natural variation is significant, although the clues to recognition (if they are correctly chosen) remain present (although not necessarily visible in every image).


In presenting the “representative image” the scientist attempts to communicate these clues to colleagues. The picture almost always needs an extensive supplement in words (think of Arlo’s song again - the “Twenty-seven 8x10 color glossy pictures with circles and arrows and a paragraph on the back of each one”). But if the colleagues are not as familiar with the objects (which is of course the reason for the communication), will they be able to pick out the same features as clues? And does the selected image adequately represent the range of variation in the natural objects? These dangers are always present in the use of “typical” images even if the image really does give a fair representation, and in most cases it should probably be admitted that the picture was selected not because analysis showed it to be representative in any statistical sense but because it satisfied some other, unspoken aesthetic criterion (or worse, was the only good-quality image that could be obtained). Anecdotal evidence, which is all that a single picture can ever provide, is risky at best and misleading at worst, and should be avoided if it is possible to obtain and present quantitative data.

Context

Recognition of features is often influenced by the context in which they appear. Sometimes this context is supplied by the image itself, but more often it arises from prior knowledge or independent information. In Figure 72, there are several very different representations of the number five, including ones in languages that we may not know. But once the concept of “fiveness” is accepted, the various representations all become understood. Quite a bit of knowledge and experience that has nothing to do with images is involved in this process, and it happens at higher levels of conscious thought than basic shape recognition. Knowing that pentagons have five sides may help us translate the Greek, or recalling a Cinco de Mayo party may help with the Spanish, and so on.

An interesting insight into this recognition process comes from the study of patients who suffer from synesthesia, a phenomenon in which some kind of cross-wiring in the brain confuses the output from one sense with another. People with synesthesia may report that particular notes played on the piano trigger the sensation of specific tastes, for example. In one of the most common forms of synesthesia, looking at a number produces the sensation of a specific color. For instance, in a printed array of black numbers, the fives may all appear red while the threes are blue (Figure 73), and detecting and counting those features is extremely rapid compared to the need to identify and count them consciously.

This cross-activation of different sensory pathways in the brain occurs well before the information rises to conscious levels. The alternative representations of “five-ness” shown above do not trigger these colors. Even modest distortions of the printed number, such as unusual or outline fonts, may be enough to prevent it. The study of these types of brain malfunctions is important for an understanding of the processing of sensory information, extraction of abstract concepts, and formation of connections between seemingly separate areas of knowledge that can occur at subconscious and conscious levels. In this instance, it shows that basic shape recognition happens long before the labels, with their semantic content, are applied to features.

It is even possible to have multiple contexts within the same image. This happens particularly with reading words. We tolerate misspellings and sloppy handwriting because there is usually enough redundancy and context to allow the message to be comprehended even when the image itself is wrong or ambiguous, as shown in the example of Figure 74.

Figure 72. Various representations of “five.”

Figure 73. Synesthesia may associate specific colors with numbers, so that they “pop out” of an image.

Figure 74. The final words in each line are identical in shape, but can be read correctly because of the context established by the other words.


The importance of context is critical for the correct interpretation of data in scientific images. Our expectations based on prior experience and study, knowledge of how the sample was prepared and how the image was acquired, and even our hopes and fears about how an experiment may turn out, can significantly affect visual interpretation, and the recognition of features (or failure to recognize them).

Obviously, this works in two different ways. It is important to know enough about the sample and image to correctly interpret it, while avoiding the pitfalls that expectation can cause. Some people are better at this than others, and anyone can make the occasional mistake, but fortunately the open nature of scientific publication provides a mechanism for correction.

One very frequently encountered problem of context for images arises in microscopy. Sections are typically cut through tissue with a microtome for examination in transmission, or surfaces of opaque materials are prepared by polishing for examination by reflected light (or the equivalent use of the transmission or scanning electron microscope). In all of these cases, the images are inherently two-dimensional, but the structure that they represent and sample is three-dimensional. It is very difficult for most people to provide a proper context for understanding these images in terms of the three-dimensional structure.

Nothing in our evolutionary experience has prepared our vision system to handle section images. We expect to see the outsides of objects. Shapes are implicitly understood as silhouettes. In fact, it is even difficult to mentally construct the appearance of sections through objects. An experiment that I have used with many classes of students is to place a bagel (a torus) in front of them and ask them to sketch about a dozen random sections through it (Figure 75). Everyone gets the standard “bagel cut” right, and after only a moment’s thought most draw the two circular sections produced by a vertical cut. But then things get hard. There is a strong tendency to selectively draw sections that pass through the geometric center and are hence symmetrical. Few of the complex arcs and lopsided ovoids are represented. And many students draw sections that are not actually possible.

If it is that hard to imagine what the sections through a very simple shape will look like, how difficult is it for the complex shapes that occur in natural objects, such as organelles in cells or dendrites in metals? And this is the “forward” problem, going from a known three-dimensional shape to the two-dimensional sections that would result. It is much harder to go in reverse, seeing the two-dimensional shapes and having to imagine what the three-dimensional object must have been that was responsible for them.

Figure 75. A few of the possible cuts through a bagel.


In fact, there is not always a unique solution. Something as simple as ellipses illustrates the situation. Elliptical sections will result from sections through either oblate or prolate ellipsoids of revolution (the prolate shape is like an American football, the oblate shape like a discus). If all of the ellipsoids are the same size and shape, then by examining a large number of random sections it is possible to determine whether the generating solid is oblate or prolate. If the length of the most elongated ellipse is similar in dimension to the diameter of the most equiaxed section, then the ellipsoid is oblate. If the width of the most elongated ellipse is close to the diameter of the equiaxed sections, the ellipsoid is prolate. This is intentionally not illustrated, to force the reader to try to mentally construct these very simple shapes.

If the sizes of the ellipsoids vary, the problem becomes much more difficult. And if the shapes of the ellipsoids can also vary, it becomes insoluble. But regardless of the ability to intellectually solve a tricky three-dimensional puzzle, this is never a task that the human brain solves automatically based on experience with section images. All of our routine experience is with projected views of the outsides of objects, and it leads to misunderstandings about 3D structure examined by sectioning. As a simple example, generations of textbooks described mitochondria as “football-shaped” structures within the cell, because the most easily recognized sections through mitochondria appear elliptical in shape. In fact, the 3D structure is cylindrical, with bends and branches that produce many types of sections, the most irregular of which are typically not recognized at all in sections.

It is possible to measure quite a few geometric properties of three-dimensional structure from two-dimensional images, including volumes, surface areas, curvature and length, the mean size of arbitrary and variable three-dimensional objects, and so on. The entire field of stereology, several professional journals, and an international society all exist to deal with this need. Methods of great power and in some cases surprising simplicity have been developed to accomplish the necessary measurements and calculations, and they are gradually becoming more widely used as microscopists realize the need for them. But they can never compensate for the fact that people just don’t automatically understand the relationships between three-dimensional structure and two-dimensional images.
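The simplest of these stereological relationships is the classical result that, for unbiased sections, the volume fraction of a phase equals its expected area fraction on the section plane (V_V = A_A); a minimal sketch of that estimate is shown below, with averaging over several fields as an assumption of how it might be applied.

```python
# Illustrative sketch: the classical stereological estimate V_V = A_A.
# For unbiased sections, the volume fraction of a phase equals its expected
# area fraction on the section plane, so averaging the area fraction over
# several section images estimates the volume fraction.
import numpy as np

def volume_fraction(section_masks):
    """section_masks: iterable of 2-D boolean arrays (True = phase of interest)."""
    fractions = [np.asarray(mask, dtype=bool).mean() for mask in section_masks]
    return float(np.mean(fractions))
```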

This seems to be true even after extensive training and developing familiarity with particular three-dimensional structures. Medical doctors and technicians rely upon section images from instruments such as magnetic resonance imaging (MRI) and computed X-ray tomography (CAT scans) to study the human body. But it appears from tests and interviews that few of these people have a three-dimensional mental picture of the structure. Instead, they learn to recognize the normal appearance of sections that are almost always taken in a few standard orientations, and to spot deviations from the norm, particularly ones associated with common diseases or other problems.

Arrangements must be made

One thing that people are extremely good at, however, is finding order in an arrangement of objects. Sometimes this quest for simplification finds true and meaningful relationships between objects, and sometimes it does not. Historically, the construction of constellations by playing connect-the-dots among the bright stars in the sky seems to have been carried out by many different cultures (of course, with different results). Figure 76 shows the classical Greek version. Assembling the data needed to construct Stonehenge as a predictor of solstices and eclipses must have taken multiple lifetimes. The Copernican revolution that allowed ellipses to simplify the increasingly complex Ptolemaic circles-and-epicycles model of planetary motion was a quest for this same type of simplification.

Most scientists follow Einstein’s dictum that it is important to find the simplest solution that works, but not one that is too simple. The idea is much older than that. William of Occam’s “principle of parsimony” is that “one should not increase, beyond what is necessary, the number of entities required to explain anything.” Instinctively we all seek the simple answer, sometimes in situations where there is not one to be found. And of course, this applies to the examination of images, also.

Figure 76. The familiar stellar constellations.



In finding visual alignments of points in images, or in plots of data, people prefer straight lines or smooth, gradual curves. Linear regression is probably the most widely used (and abused) method of data interpretation, imposing order on a collection of points. Filling in gaps in lines or boundaries is often a useful procedure, but it can lead to mistakes as well. The illusory Kanizsa triangles are an example of filling in gaps and connecting points. The process of connecting points and lines is closely related to grouping, which has been discussed before, for instance in the process of forming a number from the colored circles in the color-blindness test images.

Human vision has a built-in directional bias that prefers the vertical, followed by the horizontal, and a strong preference for symmetry. As shown in Figure 77, that can bias our judgment. It also influences our ability to detect gradients or clustering.

Some processes produce a random distribution of features, like sprinkling salt onto a table. If every feature is completely independent of all the others, a random distribution results. Non-random distributions occur because features either attract or repel each other. Cacti growing in the desert are self-avoiding, because each one tries to protect its supply of water and nutrients. Particles floating on a liquid surface may tend to cluster because of surface tension effects. In extreme cases, visual observation of clustering or self-avoidance is possible. But people do not easily see through apparently chaotic distributions to detect these effects (Figure 78). In fact, the presence of variation in any type of feature appearance makes it very hard for visual inspection to detect an underlying order.
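One common way to put numbers on these distinctions is the Clark-Evans nearest-neighbour ratio, sketched below: the observed mean nearest-neighbour distance is compared with the value expected for a completely random (Poisson) pattern of the same density, with ratios below 1 suggesting clustering and ratios above 1 suggesting self-avoidance. Edge corrections are ignored here for simplicity.

```python
# Illustrative sketch: Clark-Evans nearest-neighbour ratio for feature positions.
# The observed mean nearest-neighbour distance is compared with the value
# expected for a random (Poisson) pattern of the same density. Ratios below 1
# suggest clustering, above 1 suggest self-avoidance. Edge effects are ignored.
import numpy as np

def clark_evans_ratio(points, study_area):
    """points: (N, 2) array of feature positions; study_area: area of the field."""
    pts = np.asarray(points, dtype=float)
    d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    observed = d.min(axis=1).mean()             # mean nearest-neighbour distance
    density = len(pts) / study_area
    expected = 1.0 / (2.0 * np.sqrt(density))   # expectation for a Poisson pattern
    return observed / expected
```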

That isn’t to say that people prefer complete order. Tests with computer-generated images of apparently random paint droplets showed that completely ordered images were considered boring and not visually stimulating, while completely random ones were considered to be equally uninteresting. When the correlation between size, color and position obeyed a “pink noise” or fractal relationship, the pictures were most visually interesting to viewers. Interestingly, the same relationships were found in mathematical analysis of several of Jackson Pollock’s paintings (Figure 79).

Apparently a distribution in which there is just enough hint of order that we must work to find it visually is the most appealing. The implication for noticing (or failing to notice) arrangements of features in scientific images is clear; computer measurement can detect these properties, but only if we notice something that leads us to perform the necessary measurements.

Beyond the subject of arrangements that may be present throughout an image, gradients are often important, but not always easy to detect. The problem is that there can be so many different kinds of gradient.

Features may vary in size, shape, orientation, color, density, number, or any combination of these factors, as a function of position. Figure 80 shows only a few of the possibilities. The gradient may be linear, although not necessarily vertical or horizontal, but it can also be radial or follow a more complex path. In many cases, features have a tendency to cluster either near (or, conversely, away from) the boundary of an irregular region. Figure 81 illustrates the case of organelles in a cell, but the same thing occurs for plants near an aquifer, particles in metal grains, and people in a room at a party (depending on where the bar is located). Detecting the gradient in such cases is visually difficult and it is usually necessary to resort to measurement. Computer methods can determine the distance of each feature from a point or boundary, and this distance can then be used, along with the appropriate measures of size, shape, etc., to statistically assess whether a significant gradient is present.
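A sketch of the measurement just described follows: each feature's distance from a reference boundary is obtained with a distance transform and then correlated with a measured property. Feature area and a Spearman rank correlation are used here purely as illustrative choices; other properties or statistics could be substituted.

```python
# Illustrative sketch: test for a gradient relative to a boundary. Each labelled
# feature's distance from the boundary is correlated with a measured property
# (here, its area in pixels) using a rank correlation as a simple test.
import numpy as np
from scipy import ndimage, stats

def gradient_vs_boundary(labels, boundary_mask):
    """labels: integer-labelled feature image (e.g., from ndimage.label);
    boundary_mask: boolean image, True on the reference boundary."""
    distance = ndimage.distance_transform_edt(~boundary_mask)
    ids = np.arange(1, labels.max() + 1)
    areas = ndimage.sum(np.ones_like(labels), labels, index=ids)   # pixel count per feature
    centres = ndimage.center_of_mass(np.ones_like(labels), labels, index=ids)
    dists = [distance[int(round(r)), int(round(c))] for r, c in centres]
    rho, p_value = stats.spearmanr(dists, areas)
    return rho, p_value
```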

Figure 77. These two shapes are not perceived to be identical. The “kite shape” on the left has a dominant vertical axis of symmetry. The irregular four-sided polygon on the right has a horizontal base.

Figure 78. Examples of feature distributions that are statistically (a) clustered, (b) self-avoiding, and (c) random.


The innate ability that people have for finding order in images is risky. Sometimes we imagine a regularity or order that is not there (e.g., constellations), and sometimes the presence of complexity or superficial disorder hides the real underlying structure. Some orientations are more readily detected than others, and complicated gradients are likely to escape detection unless we know beforehand what to look for. The result is that many real spatial arrangements may be missed, even in two dimensions (and because of the additional problems introduced by examining two-dimensional sections through three-dimensional structures, the problem is much worse for three-dimensional spatial arrangements).



Figure 79. Jackson Pollock’s “Blue Poles #11.” His paintings have been analyzed to determine that they have a fractal structure and a complexity that evolved during his career.

Figure 80. Examples of gradients of size, color, shape, and orientation.



So in conclusion...

Human vision is an extremely powerful tool, evolved over millennia to extract from scenes those details that are important to our survival as a species. The processing of visual information combines a hierarchy of highly parallel neural circuits to detect and correlate specific types of detail within images. Many shortcuts that work “most of the time” are used to speed recognition. Studying the failure of these tricks, revealed in various visual illusions, aids in understanding the underlying processes.

An awareness of the failures and biases is also important to the scientist who relies on visual examination of images to acquire or interpret data. Visual inspection is a comparative, not a quantitative, process, and it is easily biased by the presence of other information in the image. Computer image analysis methods are available that overcome most of these specific problems, but they provide answers that are only as good as the questions that are asked. In most cases, if the scientist does not visually perceive the features or trends in the raw images, their subsequent measurement will not be undertaken.

For further reading

There are several classic books that provide a good introduction to human vision, without delving too deeply into the fascinating but specialized literature regarding messy anatomical details of the visual cortex. They include:

John P. Frisby (1980) Illusion, Brain and Mind, Oxford Univ. Press
Irvin Rock (1984) Perception, W. H. Freeman Co.
David Marr (1982) Vision, W. H. Freeman Co.

Computer-based image analysis offers many valuable insights into human vision, if only to show that there are computational models that function differently but attempt with varying degrees of success to extract the same types of information. There are a great many books in this field. A good sampling with different topical coverage and styles would include:

John C. Russ (2002) The Image Processing Handbook, 4th edition, CRC Press
Kenneth R. Castleman (1996) Digital Image Processing, Prentice Hall
Gaurav Sharma, ed. (2003) Digital Color Imaging Handbook, CRC Press
Alan Watt & Fabio Policarpo (1998) The Computer Image, Addison Wesley
Rafael C. Gonzalez & Richard E. Woods (1993) Digital Image Processing, Addison Wesley

Probably the most accessible texts covering various models for object recognition, and the software for implementing them, are:

Keinosuke Fukunaga (1990) Introduction to Statistical Pattern Recognition, 2nd edition, Academic Press
Sing-Tze Bow (1992) Pattern Recognition and Image Processing, Marcel Dekker

To learn more about the science of stereology, the relationship between three-dimensional structure and two-dimensional images, see:

John C. Russ & Robert T. Dehoff (2002) Practical Stereology, 2nd edition, Plenum Press

Of course, there is also an extensive literature in many peer-reviewed journals, and in the modern era no one should neglect to perform a Google search of the internet, which will locate several sets of course notes on this topic as well as publication reprints and many sites of varying quality.

Figure 81. Distribution of organelles in a cell, with plot of number vs. distance.