
Color- and Texture-Based Image Segmentation Using EM and Its Application to Content-Based Image Retrieval

Serge Belongie, Chad Carson, Hayit Greenspan, and Jitendra Malik
Computer Science Division
University of California at Berkeley
Berkeley, CA 94720
{sjb,carson,hayit,malik}@cs.berkeley.edu

Abstract

Retrieving images from large and varied collections using image content as a key is a challenging and important problem. In this paper we present a new image representation which provides a transformation from the raw pixel data to a small set of image regions which are coherent in color and texture space. This so-called "blobworld" representation is based on segmentation using the Expectation-Maximization algorithm on combined color and texture features. The texture features we use for the segmentation arise from a new approach to texture description and scale selection.

We describe a system that uses the blobworld representation to retrieve images. An important and unique aspect of the system is that, in the context of similarity-based querying, the user is allowed to view the internal representation of the submitted image and the query results. Similar systems do not offer the user this view into the workings of the system; consequently, the outcome of many queries on these systems can be quite inexplicable, despite the availability of knobs for adjusting the similarity metric.

1 Introduction

Very large collections of images are growing ever more common. From stock photo collections to proprietary databases to the Web, these collections are diverse and often poorly indexed; unfortunately, image retrieval systems have not kept pace with the collections they are searching. The shortcomings of these systems are due both to the image representations they use and to their methods of accessing those representations to find images:

• While users generally want to find images containing particular objects ("things") [4, 6], most existing image retrieval systems represent images based only on their low-level features ("stuff"), with little regard for the spatial organization of those features.

• Systems based on user querying are often unintuitive and offer little help in understanding why certain images were returned and how to refine the query. Often the user knows only that he has submitted a query for, say, a bear and retrieved very few pictures of bears in return.

• For general image collections, there are currently no systems that automatically classify images or recognize the objects they contain.

In this paper we present a new image representation, "blobworld," and a retrieval system based on this representation. While blobworld does not exist completely in the "thing" domain, it recognizes the nature of images as combinations of objects, and both querying and learning in blobworld are more meaningful than they are with simple "stuff" representations.

We use the Expectation-Maximization (EM) algorithm to perform automatic segmentation based on image features. EM iteratively models the joint distribution of color and texture with a mixture of Gaussians; the resulting pixel-cluster memberships provide a segmentation of the image. After the image is segmented into regions, a description of each region's color, texture, and spatial characteristics is produced. In a querying task, the user can access the regions directly, in order to see the segmentation of the query image and specify which aspects of the image are important to the query. When query results are returned, the user sees the blobworld representation of the returned images; this assists greatly in refining the query.

We begin this paper by briefly discussing the current state of image retrieval. In Section 2 we describe the blobworld representation, from features through segmentation to region description. In Section 3 we present a query system based on blobworld, as well as results from queries in a collection of highly varied natural images.



Current image database systems include IBM's Query by Image Content (QBIC) [18], Photobook [20], Virage [10], Candid [14], and Chabot [19]. These systems primarily use low-level image properties; several of them include some degree of automatic segmentation. None of the systems codes spatial organization in a way that supports object queries.

Classical object recognition techniques usually rely on clean segmentation of the object from the rest of the image or are designed for fixed geometric objects such as machine parts. Neither constraint holds in our case: the shape, size, and color of objects like cheetahs and polar bears are quite variable, and segmentation is imperfect. Clearly, classical object recognition does not apply. More recent techniques [21] can identify specific objects drawn from a finite (on the order of 100) collection, but no present technique is effective at the general image analysis task, which requires both image segmentation and image classification.

Promising work by Lipson et al. [16] retrieves images based on spatial and photometric relationships within and across image regions. Little or no segmentation is done; the regions are derived from low-resolution images.

Earlier work has used the EM algorithm and/or the Minimum Description Length (MDL) principle to perform segmentation based on motion [1, 25] or scaled intensities [26], but EM has not previously been used on joint color and texture. Related approaches such as deterministic annealing [11] and classical clustering [12] have been applied to texture segmentation without color.

2 The blobworld image representation

The blobworld representation is related to the notion of photographic or artistic scene composition. In the sense discussed in [23], the blobworld descriptors constitute an example of a summary representation because they are concise and relatively easy to process in a querying framework.

Blobworld is distinct from color-layout matching as in QBIC in that it is designed to find objects or parts of objects. Each image may be visualized by an ensemble of 2-D ellipses, or "blobs," each of which possesses a number of attributes. The number of blobs in an image is typically less than ten. Each blob represents a region of the image which is roughly homogeneous with respect to color or texture. A blob is described by its dominant colors, mean texture descriptors, and spatial centroid and scatter matrix. (See Figs. 3-4 for a visualization of blobworld.)

2.1 Extracting color and texture features

Our goal is to assign the pixels in the original image to a relatively small number of groups, where each group represents a set of pixels that are coherent in their color and local texture properties; the motivation is to reduce the amount of raw data presented by the image while preserving the information needed for the image understanding task. Given the unconstrained nature of the images in our database, it is important that the tools we employ to meet this goal be as general as possible without sacrificing an undue amount of descriptive power.

2.1.1 Color

Color is a very important cue in extracting information from images. Color histograms are commonly used in content-based retrieval systems [18, 19, 24] and have proven to be very useful; however, the global characterization is poor at, for example, distinguishing between a field of orange flowers and a tiger, because it lacks information about how the color is distributed spatially. It is important to group color in localized regions and to fuse color with textural properties.

We treat the hue-saturation-value (HSV) color space as a cone: for a given point (h, s, v), h and sv are the angular and radial coordinates of the point on a disk of radius v at height v; all coordinates range from 0 to 1. Points with small v are black, regardless of their h and s values. The cone representation maps all such points to the apex of the cone, so they are close to one another. The Cartesian coordinates of points in the cone, (sv cos(2πh), sv sin(2πh), v), can now be used to find color differences. This encoding allows us to operationalize the fact that hue differences are meaningless for very small saturations (those near the cone's axis). However, it ignores the fact that for large values and saturations, hue differences are perceptually more relevant than saturation and value differences.
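As an illustrative sketch (not the authors' code), the cone mapping transcribes directly from the formula above; the function name and the use of Python's standard-library `colorsys` for the RGB-to-HSV step are our own choices:

```python
import colorsys
import math

def color_cone_coords(r, g, b):
    """Map an RGB triple (each component in [0, 1]) to the Cartesian
    color-cone coordinates (s*v*cos(2*pi*h), s*v*sin(2*pi*h), v).
    Low-value points land near the cone's apex regardless of hue."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return (s * v * math.cos(2 * math.pi * h),
            s * v * math.sin(2 * math.pi * h),
            v)
```

Euclidean distance between two such triples then serves as the color difference; note how any two dark colors end up close together, as intended.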

2.1.2 Texture

Texture is a well-researched property of image regions, and many texture descriptors have been proposed, including multi-orientation filter banks [17] and the second-moment matrix [5, 8]. We will not elaborate here on the classical approaches to texture segmentation and classification, both of which are challenging and well-studied tasks. Rather, we introduce a new perspective related to texture descriptors and texture grouping motivated by the content-based retrieval task.

While color is a point property, texture is a local-neighborhood property. It does not make sense to talk about the texture of zebra stripes at a particular pixel without specifying a neighborhood around that pixel. In order for a texture descriptor to be useful, it must provide an adequate description of the underlying texture parameters and it must be computed in a neighborhood which is appropriate to the local structure being described.

The first requirement could be met to an arbitrary degree of satisfaction by using multi-orientation filter banks such as steerable filters; we chose a simpler method that is sufficient for our purposes.

Figure 1. Five sample patches from a zebra image. (a) and (b) have stripes (1-D flow) of different scales and orientations (σ = 0.5 and σ = 2.5), (c) is a region of 2-D texture (σ = 0.5), (d) contains an edge (σ = 0), and (e) is a uniform region (σ = 0).

The second requirement, which may be thought of as the problem of scale selection, does not enjoy the same level of attention in the literature. This is unfortunate, since texture descriptors computed at the wrong scale only confuse the issue.

In this work, we introduce a novel method of scale selection which works in tandem with a fairly simple but informative set of texture descriptors. The scale selection method is based on edge/bar polarity stabilization, and the texture descriptors arise from the windowed second moment matrix. Both are derived from the gradient of the image intensity, which we denote by ∇I. We compute ∇I using the first difference approximation along each dimension. This operation is often accompanied by smoothing, but we have found this preprocessing operation unnecessary for the images in our collection.

To make the notion of scale concrete, we define the scale to be the width of the Gaussian window within which the gradient vectors of the image are pooled. The second moment matrix for the vectors within this window, computed about each pixel in the image, can be approximated using

    Mσ(x, y) = Gσ(x, y) ∗ (∇I)(∇I)ᵀ    (1)

where Gσ(x, y) is a separable binomial approximation to a Gaussian smoothing kernel with variance σ².
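A minimal NumPy sketch of eqn. (1), using repeated [1, 2, 1]/4 passes as the separable binomial approximation to Gσ; the pass count n ≈ 2σ² and the edge-replication border handling are our assumptions, not details from the text:

```python
import numpy as np

def _binomial_smooth(a, n):
    """n passes of the separable [1, 2, 1]/4 kernel along each axis.
    One pass has variance 1/2, so n ~ 2*sigma^2 passes approximate a
    Gaussian window with variance sigma^2."""
    for axis in (0, 1):
        for _ in range(n):
            fwd = np.roll(a, -1, axis=axis)
            bwd = np.roll(a, 1, axis=axis)
            if axis == 0:                 # replicate the border rows/cols
                fwd[-1, :] = a[-1, :]
                bwd[0, :] = a[0, :]
            else:
                fwd[:, -1] = a[:, -1]
                bwd[:, 0] = a[:, 0]
            a = (bwd + 2 * a + fwd) / 4.0
    return a

def second_moment_matrix(img, sigma):
    """Entries of M_sigma(x, y) = G_sigma * (grad I)(grad I)^T: pool the
    outer products of the intensity gradient in a Gaussian-like window."""
    Iy, Ix = np.gradient(img.astype(float))   # first-difference gradient
    n = max(1, int(round(2 * sigma * sigma)))
    return (_binomial_smooth(Ix * Ix, n),
            _binomial_smooth(Ix * Iy, n),
            _binomial_smooth(Iy * Iy, n))
```

On an intensity ramp, for example, the pooled matrix has all its energy in the ramp direction, which is what the eigenstructure analysis below exploits.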

At each pixel location, Mσ(x, y) is a 2 × 2 symmetric positive semidefinite matrix; thus it provides us with three pieces of information about each pixel. Rather than work with the raw entries in Mσ, it is more common to deal with its eigenstructure [2, 5]. Consider a fixed scale and pixel location, let λ1 and λ2 (λ1 ≥ λ2) denote the eigenvalues of Mσ at that location, and let φ denote the argument of the principal eigenvector of Mσ. When λ1 is large compared to λ2, the local neighborhood possesses a dominant orientation, as specified by φ. When the eigenvalues are comparable, there is no preferred orientation, and when both eigenvalues are negligible, the local neighborhood is approximately constant.

Scale selection

We may think of σ as controlling the size of the integration window around each pixel within which the outer product of the gradient vectors is averaged. σ has been called the integration scale or artificial scale by various authors [5, 8] to distinguish it from the natural scale used in linear smoothing of raw image intensities. Note that σ = σ(x, y); the scale varies across the image.

In order to select the scale at which Mσ is computed, i.e. to determine the function σ(x, y), we make use of a local image property known as polarity. The polarity is a measure of the extent to which the gradient vectors in a certain neighborhood all point in the same direction. (In the computation of second moments, this information is lost in the outer product operation; i.e., gradient vector directions differing by 180° are indistinguishable.) The polarity at a given pixel is computed with respect to the dominant orientation φ in the neighborhood of that pixel. For ease of notation, let us consider a fixed scale and pixel location. We define polarity as

    p = |E+ − E−| / (E+ + E−)

The definitions of E+ and E− are

    E+ = Σ_{(x,y)∈Ω} Gσ(x, y) [∇I · n̂]+

and

    E− = Σ_{(x,y)∈Ω} Gσ(x, y) [∇I · n̂]−

where [q]+ and [q]− are the rectified positive and negative parts of their argument, n̂ is a unit vector perpendicular to φ, and Ω represents the neighborhood under consideration. We can think of E+ and E− as measures of how many gradient vectors in Ω are on the positive side and negative side of the dominant orientation, respectively. Note that p ranges from 0 to 1. A similar measure is used in [15] to distinguish a flow pattern from an edge.
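These definitions transcribe directly into code. The sketch below is our own, with uniform weights standing in for the Gaussian Gσ for brevity; it computes p from the gradient samples of one neighborhood Ω:

```python
import numpy as np

def polarity(Ix, Iy, phi):
    """Polarity of a neighborhood whose dominant orientation is phi.
    Ix, Iy: arrays of gradient components for the pixels in Omega."""
    # unit vector perpendicular to the orientation (cos(phi), sin(phi))
    nhat = np.array([-np.sin(phi), np.cos(phi)])
    proj = Ix * nhat[0] + Iy * nhat[1]          # grad I . nhat
    E_plus = np.sum(np.clip(proj, 0, None))     # rectified positive part
    E_minus = np.sum(np.clip(-proj, 0, None))   # rectified negative part
    if E_plus + E_minus == 0:
        return 0.0  # negligible gradients (uniform region)
    return abs(E_plus - E_minus) / (E_plus + E_minus)
```

An edge (all gradients on one side of the orientation) gives p = 1; alternating stripe gradients cancel and give p = 0, matching the Edge/Texture behavior described below.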

¹ Strictly speaking, eqn. (1) is a sliding inner product, not a convolution, since σ(x, y) is spatially variant.
² Polarity is related to the quadrature phase as discussed in [7, 9].



The polarity pσ varies as the scale σ changes; its behavior in typical image regions can be summarized as follows:

Edge: The presence of an edge is signaled by pσ holding values close to 1 for all σ.

Texture: In regions with 2-D texture or 1-D flow, pσ decays with σ due to the presence of multiple orientations.

Uniform: In a constant-intensity neighborhood, pσ takes on arbitrary values since the gradient vectors have negligible magnitudes and therefore arbitrary angles.

The process of selecting a scale is based on the derivative of the polarity with respect to scale. First, we compute the polarity at every pixel in the image for σk = k/2, k = 0, 1, ..., 7, thus producing a "stack" of polarity images across scale. Then, for each k, the polarity image computed at scale σk is convolved with a Gaussian with standard deviation 2σk to yield a smoothed polarity image p̃σk. For each pixel, we select the scale as the first value of σk for which the difference between successive values of polarity (p̃σk − p̃σk−1) is less than 2%. In this manner, we are performing a soft version of local spatial frequency estimation, since the smoothed polarity tends to stabilize once the scale window encompasses one approximate period. Since we stop at σk = 3.5, the largest period we can detect is approximately 10 pixels. Note that when the period is undefined, as is the case in uniform regions, the selected scale is not meaningful and is set to zero. We declare a pixel to be uniform if its mean contrast across scale is less than 0.1.
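The per-pixel selection rule above can be sketched as follows, assuming the polarity images have already been computed and smoothed at the eight scales; the array layout, function name, and use of an absolute difference are our own choices:

```python
import numpy as np

def select_scales(smoothed_polarity, mean_contrast):
    """Pick, per pixel, the first sigma_k at which successive smoothed
    polarity values differ by less than 2%.
    smoothed_polarity: array of shape (8, H, W), the smoothed polarity
    images at sigma_k = k/2 for k = 0..7.
    mean_contrast: (H, W) mean contrast across scale, to flag uniform pixels."""
    stack = np.asarray(smoothed_polarity, dtype=float)
    K, H, W = stack.shape
    sigmas = np.arange(K) / 2.0
    selected = np.full((H, W), sigmas[-1])   # fall back to the largest scale
    done = np.zeros((H, W), dtype=bool)
    for k in range(1, K):
        settled = (np.abs(stack[k] - stack[k - 1]) < 0.02) & ~done
        selected[settled] = sigmas[k]
        done |= settled
    # uniform pixels: the period is undefined, so the scale is set to zero
    selected[mean_contrast < 0.1] = 0.0
    return selected
```

Each pixel thus receives its own integration scale, which is what makes σ = σ(x, y) spatially variant.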

Another method of scale selection that has been proposed [8] is based on localizing extrema across scale of an invariant of Mσ, such as the trace or determinant. In this algorithm, which is applied to the problem of estimating the slant and tilt of surfaces with tangential texture, it is necessary to perform natural smoothing at a scale tied to the artificial scale. We found that this extra smoothing compromised the spatial localization ability of our scale selection method.

Texture features

Once a scale σ* is selected for each pixel, that pixel is assigned three texture descriptors. The first is the polarity at that scale, pσ*. The other two, which are taken from Mσ*, are the anisotropy, defined as a = 1 − λ2/λ1, and the normalized texture contrast, defined as c = 2√(λ1 + λ2). (If we use a centered first difference kernel in the gradient computation, the factor of 2 makes c range from 0 to 1.) These are related to derived quantities reported in [8].

2.1.3 Combining color and texture features

The color/texture descriptor for a given pixel consists of six values: three for color and three for texture. The three color components are the color-cone coordinates found after spatial averaging using a Gaussian at the selected scale. The three texture components are ac, pc, and c, computed at the selected scale; the anisotropy and polarity are each modulated by the contrast in analogy to the construction of the color-cone coordinates. (Recall that anisotropy and polarity are meaningless in regions of low contrast.) In effect, a given textured patch in an image first has its texture properties extracted and is then replaced by a smooth patch of averaged color. In this manner, the color and texture properties in a given region are decoupled; for example, a zebra is a gray horse plus stripes.

Note that in this formulation of the color/texture descriptor, orientation and selected scale do not appear in the feature vector; as a result, grouping can occur across variations in scale and orientation.
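Putting the pieces together, the 6-D descriptor of one pixel might be assembled as below; this is a sketch, and the function name and argument layout are our assumptions:

```python
import numpy as np

def pixel_descriptor(color_cone, anisotropy, polarity, contrast):
    """Assemble the 6-D color/texture feature vector for one pixel:
    the three color-cone coordinates plus (a*c, p*c, c).  Modulating
    anisotropy and polarity by contrast makes them vanish exactly where
    they are meaningless (low-contrast regions)."""
    x, y, v = color_cone
    return np.array([x, y, v,
                     anisotropy * contrast,   # a*c
                     polarity * contrast,     # p*c
                     contrast])               # c
```

Because neither orientation nor the selected scale enters the vector, two stripe patches of different scale and direction can still fall in the same cluster.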

2.2 Grouping with the EM Algorithm

Once an image has been processed using the above feature extraction scheme, the result is a large set of 6-D feature vectors, which we may regard as points in a 6-D feature space. In order to divide these points into groups, we make use of the Expectation-Maximization (EM) algorithm [3] to determine the maximum likelihood parameters of a mixture of K Gaussians in the 6-D feature space.

The EM algorithm is used for finding maximum likelihood parameter estimates when there is missing or incomplete data. In our case, the missing data is the region to which the points in the feature space belong. We estimate values to fill in for the incomplete data (the E-step), compute the maximum likelihood parameter estimates using this data (the M-step), and repeat until a suitable stopping criterion is reached.

The first step in applying the EM algorithm is to initialize a mean vector and covariance matrix to represent each of the K groups. We initialize the means to random values and the covariances to identity matrices. (In earlier work we chose the initialization for EM carefully, but we have found that the initialization has little effect on the quality of the resulting segmentation.) The update scheme allows for full covariance matrices; variants include restricting the covariance to be diagonal or a constant times the identity matrix. Full covariance matrices suit our problem, since many plausible feature clusters require extruded covariance shapes, e.g., the shades of gray along the color-cone axis. Upon convergence, the Gaussian mixture parameters can be inspected to determine what color/texture properties are represented by each component of the mixture. Some examples of groups that can form include the following:

• bright, bluish, and textureless regions (e.g., sky)
• anisotropic and non-polar regions (e.g., zebra hide)
• green weak-isotropic texture (e.g., grass)
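The E-step/M-step iteration described above can be sketched in NumPy as follows. This is our own minimal implementation, not the authors' code: the convergence test is replaced by a fixed iteration count, a small regularizer keeps the covariances invertible, and the `init_mu` argument is our addition for reproducible initialization:

```python
import numpy as np

def em_gmm(X, K, iters=30, seed=0, init_mu=None):
    """EM for a mixture of K full-covariance Gaussians on the rows of X.
    Means start at randomly chosen data points and covariances at the
    identity; returns (weights, means, covariances, responsibilities)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, K, replace=False)] if init_mu is None else init_mu.copy()
    cov = np.stack([np.eye(d)] * K)
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: posterior membership of every feature vector in each cluster
        resp = np.empty((n, K))
        for k in range(K):
            diff = X - mu[k]
            mahal = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov[k]), diff)
            norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov[k]))
            resp[:, k] = pi[k] * np.exp(-0.5 * mahal) / norm
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: maximum likelihood updates of weights, means, covariances
        Nk = resp.sum(axis=0)
        pi = Nk / n
        mu = (resp.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (resp[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return pi, mu, cov, resp
```

For the segmentation task, each row of `X` is one pixel's 6-D descriptor, and the per-pixel argmax of `resp` yields the cluster memberships.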


We have thus far not discussed how to choose K, the number of mixture components. Ideally we would like to choose the value of K that best suits the natural number of groups present in the image. One readily available notion of goodness of fit is the log-likelihood. Given this indicator, we can apply the Minimum Description Length (MDL) principle [22] to select among values of K. As a consequence of this principle, when models using two values of K fit the data equally well, the simpler model will be chosen. For our experiments, K ranges from 2 to 5.

Once a model is selected, the next step is to perform spatial grouping of those pixels belonging to the same color/texture cluster. We first produce a K
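One way to realize the MDL choice of K is sketched below; we assume a full-covariance mixture's free-parameter count as the description cost, which is cruder than the coding scheme of [22] but preserves the key property that equal fits favor the simpler model:

```python
import numpy as np

def mdl_choose_K(log_likelihoods, n_samples, d=6):
    """Pick the K whose maximized log-likelihood minus a description
    cost of (#parameters / 2) * log(n) is largest.
    log_likelihoods: dict {K: maximized log-likelihood}, K in 2..5."""
    best_K, best_score = None, -np.inf
    for K in sorted(log_likelihoods):        # ties go to the smaller K
        # K means (d each), K covariances (d(d+1)/2 each), K-1 free weights
        n_params = K * (d + d * (d + 1) // 2) + (K - 1)
        score = log_likelihoods[K] - 0.5 * n_params * np.log(n_samples)
        if score > best_score:
            best_K, best_score = K, score
    return best_K
```

Because the penalty grows with K, a larger mixture is chosen only when its likelihood gain outweighs the extra description cost.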


Spatial descriptors

The geometric descriptors of the blob are simply the centroid c and scatter matrix S of the blob region; the centroid provides a notion of position, while the scatter matrix provides an elementary shape description. In the querying process discussed in Section 3.1, centroid separations are expressed using Euclidean distance. The determination of the distance between scatter matrices is based on the three quantities [det(S)]^(1/2) = √(κ1κ2), 1 − κ2/κ1, and θ. (κ1 and κ2 are the eigenvalues of S; θ is the argument of the principal eigenvector of S.) These three quantities represent approximate area, eccentricity, and orientation.


3 Image retrieval by querying

Anyone who has used a search engine, text-based or otherwise, is familiar with the reality of unwanted matches. Often in the case of text searches this results from the use of ambiguous keywords, such as "bank" or "interest" [27]. Unfortunately, with image queries it is not always so clear why things go wrong. Unlike with text searches, in which the user can see the features (words) in a document, none of the current content-based image retrieval systems allows the user to see exactly what the system is looking for in response to a similarity-based query. Simply allowing the user to submit an arbitrary image (or sketch) and set some abstract knobs without knowing how they relate to the input image in particular implies a degree of complexity that searching algorithms do not have. As a result, a query for a bear can return just about any object under the sun if the query is not based on image regions, the segmentation routine fails to find the bear in the submitted image, or the submitted image contains other distinctive objects. Without realizing that the input image was not properly processed, the user can only wonder what went wrong. In order to help the user formulate effective queries and understand their results, as well as to minimize disappointment due to overly optimistic expectations of the system, the system should display its representation of the submitted image and the returned images.

3.1 Querying in blobworld

In our system, the user composes a query by submitting an image and seeing its blobworld representation, selecting the blobs to match, and finally specifying the relative importance of the blob features. The user may also submit blobs from several different images. (For example, a query might be the disjunction of the blobs corresponding to airplanes in several images, in order to provide a query that looks for airplanes of several shades.)

We define an atomic query as one which specifies a particular blob to match (e.g., "like-blob-1"). A compound query is defined as either an atomic query or a conjunction or disjunction of compound queries ("like-blob-1 and like-blob-2"). In the future, we might expand this definition to include negation ("not-like-blob-1") and to allow the user to specify two blobs with a particular spatial relationship as an atomic query ("like-blob-1 left-of like-blob-2").

Once a compound query is specified, we score each database image based on how closely it satisfies the compound query. The score μi for each atomic query ("like-blob-i") is calculated as follows:

1. Find the feature vector vi for the desired blob bi. This vector consists of the stored color, texture, position, and shape descriptors.

2. For each blob bj in the database image:
   (a) Find the feature vector vj for bj.
   (b) Find the Mahalanobis distance between vi and vj using the diagonal covariance matrix Σ (feature weights) set by the user: dij = [(vi − vj)ᵀ Σ⁻¹ (vi − vj)]^(1/2).
   (c) Measure the similarity between bi and bj using μij = e^(−dij/2). This score is 1 if the blobs are identical in all relevant features; it decreases as the match becomes less perfect.


3. Take μi = maxj μij.

The compound query score for the database image is calculated using fuzzy-logic operations [13]. For example, if the query is "like-blob-1 and (like-blob-2 or like-blob-3)," the overall score for the image is min{μ1, max{μ2, μ3}}. The user can also specify a weighting αi for each atomic query. If "like-blob-i" is part of a disjunction in the compound query, the weighted score for atomic query i is μi' = αi μi; if it is in a conjunction, its weighted score is μi' = 1 − αi(1 − μi).

We then rank the images according to overall score and return the best matches, indicating for each image which set of blobs provides the highest score; this information will help the user refine the query. After reviewing the query results, the user may change the weighting of the blob features or may specify new blobs to match.
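Steps 1-3 and the fuzzy combination rules can be sketched as follows (unweighted version; the data layout, tuple-based query trees, and names are our own):

```python
import numpy as np

def atomic_score(v_query, blobs, weights):
    """mu_i for one atomic query: the best mu_ij = e^(-d_ij/2) over the
    blobs b_j of a database image, with d_ij the Mahalanobis distance
    under the user's diagonal covariance (feature weights)."""
    best = 0.0
    for v in blobs:
        d = np.sqrt(np.sum((v_query - v) ** 2 / weights))
        best = max(best, np.exp(-d / 2.0))
    return best

def compound_score(query, mu):
    """Fuzzy-logic evaluation of a compound query: a leaf is an index
    into the atomic scores mu; ('and', ...) takes the min of its
    sub-queries, ('or', ...) takes the max."""
    if isinstance(query, int):
        return mu[query]
    op, *subqueries = query
    values = [compound_score(q, mu) for q in subqueries]
    return min(values) if op == 'and' else max(values)
```

For example, `('and', 0, ('or', 1, 2))` encodes "like-blob-1 and (like-blob-2 or like-blob-3)" and evaluates to min{μ1, max{μ2, μ3}} as in the text.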

3.2 Results

We have performed a variety of queries using a set of 2000 images from the commercial Corel stock photo collection. We used the following categories: African Specialty Animals; Air Shows; Arabian Horses; Bald Eagles; Bears; Canadian Rockies; Caribbean; Cheetahs, Leopards, Jaguars; China; Death Valley; Deserts; Elephants; Fields; France; Kenya; Night Scenes; Sheep; Sunsets; Tigers; and Wild Animals. Sample queries are shown in Figs. 3-4.


Figure 3. Blobworld query for tiger images. 28% of the top 50 images are tigers; tiger images make up 5% of the database.


Figure 4. Blobworld query for zebra images. 24% of the top 50 images are zebras, while less than 2% of the images in the database are zebras.


Figure 5. Tiger query performance (precision vs. number of images returned).


3.2.1 Comparison to color histograms

We have compared our results to those obtained using color histogram matching, following the procedure of Swain and Ballard [24]. The color histogram for each image uses 8 divisions for the intensity axis and 16 for each opponent color axis. Given a query image with histogram Q, each database image (with histogram D) receives score Σj |Qj − Dj|. As before, we rank the database images and return the best matches. Figures 5-8 show how the precision changes as more images are returned; the blobworld query results are better than the color histogram results, except for the tiger query. We believe the good color histogram results for the tiger query occur largely because of the limited nature of the test database; few non-tiger images in this collection have significant amounts of both orange and green. Adding pictures of, say, orange flowers in a field would degrade the color histogram performance without significantly affecting the blobworld performance.
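A sketch of this baseline; the opponent-axis formulas and bin ranges below are our assumptions in the spirit of [24], not the authors' exact choices:

```python
import numpy as np

def opponent_histogram(rgb):
    """Histogram an image in an opponent color space: intensity
    I = R+G+B (8 bins), RG = R-G (16 bins), BY = 2B-R-G (16 bins).
    rgb: (N, 3) array of pixels in [0, 1]; returns a normalized
    (8, 16, 16) histogram."""
    R, G, B = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    I = R + G + B          # in [0, 3]
    RG = R - G             # in [-1, 1]
    BY = 2 * B - R - G     # in [-2, 2]
    hist, _ = np.histogramdd(
        np.stack([I, RG, BY], axis=1),
        bins=(8, 16, 16),
        range=((0, 3), (-1, 1), (-2, 2)))
    return hist / len(rgb)

def histogram_score(Q, D):
    """Sum of absolute bin differences; lower means a better match."""
    return np.sum(np.abs(Q - D))
```

Note that this global score carries no spatial information at all, which is exactly the weakness the tiger-vs-orange-flowers example exposes.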

4 Conclusions

We have proposed a new method which uses Expectation-Maximization on color and texture jointly to provide an image segmentation, as well as a new image representation (blobworld) which uses this segmentation and its associated descriptors to represent image regions explicitly. We have demonstrated a query mechanism that uses blobworld to retrieve images and help guide user queries.

Acknowledgments

We would like to thank David Forsyth, Joe Hellerstein, Ginger Ogle, and Robert Wilensky for useful discussions related to this work. This work was supported by an NSF Digital Library Grant (IRI 94-11334) and NSF graduate fellowships for Serge Belongie and Chad Carson.

References

[1] S. Ayer and H. Sawhney. Layered representation of motion video using robust maximum-likelihood estimation of mixture models and MDL encoding. In Proc. Int. Conf. Comp. Vis., pages 777-784, 1995.
[2] J. Bigün. Local symmetry features in image processing. PhD thesis, Linköping University, 1988.
[3] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statistical Soc., Ser. B, 39(1):1-38, 1977.
[4] P. Enser. Query analysis in a visual information retrieval context. J. Doc. and Text Management, 1(1):25-52, 1993.
[5] W. Förstner. A framework for low level feature extraction. In Proc. Eur. Conf. Comp. Vis., pages 383-394, 1994.
[6] D. Forsyth, J. Malik, and R. Wilensky. Searching for digital pictures. Scientific American, 276(6):72-77, June 1997.
[7] W. T. Freeman and E. H. Adelson. The design and use of steerable filters. IEEE Trans. Pattern Analysis and Machine Intelligence, 13(9):891-906, 1991.
[8] J. Gårding and T. Lindeberg. Direct computation of shape cues using scale-adapted spatial derivative operators. Int. J. Comp. Vis., 17(2):163-191, Feb 1996.
[9] G. H. Granlund and H. Knutsson. Signal Processing for Computer Vision. Kluwer Academic Publishers, 1995.
[10] A. Gupta and R. Jain. Visual information retrieval. Comm. Assoc. Comp. Mach., 40(5):70-79, May 1997.
[11] T. Hofmann, J. Puzicha, and J. M. Buhmann. Deterministic annealing for unsupervised texture segmentation. In Proc. Int. Workshop on Energy Min. Methods in Comp. Vis. and Patt. Rec., pages 213-228, 1997.
[12] A. K. Jain and F. Farrokhnia. Unsupervised texture segmentation using Gabor filters. Pattern Recognition, 24(12):1167-1186, 1991.
[13] J.-S. Jang, C.-T. Sun, and E. Mizutani. Neuro-Fuzzy and Soft Computing. Prentice Hall, 1997.
[14] P. Kelly, M. Cannon, and D. Hush. Query by image example: the CANDID approach. In SPIE Proc. Storage and Retrieval for Image and Video Databases, pages 238-248, 1995.
[15] T. Leung and J. Malik. Detecting, localizing and grouping repeated scene elements from an image. In Proc. Eur. Conf. Comp. Vis., pages 546-555, 1996.
[16] P. Lipson, E. Grimson, and P. Sinha. Configuration based scene classification and image indexing. In Proc. IEEE Comp. Soc. Conf. Comp. Vis. and Pattern Recogn., pages 1007-1013, 1997.
[17] J. Malik and P. Perona. Preattentive texture discrimination with early vision mechanisms. J. Opt. Soc. Am. A, 7(5):923-932, 1990.
[18] W. Niblack et al. The QBIC project: querying images by content using colour, texture and shape. In SPIE Proc. Storage and Retrieval for Image and Video Databases, pages 173-187, 1993.
[19] V. Ogle and M. Stonebraker. Chabot: Retrieval from a relational database of images. IEEE Computer, 28(9):40-48, Sep 1995.
[20] A. Pentland, R. Picard, and S. Sclaroff. Photobook: Content-based manipulation of image databases. Int. J. Comp. Vis., 18(3):233-254, 1996.
[21] J. Ponce, A. Zisserman, and M. Hebert. Object Representation in Computer Vision II. Number 1144 in LNCS. Springer, 1996.
[22] J. Rissanen. Modeling by shortest data description. Automatica, 14:465-471, 1978.
[23] U. Shaft and R. Ramakrishnan. Data modeling and querying in the PIQ image DBMS. IEEE Data Engineering Bulletin, 19(4):28-36, Dec 1996.
[24] M. Swain and D. Ballard. Color indexing. Int. J. Comp. Vis., 7(1):11-32, 1991.
[25] Y. Weiss and E. Adelson. A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models. In Proc. IEEE Comp. Soc. Conf. Comp. Vis. and Pattern Recogn., pages 321-326, 1996.
[26] W. Wells, R. Kikinis, W. Grimson, and F. Jolesz. Adaptive segmentation of MRI data. In Int. Conf. on Comp. Vis., Virtual Reality and Robotics in Medicine, pages 59-69, 1995.
[27] D. Yarowsky. Word-sense disambiguation using statistical models of Roget's categories trained on large corpora. In Proc. Int. Conf. Comp. Linguistics, pages 454-460, 1992.
