
CATEGORY PRUNING IN IMAGE DATABASES USING SEGMENTATION AND DISTANCE MAPS

Baris Sumengen, B. S. Manjunath

ECE Department, UC, Santa Barbara 93106, Santa Barbara, CA, USA

email: {sumengen,manj}@ece.ucsb.edu web: vision.ece.ucsb.edu

ABSTRACT

A novel framework for pruning a category of images is proposed in this paper. We assume no prior information about the contents or semantics of the images. Our framework discovers consistencies and knowledge about the spatial relations of the categories in an unsupervised manner using iterative image segmentation and spatial grouping. A measure for deciding how well an image fits a category is proposed and the effectiveness of this measure is investigated.

1. INTRODUCTION

With the increasing use of digital cameras, growth of the Internet and high importance of periodic acquisition of aerial and satellite images, image databases are becoming more and more important every year. The world wide web (www) itself probably contains a majority of the images in existence and these images can be assigned text captions and keywords, which makes creating image databases from web images an interesting and challenging task. Most image databases are or can be divided into semantic subgroups, which we call categories. One example of a category is a directory of photographs on a home user’s computer. Another example of a category is the group of images in an internet image database that have the keyword “car” associated with them. Unfortunately, most image databases are not generated with semantic categorization in mind.

In our previous work [1], we show an efficient way to spider and simultaneously categorize web images for content-based retrieval. On this database of over 600,000 images, we achieve impressive retrieval results by exploiting the category structure. For example, the “Cars” category consists of images spidered from car-related web sites. About 60-80% of images in this category are images of cars. In addition, there are also other pictures that are not of cars on such pages, e.g., a picture of the owner of a car dealership. This is a common problem with automatically generated categories. To increase the overall effectiveness of retrievals, further pruning is needed. Pruning will improve both browsing and content-based retrieval quality of these databases. The main objective of the pruning is increasing the precision of the category while preserving the recall. By using our graph partitioning active contour segmentation method to separate background from foreground, we demonstrate that good image segmentation helps in this pruning step and improves the overall retrieval performance.

Image segmentation has been used in image databases as a tool to limit extraction of image features to the segments of interest. These segment-based feature vectors are then used for image or region search in a content-based image retrieval system [2, 3] and for browsing image databases [4]. In this paper we propose a new way of utilizing segmentation for image databases. The objective of our work is to improve the quality of both image segmentation and image database retrieval simultaneously.

In developing a pruning strategy using segmentation, we make the following assumptions: 1) We do not have any domain-specific knowledge or object models about the category. 2) The category is automatically generated, but has some consistency such that a large number (more than 50%) of the images in this category fit a certain unknown semantic concept. 3) Images can vary in terms of image features or shape of the objects even if they follow the category concept. 4) Images can vary in their sizes. In our case, the images are between 128×128 and 512×512.

We propose a solution to this problem by discovering a loose spatial relationship between the background and foreground of the images. The labels background and foreground have no importance in our method but are used for consistent association of regions among category images. We first segment all images into foreground and background regions using graph partitioning active contours (GPAC) [5, Chapter 5]. Using these segmentations, a signed distance map is calculated for each image based on the segmentation boundary. Then, these distance maps are averaged to find a distance map for the whole category. A continuous foreground/background spatial relationship within the category is captured by this average distance map. After that, the images are re-segmented, but this time the segmentation cost function also incorporates the spatial relationships discovered from the previous iteration. We show that this step improves both the segmentation of certain images and the distance map itself.

The pruning of the category is achieved by comparing the individual distance maps to the average distance map. We show the distribution of this comparison and interpret the results by showing specific examples. In addition, precision-recall curves are drawn to visualize how well our method works.
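The overall loop described above can be outlined in a short Python sketch. It is only an outline under stated assumptions: `segment`, `to_distance_map` and `map_distance` are hypothetical caller-supplied functions standing in for GPAC segmentation, the signed distance map computation and the map comparison, none of which are implemented here.

```python
import numpy as np

def prune_category(images, segment, to_distance_map, map_distance,
                   cutoff, iterations=2):
    """Iterate segmentation and averaging, then drop images far from the average map.

    segment(image, avg_map)       -> segmentation (avg_map is None on the first pass)
    to_distance_map(segmentation) -> 2-D array, same shape for every image
    map_distance(avg_map, m)      -> float
    All three callables are hypothetical placeholders supplied by the caller.
    """
    avg_map = None
    maps = []
    for _ in range(iterations):
        # Re-segment every image; from the second pass on, the previous
        # category-wide average map is available to the segmentation step.
        maps = [to_distance_map(segment(img, avg_map)) for img in images]
        avg_map = np.mean(maps, axis=0)
    distances = [map_distance(avg_map, m) for m in maps]
    # Keep the indices of images whose map is close enough to the average.
    kept = [i for i, d in enumerate(distances) if d <= cutoff]
    return kept, distances, avg_map
```

The default of two iterations mirrors the two segmentation passes used in this paper; the cutoff is left as a free parameter.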

2. SEGMENTATION FRAMEWORK

We use segmentation as two-region (background and foreground) partitioning of an image. We will show in Section 4 situations where multi-region (3 or 4 regions) segmentation might be necessary. Our segmentation approach, GPAC, is based on grouping similar points as foreground and background while increasing the dissimilarity between these regions. One important characteristic of GPAC is its flexibility in defining the segmentation cost function. By controlling the similarity measure, we can change and adapt the segmentation behavior to the problem at hand. We calculate the dissimilarities as:

Figure 1: (a) Initialization of the curve evolution for image segmentation. (b) Average distance map.

$$w(p_i, p_j) = \|\vec{F}(p_i) - \vec{F}(p_j)\| + \alpha \|p_i - p_j\| \tag{1}$$

where $\vec{F}(p_i)$ is some low-level image feature at point $p_i$ (we use color) and the second term measures the spatial distance between two points. Based on this dissimilarity metric, the segmentation functional that we maximize is given by:

$$E = \iint_{R_i(C)} \iint_{R_o(C)} w(p_1, p_2)\, dp_1\, dp_2 \tag{2}$$

where $C$ is a closed curve on the image domain, $R_i$ is the interior and $R_o$ is the exterior of $C$. Conceptually this means finding the curve (partitioning of the image) for which the dissimilarity between its interior (foreground) and exterior (background) is maximized. This problem is solved by starting with an initial curve and maximizing (2) using steepest descent. After adding geometric constraints such as area normalization and boundary length minimization, the corresponding curve evolution equation becomes:

$$\frac{\partial C}{\partial t} = \left( \frac{1}{|R_o|} \iint_{R_o(C)} w(c, p)\, dp - \frac{1}{|R_i|} \iint_{R_i(C)} w(c, p)\, dp \right) \vec{N} - \gamma \kappa \vec{N} \tag{3}$$

This result can be visualized as the competition of foreground and background to push the curve towards the optimum boundary. Fig. 1a shows the curve initialization used in this paper. We initialize one multi-part curve with 16 (4×4) sub-parts, so that the curve can cover most of the image area.
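As an illustration of Eqs. (1)-(3), the following Python sketch evaluates a discrete, brute-force version of the dissimilarity, the cross-region energy and the region-competition term on a small image. It is not the GPAC implementation: it assumes a pixel mask in place of the curve $C$, replaces the integrals by sums over pixels, omits the curvature term, and all function names are illustrative.

```python
import numpy as np

def dissimilarity_matrix(image, alpha):
    """Eq. (1) for every pixel pair: feature distance plus alpha times spatial distance."""
    h, w = image.shape[:2]
    rows, cols = np.mgrid[0:h, 0:w]
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)
    feats = image.reshape(h * w, -1).astype(float)   # gray level or color per pixel
    feat_dist = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=2)
    pos_dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    return feat_dist + alpha * pos_dist

def cross_region_energy(w_all, inside):
    """Discrete Eq. (2): sum of w over all (interior, exterior) pixel pairs."""
    inside = np.asarray(inside, dtype=bool).ravel()
    return float(w_all[np.ix_(inside, ~inside)].sum())

def competition_speed(w_all, inside):
    """Data term of Eq. (3) at every pixel c: mean w(c, p) over the exterior
    minus mean w(c, p) over the interior (both regions must be non-empty)."""
    inside = np.asarray(inside, dtype=bool).ravel()
    return w_all[:, ~inside].mean(axis=1) - w_all[:, inside].mean(axis=1)
```

On a 4×4 image whose left half is 0 and right half is 1, with $\alpha = 0$, a mask covering the left half gives an energy of 64 while a mask covering the top half gives 32, so the partition that follows the intensity edge scores higher, as the functional intends.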

3. PRUNING A CATEGORY

The pruning strategy is based on first segmenting the images in a category. The signed distance of the pixels from the segmentation boundary is used as a measure for spatial relations within the images. By averaging image-level spatial relations, we discover category-wide spatial relations and propose a measure for calculating the association of images to their corresponding categories.

The images in the categories are of various sizes between 128×128 and 512×512. Before segmenting these images, we resize them proportionally such that the smaller dimension of the image is mapped to 64. This small size is selected for computational reasons so that a large number of images can be segmented quickly. After segmenting and extracting the boundaries, we generate a signed distance map. This map is calculated as the distance of each pixel to the segmentation boundary. The value is positive if the pixel belongs to the foreground and negative if the pixel belongs to the background. Before calculating this map, we need to decide which region is the background and which region is the foreground. This is important for consistency among the images. So, we calculate the average distance from the image boundaries (not the segmentation boundary) for each region. We define the foreground as the region with the higher average distance from the image boundaries. After calculating the signed distance map, we linearly scale the values to the range 0 to 255. Then, using standard image resizing algorithms, the distance maps are resized to the size N×N. We select N = 100 in our experiments. The distance maps of the images are averaged pixel-wise to create the average distance map of the category. We use this average map as the representation of the category.
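The steps in this paragraph can be sketched in Python as follows. This is a minimal illustration, not the code used for the experiments: it assumes boolean two-region masks as input, approximates the distance to the segmentation boundary by the distance to the nearest pixel of the other region, and uses nearest-neighbour resizing as a stand-in for any standard resizing algorithm. All function names are illustrative.

```python
import numpy as np

def working_size(h, w, target=64):
    """Size after proportional resizing so the smaller dimension becomes `target`."""
    scale = target / min(h, w)
    return round(h * scale), round(w * scale)

def border_distance(shape):
    """Distance in pixels from every pixel to the nearest image border."""
    h, w = shape
    r = np.arange(h)[:, None]
    c = np.arange(w)[None, :]
    return np.minimum(np.minimum(r, h - 1 - r), np.minimum(c, w - 1 - c))

def pick_foreground(mask):
    """Orient a boolean mask so True marks the region farther from the borders on average."""
    d = border_distance(mask.shape)
    return mask if d[mask].mean() >= d[~mask].mean() else ~mask

def signed_distance_map(fg):
    """Brute-force signed distance: positive in the foreground, negative in the background.

    Assumption: the boundary distance is approximated by the distance to the
    nearest pixel of the other region. Memory grows with (h*w)^2, so this is
    only meant for small masks.
    """
    pts = np.argwhere(np.ones(fg.shape, dtype=bool)).astype(float)
    is_fg = fg.ravel()
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    to_bg = dist[:, ~is_fg].min(axis=1)
    to_fg = dist[:, is_fg].min(axis=1)
    return np.where(is_fg, to_bg, -to_fg).reshape(fg.shape)

def scale_to_255(m):
    """Linearly map the values of m onto [0, 255]."""
    lo, hi = m.min(), m.max()
    return (m - lo) * 255.0 / (hi - lo)

def resize_nearest(m, n):
    """Nearest-neighbour resize to n x n."""
    rows = np.arange(n) * m.shape[0] // n
    cols = np.arange(n) * m.shape[1] // n
    return m[np.ix_(rows, cols)]

def average_distance_map(masks, n=100):
    """Per-image maps (scaled, resized to n x n) and their pixel-wise average."""
    maps = [resize_nearest(scale_to_255(signed_distance_map(pick_foreground(mk))), n)
            for mk in masks]
    return np.mean(maps, axis=0), maps
```

For images at the working resolution used here, the brute-force step would normally be replaced by a Euclidean distance transform such as `scipy.ndimage.distance_transform_edt`; the pairwise version is kept only because it makes the definition explicit.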

After calculating the average map, we re-segment all images by using a new dissimilarity measure that incorporates spatial relations discovered in the previous step:

$$w(p_i, p_j) = \|\vec{F}(p_i) - \vec{F}(p_j)\| + \alpha \|p_i - p_j\| + \beta \|m(p_i) - m(p_j)\| \tag{4}$$

where $m$ is the average distance map. The constant $\beta$ is selected such that segmentation is not completely biased to this new term. After a second iteration of the segmentations, a new average distance map is generated. We now select this new distance map as the representation of the category’s spatial relations.
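A small Python sketch of the extended dissimilarity in Eq. (4) for a single pair of points is given below. The function name and argument layout are illustrative assumptions; since the map value at a point is a scalar, the last norm reduces to an absolute difference.

```python
import numpy as np

def dissimilarity_with_map(f_i, f_j, p_i, p_j, m_i, m_j, alpha, beta):
    """Eq. (4) for one pair of points.

    f_i, f_j : feature vectors (e.g. color) at the two points
    p_i, p_j : pixel coordinates of the two points
    m_i, m_j : average-distance-map values at the two points
    """
    feature_term = np.linalg.norm(np.subtract(f_i, f_j))
    spatial_term = np.linalg.norm(np.subtract(p_i, p_j))
    map_term = abs(m_i - m_j)   # scalar map values, so the norm is |m_i - m_j|
    return float(feature_term + alpha * spatial_term + beta * map_term)
```

Setting `beta=0` recovers Eq. (1), which makes it easy to check how strongly the category-level term influences a given pair.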

To make decisions for eliminating images from the category, we need to check if an image follows the concept of the category or not. To create a measure of how well an image fits a category, we calculate the distance between the individual distance maps and the average distance map. Distances are calculated using:

$$\|m_{avg} - m_i\| = \sqrt{\sum_{p} \bigl(m_{avg}(p) - m_i(p)\bigr)^2} \,/\, N^2$$
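The distance between two maps can be sketched in Python as below. The placement of the normalization is an assumption: this sketch takes the square root of the summed squared differences and then divides by $N^2$. With maps scaled to 0-255 and N = 100, that reading gives 0.4 for a uniform difference of 40 gray levels, which is of the same order as the distance values reported in Section 4.

```python
import numpy as np

def map_distance(m_avg, m_i):
    """Distance between the average map and one image map (both N x N arrays).

    Assumed reading of the formula: sqrt(sum of squared differences) / N^2.
    """
    n = m_avg.shape[0]
    return float(np.sqrt(np.sum((m_avg - m_i) ** 2)) / n ** 2)
```

If the normalization is instead meant to sit under the square root, the same code applies with `n ** 2` moved inside `np.sqrt`, which yields a root-mean-square difference.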

After calculating the distances, a cutoff point $d_{cut}$ needs to be estimated such that images with higher distance values are discarded from this category. One way is to analyze the shape of the histogram and make decisions. Simple decisions such as eliminating the tail of the histogram improve the quality of the category, but this type of ad hoc decision might be suboptimal. A better approach would be using a learning framework [6] to discover a methodology for deciding on $d_{cut}$. This may require generating ground truth for a large number of categories.

4. EXPERIMENTAL RESULTS

The category we experiment with is Sports: Winter Sports: Skiing: Guides: North America: United States. The main theme for the images in this category (79%, 110 out of 139) is that a skier is either posing for a photograph or skiing or snowboarding downhill when the picture is taken. These images vary significantly in terms of foreground and background color, environment and pose. The rest of the images consist of scenic views of snowy mountains, maps, pictures of jackets and backpacks, and a set of unrelated images. Fig. 2 shows examples of characteristic images and Fig. 3 shows examples of uncharacteristic images for the category.

First, images are resized such that the smaller dimension of the image is mapped to 64. After that, all images are segmented using GPAC. Segmentation results for some images are shown in the second column of Fig. 2 and Fig. 3. After extracting the boundaries, the signed distance maps are generated. Distance maps for some images are shown in the third column of Fig. 2 and Fig. 3. The average distance map for the category is shown in Fig. 1b. As can be seen, the average distance map follows this category’s spatial relations such that the foreground objects are in the middle part of the images in an upright position.

Figure 2: The first column shows the original image. The foreground is shown in the second column. The third column shows the signed distance map.

Figure 3: The first column shows the original image. The foreground is shown in the second column. The third column shows the signed distance map.

Figure 4: The first column shows the original image. The initial foreground is shown in the second column. The foreground after using spatial relations is shown in the third column.

Using the average map, images in this category are re-segmented as discussed in Section 3. Several examples are shown in Fig. 4. New foregrounds in the third column show that the segmentation process has incorporated the spatial relations discovered from the category. The new average distance map is not much different from the previous one. The reason for this is that a high percentage of the images follow the category characteristics (average distance map) and segmentations for these images did not change. This and our experiments also show that more iterations of segmentation are not necessary.

After the second iteration of the segmentations, distances of individual distance maps to the average distance map are calculated. In Fig. 5, the first row shows the 4 images with the smallest distances and the second row shows the images with the largest distance values. Fig. 6(a) shows the histogram of distance values for this category. It can be seen from the figure that there are three peaks that are separated at 0.55 and 0.7. After the third peak there are outliers starting from 0.78 and up. We observe that the images within the first peak follow the category theme very well. The second peak contains a combination of good and bad images for the category. Fig. 7 shows examples of images that fall into the second and third peaks. Images in Fig. 7(a-d) show that the poses of the skiers are not vertical, unlike most other images. Adding some level of rotation invariance to our method would be helpful for these images. Fig. 7(e-f) shows two skiers using the lift. The problem with this image is that the skiers are not located in the middle part of the image, while most category images have skiers in the middle parts. To handle this kind of image, our method needs to incorporate shift invariance. The last example is in Fig. 7(g-h). In this example, the background consists of two different regions, the snow and the darker background corresponding to the trees. Two-region segmentation is not able to handle this situation and part of the background is labeled as foreground. A 3-region or 4-region segmentation (which can be achieved by recursive bi-partitioning) would be helpful when handling more complex images.

Figure 5: (a-d) Images with the smallest distances. (e-h) Images with the largest distances.

Figure 6: (a) Histogram of the distances. (b) Precision-recall curve. (c) Precision vs. cutoff point. (d) Recall vs. cutoff point.

Figure 7: Images that could be misclassified.

Now suppose we decide on a cutoff point $d_{cut}$ for the distances, such that images with higher distance values are discarded from this category. Fig. 6(b) shows the precision-recall¹ graph for cutoff points starting with 0.25 and with increments of 0.01 up to 1.3. At 1.3, nothing is discarded, so we have 100% recall and 79% precision. We see from the graph that cutoff point 0.71 (red circle in the graph) is optimal with 96% recall and 87% precision. At this point, 5 out of 110 good images (4.5%) and 13 out of 29 bad images (45%) are discarded. We see that the precision of the category improves by 8 percentage points, from 79% to 87%.

¹ After discarding images with distances bigger than a cutoff point, precision measures the percentage of good images within the remaining images in the category, and recall measures the percentage of the good images that are still left in the category.
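The precision and recall figures quoted for the cutoff 0.71 follow directly from the discard counts: 105 of the 110 good images and 16 of the 29 bad images remain, so precision is 105/121 and recall is 105/110. The Python sketch below reproduces this arithmetic. The individual distance values in it are synthetic placeholders, chosen only so that the counts at the cutoff match those reported; they are not the measured distances.

```python
def precision_recall(distances, is_good, cutoff):
    """Precision and recall (as fractions) after discarding images with distance > cutoff."""
    kept = [good for d, good in zip(distances, is_good) if d <= cutoff]
    good_kept = sum(kept)
    precision = good_kept / len(kept) if kept else float("nan")
    recall = good_kept / sum(is_good)
    return precision, recall

# Synthetic placeholder distances, chosen only so that the discard counts at
# cutoff 0.71 match those reported in the text (5 of 110 good, 13 of 29 bad).
distances = [0.50] * 105 + [0.90] * 5 + [0.60] * 16 + [1.00] * 13
is_good = [True] * 110 + [False] * 29

p_cut, r_cut = precision_recall(distances, is_good, 0.71)   # 105/121, 105/110
p_all, r_all = precision_recall(distances, is_good, 1.30)   # 110/139, 110/110
```

Here `p_cut` is about 0.868 and `r_cut` about 0.955, while `p_all` is about 0.791 with `r_all` equal to 1, in agreement with the 87%, 96%, 79% and 100% stated above. Sweeping `cutoff` from 0.25 to 1.3 in steps of 0.01 over real distances would trace curves of the kind shown in Fig. 6(b)-(d).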

5. DISCUSSION

We have proposed a simple method for category pruning and discovering spatial relations in image databases. The signed distance maps created from each image can be thought of as a new feature for the image. Well-known image features such as color and texture can also be used together with the distance map when capturing a concept for a category. It is possible that in some categories there is no coherence within images in terms of color whereas the spatial relations are consistent. For other categories the opposite may be true. Because of the variety of images in different categories, more complex analysis of images might be necessary. Moreover, rotation-invariant, scale-invariant and shift-invariant spatial relations or a weighted combination of these might improve pruning results for certain categories.

Acknowledgements: This work was supported by the National Science Foundation award NSF ITR-0331697.

REFERENCES

[1] S. Newsam, B. Sumengen, and B. S. Manjunath. Category-based image retrieval. In ICIP, pages 596–9, 2001.

[2] W. Y. Ma and B. S. Manjunath. Netra: a toolbox for navigating large image databases. In ICIP, pages 568–71, 1997.

[3] C. Carson, S. Belongie, H. Greenspan, and J. Malik. Blobworld: image segmentation using expectation-maximization and its application to image querying. IEEE PAMI, pages 1026–38, Aug 2002.

[4] B. S. Manjunath and W. Y. Ma. Browsing large satellite and aerial photographs. In ICIP, pages 765–8, 1996.

[5] Baris Sumengen. Variational image segmentation and curve evolution on natural images. PhD thesis, UC, Santa Barbara, Sep 2004.

[6] Vladimir Cherkassky and Filip Mulier. Learning from Data: Concepts, Theory, and Methods. Wiley-Interscience, March 1998.
