

Neurocomputing 21 (1998) 7-18

TEXSOM: Texture segmentation using self-organizing maps

Javier Ruiz-del-Solar*
Department of Electrical Engineering, Universidad de Chile, Casilla 412-3, Santiago, Chile

Accepted 26 May 1998

Abstract

This article describes the so-called TEXSOM-architecture, a texture segmentation architecture based on the joint spatial/spatial-frequency paradigm. In this architecture the oriented filters are automatically generated using the adaptive-subspace self-organizing map (ASSOM) or the supervised ASSOM (SASSOM) neural models. The automatic filter generation overcomes some drawbacks of similar architectures, such as the large size of the filter bank and the necessity of a priori knowledge to determine the filters' parameters. The quality of the segmentation process is improved by applying median filtering and the watershed transformation over the pre-segmented images. The proposed architecture is also suitable to perform defect identification on textured images. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Adaptive-subspace self-organizing map (ASSOM); Supervised ASSOM (SASSOM); Joint spatial/spatial-frequency analysis methods; Gabor filters; Texture segmentation; Watershed transformation

1. Introduction

Texture perception plays an important role in human vision. It is used to detect and distinguish objects, to infer surface orientation and perspective, and to determine shape in 3D scenes. Even though texture is an intuitive concept, there is no universally accepted definition for it. Despite this fact we can say that textures are homogeneous visual patterns that we perceive in natural or synthetic images. They are made of local micropatterns, repeated somehow, producing the sensation of uniformity. Therefore,

*E-mail: j.ruizdelsolar@computer.org

0925-2312/98/$ - see front matter © 1998 Elsevier Science B.V. All rights reserved. PII: S0925-2312(98)00041-1


texture segmentation demands combining global analysis, for the discrimination of different textures by looking for uniformity within each one, and local analysis, for the processing of the basic micropatterns and the exact location of the textures' boundaries. The automatic or computerised segmentation of textures is a long-standing field of research in which many different paradigms have been proposed. Among all of them the joint spatial/spatial-frequency paradigm is of great interest, because it is biologically based and because, by using it, it is possible to achieve high resolution in both the spatial (adequate for local analysis) and the spatial-frequency (adequate for global analysis) domains. Moreover, computational methods based on this paradigm are able to decompose textures into different orientations and frequencies (scales), which allows one to characterise them.

The segmentation of textures using joint spatial/spatial-frequency analysis methods has been used by many authors [2,4,5,7,12,13,15,19,21,22]. These methods are based on the use of a bank of oriented filters to extract a set of invariant features from the input textures. Then, these invariant features are used to classify the textures. In this context a typical architecture consists of the following basic stages (see Fig. 1): pre-processing (discounting of variable illumination conditions), oriented filtering, non-linear processing (energy computation, half-wave rectification, module computation, etc.), spatial local averaging, feature vectors generation, classification, and post-processing (some kind of non-linear operation). A review of some joint spatial/spatial-frequency-based texture segmentation architectures can be found in [14,16].
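The pipeline just described, up to the feature-vector generation stage, can be sketched in a few lines. This is a hedged illustration only: the Gabor parametrisation, window sizes, and function names below are assumptions, not the paper's actual configuration.

```python
import numpy as np
from scipy.signal import fftconvolve
from scipy.ndimage import uniform_filter

def gabor_kernel(size, freq, theta, sigma):
    """Complex 2-D Gabor kernel: a plane wave of spatial frequency
    `freq` (cycles/pixel) at orientation `theta`, windowed by an
    isotropic Gaussian of width `sigma`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    u = x * np.cos(theta) + y * np.sin(theta)  # coordinate along the wave
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.exp(2j * np.pi * freq * u)

def texture_features(image, kernels, avg_size=7):
    """Per-pixel feature vectors following the pipeline of Fig. 1:
    oriented filtering -> non-linearity (magnitude) -> local averaging."""
    feats = []
    for k in kernels:
        response = np.abs(fftconvolve(image, k, mode='same'))  # non-linear stage
        feats.append(uniform_filter(response, size=avg_size))  # local averaging
    return np.stack(feats, axis=-1)  # shape: (rows, cols, n_filters)
```

The resulting per-pixel vectors would then be fed to a classifier, one vector per image location.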

The main drawbacks of all these architectures are the necessity of a large number of filters (normally more than 16) in the oriented filtering stage, which slows down the segmentation process, and the a priori knowledge required to determine the filters' parameters (frequencies, orientations, bandwidths, etc.).

As mentioned before, the joint spatial/spatial-frequency analysis methods are biologically based, and their oriented filters model the kind of visual processing carried out by the simple and complex cells of the primary visual cortex of higher mammals.

Fig. 1. Block diagram of a typical joint spatial/spatial-frequency-based texture segmentation architecture. In this diagram the following basic stages are shown: pre-processing (PreP), oriented filtering (filter bank [G1, ..., Gn]), non-linear processing (NoL), spatial local averaging (A), feature vectors generation, classification (K), and post-processing (PostP).


The shape of the receptive fields of these cells and their organization are the results of visual unsupervised learning during the development of the visual system in the first few months of life [23]. This example-based learning is performed through the act of seeing different real-world scenes, repetitively, which produces activity-dependent synaptic modification. The shape and organization of the receptive fields emerge gradually by means of the refinement of an initially diffuse set of connections [23].

In this context, it seems natural to follow this example-based learning strategy to generate the oriented filters automatically and to overcome the mentioned drawbacks of the joint spatial/spatial-frequency-based architectures. Different approaches have been used to generate these filters automatically using neural models [10,18,20]. Among them, the adaptive-subspace SOM (ASSOM), recently proposed by Kohonen [8-11], stands out because of its simplicity and biological plausibility.

The main purpose of this article is to present the so-called TEXSOM-architecture, a new texture segmentation architecture which is based on the joint spatial/spatial-frequency paradigm. In this architecture the oriented filters are automatically generated using the ASSOM or the SASSOM (supervised ASSOM). Moreover, the proposed architecture is suitable to perform defect identification on textured images. The article is organised as follows. The ASSOM and the SASSOM models are explained in Section 2. The proposed TEXSOM-architecture is presented in Section 3. Segmentation results obtained with this architecture are shown in Section 4. Finally, in Section 5, some conclusions are given.

2. The ASSOM and SASSOM models

The adaptive-subspace self-organizing map (ASSOM) corresponds to a further development of the SOM architecture [10], which allows one to generate invariant-feature detectors. In this network, each neuron is not described by a single parametric reference vector, but by basis vectors that span a linear subspace. The comparison of the orthogonal projections of every input vector onto the different subspaces is used as the matching criterion by the network. If one wants these subspaces to correspond to invariant-feature detectors, one must define an episode (a group of vectors) in the training data, and then locate a representative winner for this episode. The training data are made of randomly displaced input patterns. The generation of the input vectors belonging to an episode differs depending on whether translation-, rotation-, or scale-invariant feature detectors are to be obtained [8,9,11]. In either case, the learning rule of the ASSOM architecture is given by [9,11]:

1. Locating the representative winner c, i.e. the neuron in whose subspace the projected "energy" is maximum:

$$c(t_p) = \arg\max_i \Big( \sum_{t_p \in S} \|\hat{x}^{(i)}(t_p)\|^2 \Big) \qquad (1)$$

with $\|\hat{x}^{(i)}(t_p)\|$ being the norm of the orthogonal projection of the input vector $x(t_p)$ onto the subspace spanned by the basis vectors $b_h^{(i)}(t_p)$ of neuron $i$, at instant $t_p$ of the episode $S$. This projection is given by

$$\|\hat{x}^{(i)}(t_p)\| = \sqrt{\sum_h \langle b_h^{(i)}(t_p), x(t_p) \rangle^2}. \qquad (2)$$

2. Updating the basis vectors of the representative winner and its neighbours as follows:

$$b_h^{(i)\prime} = \prod_{t_p \in S} \left( I + \alpha(t_p)\, \frac{x(t_p)\, x^{\mathrm{T}}(t_p)}{\|\hat{x}^{(i)}(t_p)\|\, \|x(t_p)\|} \right) b_h^{(i)} \qquad (3)$$

with $\alpha(t_p)$ being the time-variable learning rate.

3. Orthonormalising the basis vectors. First, the vectors are orthogonalised by using the Gram-Schmidt process, as follows:

$$b_1^{(i)\prime\prime} = b_1^{(i)\prime}, \qquad b_h^{(i)\prime\prime} = b_h^{(i)\prime} - \sum_{j=1}^{h-1} \frac{\langle b_h^{(i)\prime}, b_j^{(i)\prime\prime} \rangle}{\|b_j^{(i)\prime\prime}\|^2}\, b_j^{(i)\prime\prime}, \quad h = 2, \ldots, n. \qquad (4)$$

Secondly, the vectors are normalised.

Usually, in image processing tasks the number of basis vectors in each subspace is two (b_1 and b_2). More details about the ASSOM model, such as episode generation, data pre-processing, number of iterations, etc., can be found in [8-11]. The supervised ASSOM (SASSOM), proposed in [17], works in a similar way to the LVQ-SOM [10]. The SASSOM architecture is defined by Eqs. (1)-(4), but in Eq. (3) the time-variable learning rate factor is selected positive if the input vector and the neuron to be updated belong to the same class, and negative otherwise.
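The ASSOM episode learning step of Eqs. (1)-(4) can be sketched as follows. This is a simplified illustration, not the paper's implementation: it updates only the winner (the full model also rotates the subspaces of topological neighbours), applies the rotation of Eq. (3) sequentially over the episode, and all function names and parameter values are assumptions.

```python
import numpy as np

def projection_norm_sq(B, x):
    """Squared norm of the orthogonal projection of x onto the subspace
    spanned by the orthonormal rows of B, i.e. sum_h <b_h, x>^2 (Eq. (2))."""
    return float(np.sum((B @ x) ** 2))

def gram_schmidt(B):
    """Orthonormalise the rows of B (step 3: Gram-Schmidt, then normalisation)."""
    rows = []
    for b in B:
        for q in rows:
            b = b - (b @ q) * q       # remove components along earlier vectors
        rows.append(b / np.linalg.norm(b))
    return np.array(rows)

def assom_episode_step(bases, episode, alpha):
    """One ASSOM learning step for a single episode S (Eqs. (1)-(4))."""
    # Eq. (1): the representative winner maximises the summed projection energy.
    c = int(np.argmax([sum(projection_norm_sq(B, x) for x in episode)
                       for B in bases]))
    B = bases[c]
    # Eq. (3): rotate each basis vector towards the episode's inputs.
    for x in episode:
        gain = alpha / (np.sqrt(projection_norm_sq(B, x)) * np.linalg.norm(x))
        B = B @ (np.eye(len(x)) + gain * np.outer(x, x))
    bases[c] = gram_schmidt(B)  # Eq. (4) + normalisation
    return c
```

After each episode the winner's basis stays orthonormal, so the projection energies of the next episode remain valid subspace comparisons.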

3. TEXSOM-architecture

As it was pointed out in the introduction, the proposed TEXSOM-architecture follows the joint spatial/spatial-frequency paradigm, but it automatically generates the feature-invariant detectors (oriented filters) using the ASSOM or the SASSOM neural models. The acronym TEXSOM comes from texture segmentation using self-organizing maps.

3.1. Training phase

The training is carried out using samples taken from all possible textures under consideration. If the architecture is used to perform defect identification on textured images, the samples must be taken from both defect and defect-free areas. In any case, the training is performed in two stages. In the first stage, or filter generation stage, the feature-invariant detectors (filters) are generated. In the second stage, or classifier training stage, these detectors generate invariant feature vectors, which are then used to train a classification neural network.

3.1.1. Filter generation
In this stage the feature detectors (filters) are automatically generated using the ASSOM or the SASSOM networks. Before the training samples are sent to the ASSOM (or SASSOM) network, they are pre-processed to obtain some degree of luminance invariance (the local mean value of the vectors is subtracted from them).

The input parameters of the network (ASSOM or SASSOM) are the size of the filters' mask and the number of neurons (or filters) in the network. When the SASSOM architecture is used, the number of neurons in the network must be the same as the number of textures (or classes) taken into consideration. In this way, each filter is tuned to a particular texture (or class). On the other hand, when the ASSOM architecture is used, the number of neurons in the network can be less than, equal to, or greater than the number of textures, depending on the exactness required and on the available processing time. Fig. 2 shows a block diagram of the architecture used to implement this stage.

3.1.2. Classifier training
In this stage an LVQ network is trained with invariant feature vectors, which are obtained using the oriented filters generated in the filter generation stage (see Fig. 3). The dimension of each feature vector is given by the number of oriented filters. The vector components are obtained by taking the magnitude of the complex-valued response of the filters. This response is given by the convolution between an input pattern and the even and odd components of the filters, b_1 and b_2, respectively. The training is performed in two steps. First, the optimized-learning-rate LVQ1 (OLVQ1) training algorithm [10] is used. Then, the codewords are fine-tuned using the LVQ3 algorithm [10]. Both algorithms were implemented by using the LVQ_PAK program package (Version 3.1) of the Helsinki University of Technology.
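The core of this classifier training can be sketched with a plain LVQ1 update step. This is a hedged simplification: OLVQ1 additionally maintains a per-codeword learning rate, and LVQ3 updates pairs of codewords near the decision border; neither refinement is shown here.

```python
import numpy as np

def lvq1_step(codebook, labels, x, y, alpha=0.05):
    """One LVQ1 update: the codeword nearest to the feature vector x
    moves towards x if its class matches the true label y, and away
    from x otherwise. Modifies `codebook` in place; returns the winner."""
    c = int(np.argmin(np.linalg.norm(codebook - x, axis=1)))
    sign = 1.0 if labels[c] == y else -1.0
    codebook[c] += sign * alpha * (x - codebook[c])
    return c
```

Iterating this step over the labelled feature vectors yields the pre-segmentation classifier.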

Alternatively, two other architectures were proposed in [17] to implement this stage. The first architecture performs the classification using a SOM and an LVQ network. First, the SOM network is trained and then the SOM codewords are fine-tuned using the LVQ3 algorithm (as in [24]). The inputs to both networks are

Fig. 2. Block diagram of the filter generation stage.


Fig. 3. Block diagram of the classifier training stage.

Fig. 4. Block diagram of the architecture developed to implement the recall phase.

the invariant feature vectors, obtained using the oriented filters. The second architecture performs the classification using a SOM and an LVQ network, both working in tandem. The SOM network is first trained using the invariant feature vectors. Then, the LVQ network is trained using, as inputs, the coordinates of the winner neuron of the SOM network working in recall mode.

3.2. Recall phase

As can be seen in the block diagram of the whole architecture (see Fig. 4), after the classification stage a non-linear post-processing stage is used. The function of this stage is to improve the results of the pre-segmentation process. This post-processing stage performs median filtering over the pre-segmented images and then applies the watershed transformation over the median-filtered images.

Fig. 5. Block diagram of the non-linear post-processing stage.

Fig. 6. The top/bottom images show the filters generated using the ASSOM/SASSOM configured with 9/4 neurons and a filter mask of 15×15 pixels (left: filters' component b_1; right: filters' component b_2).

The watershed transformation corresponds to a segmentation algorithm, originally proposed by Beucher [1], which is based on the analogy between the two processes of greyscale image segmentation and flooding of a topographic surface. Before applying the watershed transformation, two images must be defined: a label image, which contains the image minima and defines a pre-segmentation of the original image, and an edge image, which contains border lines and can be obtained using an edge detector operator. In our application, both the label image and the edge image are obtained from the median-filtered image (see Fig. 5). The label image is the result of applying the emergent-segment opening operator, using a structuring element with a size of 11 pixels, while the edge image results from applying the morphological gradient, using a structuring element with a size of three pixels.
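The post-processing chain (median filtering, then a marker-based watershed on a morphological gradient) might be sketched as follows. This is a rough stand-in, not the paper's implementation: scipy's IFT watershed replaces the Beucher transform, and the filter and structuring-element sizes below are assumptions rather than the 11- and 3-pixel elements reported above.

```python
import numpy as np
from scipy import ndimage

def postprocess(labels, median_size=5):
    """Post-processing sketch for a pre-segmented image of non-negative
    integer class labels: median filtering, then boundary refinement by
    a marker-based watershed on the morphological gradient."""
    smoothed = ndimage.median_filter(labels, size=median_size)
    # Edge image: morphological gradient of the smoothed labels
    # (non-zero only on class boundaries).
    grad = (ndimage.grey_dilation(smoothed, size=3)
            - ndimage.grey_erosion(smoothed, size=3))
    # Label image: region interiors keep their class id (+1 so every
    # marker is positive); boundary pixels are left unmarked (0).
    markers = np.where(grad == 0, smoothed + 1, 0).astype(np.int16)
    flooded = ndimage.watershed_ift(grad.astype(np.uint8), markers)
    return flooded - 1  # back to the original class ids
```

The median filter removes isolated misclassified pixels, and the watershed reassigns the remaining boundary pixels to the nearest homogeneous region.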

4. Segmentation results

As a preliminary test, four textures from the Brodatz album [3] (textures D17, D20, D34 and D52) were used to train the networks. Both the ASSOM and the SASSOM were applied to implement the filter generation stage. The ASSOM network was configured using nine neurons. The SASSOM network, on the other hand, was configured using only four neurons, the same as the number of textures. The size of each filter mask was 15×15 pixels. The filters generated by the ASSOM and the SASSOM architectures are shown in Fig. 6.

Fig. 7. An image composed of four Brodatz textures (top left), the pre-segmented image (top right), the median-filtered image (bottom left), and the resulting image of the watershed transformation (bottom right).

In the implementation of the classifier training stage, the proposed architecture reached a recognition rate of 95%. This performance was achieved whether the oriented filters were generated by the ASSOM or by the SASSOM network. It is important to take into account that, in the SASSOM network, only four filters were used. In Fig. 7, the segmentation of a test image composed of the four Brodatz textures is shown.

Fig. 8. A real-world image composed of four different textiles (top left), the pre-segmented image (top right), the median-filtered image (bottom left), and the resulting image of the watershed transformation (bottom right).

As a second test, the architecture was used for the segmentation of a real-world image composed of four different textiles (see Fig. 8). As can be seen in this example, the architecture performed a good segmentation in this real task despite its complexity. Even for human beings, this image is very difficult to segment.

The defect identification capabilities of the TEXSOM-architecture are shown in Fig. 9. It can be observed that defects, which are also difficult for us to see in the textured image, can be easily distinguished in the pre-segmented one. By using a subsequent classification stage, the defects present in the pre-segmented image can be detected.


Fig. 9. Brodatz texture D53 (left) and its pre-segmented image (right).

5. Conclusions

As it was shown in Section 4, the TEXSOM-architecture is suitable to perform texture segmentation and also defect identification on textured images. Even real-world images can be segmented using this architecture. Nevertheless, the architecture can still be improved in different aspects. Some of them are:

• The performance and applicability of the TEXSOM-architecture can be increased by using only the even component (b_1) of the filters in the classifier training stage and in the recall phase, and by automatically selecting the size of the filter masks and using filter masks with variable sizes.

• The generalisation properties of the whole architecture can be improved by testing other kinds of classification neural networks, for example Fuzzy-LVQ or Fuzzy-ARTMAP, configured with different numbers of neurons. It is also interesting to analyse other kinds of non-linear post-processing operators, for example mode filters, morphological filters, and other kinds of rank-order filters (e.g. multistage median filters, max/min median filters, weighted median filters, stack filters).


• Finally, the pre-processing stage can be enhanced by using a biologically inspired algorithm, instead of the mean-value subtraction operation, to obtain some degree of luminance invariance. The shunting network proposed by Grossberg [6] can be used to achieve that. This network is based on the kind of neuronal processing performed by the on-center and off-center ganglion cells of the retina.

Acknowledgements

The author thanks the anonymous reviewers for their valuable technical advice and, especially, the editors of this special issue for their assistance and support.

References

[1] S. Beucher, Watersheds of functions and picture segmentation, IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Paris, 1982, pp. 1928-1931.

[2] A.C. Bovik, M. Clark, W. Geisler, Multichannel texture analysis using localized spatial filters, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1) (1990) 55-73.

[3] P. Brodatz, Textures - A Photographic Album for Artists and Designers, Dover, New York, 1990.

[4] D. Dunn, W. Higgins, J. Wakeley, Texture segmentation using 2-D Gabor elementary functions, IEEE Trans. Pattern Anal. Mach. Intell. 16 (2) (1994) 130-149.

[5] H. Greenspan, R. Goodman, R. Chellappa, C.H. Anderson, Learning texture discrimination rules in a multiresolution system, IEEE Trans. Pattern Anal. Mach. Intell. 16 (9) (1994) 894-901.

[6] S. Grossberg, The quantized geometry of visual space: the coherent computation of depth, form, and lightness, Behavioral Brain Sci. 6 (1983) 625-657.

[7] A.K. Jain, F. Farrokhnia, Unsupervised texture segmentation using Gabor filters, Pattern Recognition 24 (12) (1991) 1167-1186.

[8] T. Kohonen, The adaptive-subspace SOM (ASSOM) and its use for the implementation of invariant feature detection, Proc. Int. Conf. on Artificial Neural Networks - ICANN 95, 9-13 October, Paris, 1995.

[9] T. Kohonen, Emergence of invariant-feature detectors in self-organization, Proc. Int. Conf. on Neural Networks - ICNN 95, 27 November - 1 December, Perth, 1995.

[10] T. Kohonen, Self-Organizing Maps, 2nd ed., Springer, Heidelberg, 1997.

[11] T. Kohonen, Emergence of invariant-feature detectors in the adaptive-subspace self-organizing map, Biol. Cybernet. 75 (4) (1996) 281-291.

[12] S. Lu, J. Hernandez, G. Clark, Texture segmentation by clustering of Gabor feature vectors, Proc. Int. Conf. on Artificial Neural Networks I, Seattle, 1991, pp. 683-687.

[13] J. Malik, P. Perona, Preattentive texture discrimination with early vision mechanisms, J. Opt. Soc. Amer. A 7 (5) (1990) 923-932.

[14] R. Navarro, A. Tabernero, G. Cristobal, Image representation with Gabor wavelets and its applications, in: P.W. Hawkes (Ed.), Advances in Imaging and Electron Physics 97, Academic Press, San Diego, CA, 1996.

[15] P.M. Palagi, A. Guerin-Dugue, An architecture for texture segmentation: from energy features to region detection, Proc. Int. Workshop on Artificial Neural Networks, Malaga, 1995, pp. 956-962.

[16] T. Reed, J. Du Buf, A review of recent texture segmentation and feature extraction techniques, CVGIP: Image Understanding 57 (3) (1993) 359-372.

[17] J. Ruiz-del-Solar, M. Koppen, A texture segmentation architecture based on automatically generated oriented filters, J. Microelectron. Systems Integration 5 (1) (1997) 43-52.

[18] T.D. Sanger, Optimal unsupervised learning in a single-layer linear feedforward neural network, Neural Networks 2 (6) (1989) 459-473.

[19] J. Shao, W. Forstner, Gabor wavelets for texture edge extraction, ISPRS Comm. III Symp. 94, Munich, 1994, pp. 745-752.

[20] J. Sirosh, A self-organizing neural network model of the primary visual cortex, Ph.D. Thesis, The University of Texas at Austin, 1995.

[21] A. Teuner, O. Pichler, B.J. Hosticka, Unüberwachte Selektion und Abstimmung von dyadischen Gaborfiltern zur Textursegmentierung [Unsupervised selection and tuning of dyadic Gabor filters for texture segmentation], Proc. 16. DAGM-Symp. Mustererkennung, 1994, pp. 296-303.

[22] M.R. Turner, Texture discrimination by Gabor functions, Biol. Cybernet. 55 (1986) 71-82.

[23] R.C. Van Sluyters, J. Atkinson, M.S. Banks, R.M. Held, K.-P. Hoffmann, C.J. Shatz, The development of vision and visual perception, in: L. Spillman, J. Werner (Eds.), Visual Perception: The Neurophysiological Foundations, Academic Press, San Diego, CA, 1990.

[24] A. Visa, K. Valkealahti, O. Simula, Cloud detection based on texture segmentation by neural network methods, Proc. 1991 Int. Joint Conf. on Neural Networks, Singapore, 1991, pp. 1001-1006.

Javier Ruiz-del-Solar was born in 1968. He received his diploma in Electrical Engineering and the M.S. degree in Electronic Engineering from the Technical University Federico Santa Maria (Chile) in 1991 and 1992, respectively, and the Doctor-Engineer degree from the Technical University Berlin in 1997. From March 1993 to February 1998 he worked as a research engineer at the Fraunhofer-Institut IPK Berlin. Recently, he became an Assistant Professor of Electrical Engineering at the Universidad de Chile. His research interests include the application of neural network technology to computer vision and pattern recognition problems, computational models of neurons and neurobiological systems, texture analysis, and fuzzy-based retrieval of information from image databases. He has been a member of the IEEE since 1988 and a member of the Consultants Pool of the ECVNet (European Computer Vision Network) since 1996.
