
Signal Processing 32 (1993) 1-4, Elsevier

Editorial

Intelligent systems for signal and image understanding

Vito Roberto* (Member EURASIP) Dipartimento di Matematica e Informatica, Università di Udine, Via Zanon, 6, I-33100 Udine, Italy

The term intelligent system commonly refers to a set of hardware/software modules aimed at emulating human activities, such as the perceptual ones; perception, in turn, is a set of processes by which a system, on the basis of sensory inputs, constructs and maintains internal representations of the environment (world). The vagueness of such definitions, which are widely adopted in the literature, does not allow us to formally characterise intelligent perceptual systems, nor to appreciate the complexity of the tasks involved. On the other hand, since little consensus exists on more specific views, any attempt to provide one such view is open to criticism. However, we believe that a scientific debate is timely and helpful to such a rapidly growing research field; this special issue of Signal Processing is a contribution devoted to this aim.

Let us briefly discuss some key aspects of intelligent systems for signal and image understanding.

Domains of interest. A consolidated research activity exists traditionally for visual [10, 20] and auditory [8, 19, 23] functionalities. More recent fields concern signal interpretation in application domains, such as biomedicine [16], industry [21], defence [22]; multisensor data fusion [14]; sensorimotor coordination in advanced robotics [1]; man-machine dialogue by multimedia interfaces [13]; perceptual learning [3, 7].

Correspondence to: Prof. Vito Roberto, Dipartimento di Informatica, Via Zanon, 6, I-33100 Udine, Italy.

* Guest Editor.

The perceptual challenge. Maintaining a flow of useful information with the environment through sensors is a formidable task, because of the variability of patterns typically involved; the ubiquitous noise in emission, transmission, acquisition; the ambiguities arising from multiple, overlapping or conflicting sources; the large volume of data to be processed. But the key point is that sensory data alone are not sufficient to constrain perceptual problems; a system must be given additional information about the world and possible interactions with it [17].

Generic functional requirements. As a consequence of the last statement above, intelligent perceptual systems must infer useful descriptions from sensory data, by means of models encoding specific aspects of the world (e.g. some of its structural regularities) and the interaction with it (e.g. how regularities are captured by the acquisition system). Inference is given here an extended meaning, as a set of generic actions to add new pieces of information to an input set.

Variability requires flexible architectures and control schemes; noise requires stability and robustness; ambiguities require formal treatment of domain-specific uncertainties (e.g. occlusions or interference of multiple sources); handling large amounts of data favours distribution of tasks and/or parallelism.

Approaches and models. A model is composed of a set of computational primitives; a set of prescriptions to combine primitives and generate descriptions; a set of processes to manipulate descriptions, with the aim of emphasizing specific aspects of the world.

0165-1684/93/$06.00 © 1993 Elsevier Science Publishers B.V. All rights reserved

In order to build models it is helpful to study the structure, behaviour and performance of biological systems (and humans, in particular); as a matter of fact they do solve perceptual problems efficiently. Neurophysiology and perceptual psychology [4, 15] provide descriptions of mechanisms, and identify levels of complexity and limitations which turn out to be useful, although artificial systems (which operate on basically different physical supports) need not reproduce all of the 'natural' cues.

The designer has at his/her disposal a number of formal approaches.

Numerical/statistical frameworks, as derived from signal/image processing and statistical pattern recognition, provide efficient tools to simulate perception (especially in its first stages), by yielding essentially local descriptions of physical/spatial properties. Part models [2], used for visual tasks, adopt primarily geometrical primitives (e.g. planes, cylinders) to describe objects at an intermediate, non-local scale. Connectionist models [7] are acquiring more and more importance in tasks such as learning, classification, pattern association: perceptual functionalities are expected to emerge from the structural properties of the model itself (see [11], for example). Symbolic approaches, such as syntactic pattern recognition, graph-based and logic-based formalisms are helpful whenever abstract pictures of the world can be achieved and used, for example, to increase robustness. The knowledge-based approach provides architectures and tools to build models out of heterogeneous sources of information in an explicit fashion. Finally, hybrid formalisms [5] may arise from the combination of different representational schemes.

In summary, an intelligent perceptual system can only be characterised on functional grounds, as one inferring descriptions of the variable world in a model-based fashion, thereby conveying meaning (semantics) in sensory data.

The well-known property of a complex system, whose functionalities and behaviour do not trivially emerge from those of its constituents, and the dramatic need for robustness in perceptual fields, as shown by the experience now available [12], lead us to consider integration of information as a primary concern, to be addressed in several forms.

- Data fusion: multiple sensory sources are to be correlated either statistically or logically; moving sources provide data to be dynamically fused. Models of the world may be originated directly by data fusion. More general forms of pattern aggregation may occur: for example, grouping mechanisms may work until meaningful structures are reached, simulating pre-attentive steps in human perception; the same structures may be used as primitives for subsequent (attentive) steps, aimed at refining world descriptions [18].

- Process coordination: several processing levels may be identified, according to the grain size of the patterns/objects involved; each level is given local control schemes (hierarchical/heterarchical, sequential/parallel, ...). A global scheme aggregates all control levels in a unique framework. The organisation of control may simulate attentional mechanisms in human perception [6, 9], or planning capabilities to drive actions [14].

We assumed integration as the basic viewpoint to analyse intelligent systems; discussing data fusion and process coordination provides clues to characterise such systems.

We now outline the organisation of this volume.

Part 1. Basic issues

J.L. Crowley and Y. Demazeau address the problem of maintaining descriptions of a dynamical world, and provide a unifying framework for fusion of both numerical and symbolic information. The authors identify a cyclic scheme underlying fusion, consisting of three basic processes (predict, match, update), and discuss a number of systems designed according to the scheme. V. Roberto reviews the knowledge-based approach; the focus is on methodological steps to build models, as well as solutions to perform integration at the data and control levels. Aspects emerge that help to better understand the potential, practical realisations and current limitations of the approach.
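The predict, match, update cycle mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper by Crowley and Demazeau: the world is reduced to a single scalar property, the update is a variance-weighted (Kalman-style) blend, and the names Estimate, predict, match, update, fuse and the gate threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    value: float     # current estimate of one world property
    variance: float  # uncertainty attached to that estimate


def predict(est: Estimate, process_noise: float) -> Estimate:
    # Assumed static world: the value carries over, the uncertainty grows.
    return Estimate(est.value, est.variance + process_noise)


def match(est: Estimate, obs: float, obs_variance: float, gate: float = 9.0) -> bool:
    # Accept an observation only if its squared normalised distance
    # from the prediction falls inside the (hypothetical) gate.
    d2 = (obs - est.value) ** 2 / (est.variance + obs_variance)
    return d2 <= gate


def update(est: Estimate, obs: float, obs_variance: float) -> Estimate:
    # Variance-weighted blend of prediction and observation.
    gain = est.variance / (est.variance + obs_variance)
    return Estimate(est.value + gain * (obs - est.value),
                    (1.0 - gain) * est.variance)


def fuse(est: Estimate, observations, obs_variance: float = 1.0,
         process_noise: float = 0.1) -> Estimate:
    # One pass of the cycle per incoming observation.
    for obs in observations:
        est = predict(est, process_noise)
        if match(est, obs, obs_variance):
            est = update(est, obs, obs_variance)
    return est


if __name__ == "__main__":
    # The value 50.0 plays the role of a spurious reading and fails the match step.
    print(fuse(Estimate(0.0, 1.0), [1.0, 1.2, 50.0, 0.9]))
```

In this sketch the match step is what gives robustness: an observation that cannot be reconciled with the prediction leaves the internal description unchanged, apart from the growth in uncertainty introduced by predict.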

Part 2. Tools and techniques

This part concerns task-specific techniques to build internal models which should be included in more complex frameworks, as well as general-purpose knowledge representations to support the design of specialised systems.

M.J.L. Orr presents an algorithm to perform fusion in the presence of noise and occlusions, and in a moving environment: the internal model results from dynamically fusing geometrical parts (lines and planes are used as primitives) by means of statistical estimates. E. Trucco describes a technique to infer a part model (a 3-D volumetric description) of an object, starting from input data in the form of parallel cross-sections from range sensors; the model is built in two steps: segmentation (on the basis of perceptual cues) and model fitting. Similarly, E. Martí, J. Regincós, J. López-Krahe and J.J. Villanueva propose a technique to infer an Origami world description (still belonging to the class of part models) of 3-D objects, from a 2-D projection in the form of a line drawing. The model copes with hidden lines, thereby accounting for ambiguous interpretations, and is built out of feature extraction and feature interpretation steps.

F. Kummert, H. Niemann, R. Prechtel and G. Sagerer provide an extensive account of a signal-understanding software environment, designed according to the knowledge-based paradigm. A hybrid knowledge model, based on the formalism of semantic networks, fuses symbolic with numerical representations. A global control scheme allows for a suitable combination of data- and model-driven control strategies. Being problem independent, the model has been successfully adopted in different domains, such as interpretation of image sequences or continuous speech.

Part 3. Real-world applications

This part includes those contributions which mainly concern real-world applications of perceptual systems.

A. Del Bimbo, L. Landi and S. Santini describe a system for detecting road directions from image sequences of outdoor scenes. The system integrates two image-processing modules, for edge extraction and segmentation respectively, with a feedback neural network, which provides an adaptive and robust classifier. P. Chauvet, J. Lopez-Krahe, E. Taflin and H. Maitre address the problem of archiving complex documents. They propose an integrated approach making use of segmentation and recognition steps to extract and classify the semantically relevant blocks in a document; an internal model is generated, whose descriptions, besides providing interpretive cues, also allow for an adaptive and more precise coding.

L.S. Kung and J.C. Samin propose a redrawing system to convert an input engineering drawing, considered as a 2D image, into a symbolic (graph) representation, which, in turn, is used to extract the geometrical primitives of a CAD model.

The knowledge-based approach is proposed and applied by S. Molander and H. Broman to the analysis of ultrasound and gamma-camera images of the heart. In particular, segmentation and labelling are iterated and refined until consistency is reached with a priori knowledge models.

Integration of multiple, noisy and ambiguous sources is the subject matter of the paper by G.L. Foresti, V. Murino, C. Regazzoni and G. Vernazza, and is accomplished by means of hierarchical, distributed control and symbolic reasoning. The task is the interpretation of 3D real outdoor scenes by multisensory images, in the context of autonomous driving.

Finally, V. Neagoe reports, in a short communication, the results of combining periodogram estimation with pattern recognition techniques for the detection of FSK signals.
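As a rough illustration of how periodogram estimation and a decision rule can be combined for FSK signals, the sketch below evaluates the periodogram of each symbol frame at two candidate tone frequencies and picks the stronger one. This is only one possible reading and is not the method of the communication cited above: the maximum-power rule is a deliberately simple stand-in for a pattern recognition stage, binary FSK with known tones and symbol timing is assumed, and the names periodogram_at, demodulate_fsk and synth_fsk are hypothetical.

```python
import cmath
import math


def periodogram_at(samples, freq, sample_rate):
    # Periodogram value |X(f)|^2 / N at one frequency, by direct DFT.
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / sample_rate)
              for k, x in enumerate(samples))
    return abs(acc) ** 2 / n


def demodulate_fsk(signal, f0, f1, sample_rate, samples_per_symbol):
    # Classify each symbol frame by the tone carrying more power.
    bits = []
    last_start = len(signal) - samples_per_symbol
    for start in range(0, last_start + 1, samples_per_symbol):
        frame = signal[start:start + samples_per_symbol]
        p0 = periodogram_at(frame, f0, sample_rate)
        p1 = periodogram_at(frame, f1, sample_rate)
        bits.append(1 if p1 > p0 else 0)
    return bits


def synth_fsk(bits, f0, f1, sample_rate, samples_per_symbol):
    # Noise-free test signal: one sine burst per bit (phase restarts each symbol).
    out = []
    for b in bits:
        f = f1 if b else f0
        out.extend(math.sin(2 * math.pi * f * k / sample_rate)
                   for k in range(samples_per_symbol))
    return out


if __name__ == "__main__":
    sent = [1, 0, 0, 1, 1, 0]
    sig = synth_fsk(sent, 1000.0, 2000.0, 8000.0, 64)
    print(demodulate_fsk(sig, 1000.0, 2000.0, 8000.0, 64) == sent)
```

The test tones are chosen so that each frame holds a whole number of cycles of both frequencies, which keeps the two periodogram values well separated; with noisy data or unknown tones, a trained classifier operating on the full periodogram would take the place of the simple comparison.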

The preparation of this volume has received invaluable support from the enthusiasm and the amount of experience now available in European Institutions and Laboratories in perceptual domains. We gratefully acknowledge the effort of the contributors (20 submissions) as well as the collaboration from the 43 colleagues who helped the review process. We were left with 12 papers by 28 authors from 8 European Countries.

Vol. 32, Nos. 1-2, May 1993

We gratefully acknowledge also the continuous encouragement from EURASIP, and, in particular, from Murat Kunt (Lausanne), Editor in Chief of Signal Processing.

We look forward to future initiatives, which may open new opportunities to exchange ideas and results in such a challenging research field.

References

[1] N. Ayache and O.D. Faugeras, "Maintaining representations of the environment of a mobile robot", Robotics Research 4, MIT Press, Cambridge, MA, 1988, pp. 337-350.

[2] D.H. Ballard and C.M. Brown, Computer Vision, Prentice Hall, Englewood Cliffs, NJ, 1982.

[3] L.B. Booker, D.E. Goldberg and J.H. Holland, "Classifier systems and genetic algorithms", Artificial Intelligence, Vol. 40, 1989, pp. 235-282.

[4] V. Bruce and P. Green, Visual Perception. Physiology, Psychology and Ecology, Lawrence Erlbaum, London, 1985.

[5] H. Bunke, "Hybrid methods in pattern recognition", in: P.A. Devijver and J. Kittler, eds., Pattern Recognition Theory and Applications, NATO-ASI Series, Vol. F30, 1987, pp. 367-382.

[6] P. Burt, "Attention mechanisms for vision in a dynamic world", Proc. 9th Internat. Conf. on Pattern Recognition, 1988, pp. 977-987.

[7] G. Carpenter and S. Grossberg, "Pattern recognition by self-organizing neural networks", MIT Press, Cambridge, MA, 1991.

[8] R. De Mori, "Knowledge-based computer recognition of speech", in: P.A. Devijver and J. Kittler, eds., Pattern Recognition Theory and Applications, NATO-ASI Series, Vol. F30, 1987, pp. 433-450.

[9] K. Fukushima, "A neural network model for selective attention in visual pattern recognition", Biological Cybernetics, Vol. 55, 1986, pp. 5-15.

[10] W.E.L. Grimson and D.P. Huttenlocher, eds., IEEE Trans. Pattern Anal. Machine Intell., Special Issue on Interpretation of 3-D Scenes; Part I, Vol. PAMI-13, No. 10, 1991; Part II, Vol. PAMI-14, No. 2, 1992.

[11] S. Grossberg and E. Mingolla, "Neural dynamics of surface perception", Comput. Vision Graph. Image Process., Vol. 37, 1987, pp. 116-165.

[12] R.C. Jain and T.O. Binford, "Ignorance, myopia, and naïveté in computer vision systems", Comput. Vision, Graph. Image Process., Image Understanding, Vol. 53, 1991, pp. 112-117.

[13] S. Levialdi, ed., Proc. Internat. Workshop on Advanced Visual Interfaces, Rome, I, 27-29 May 1992 (to appear).

[14] R.C. Luo and M.G. Kay, "Multisensor integration and fusion in intelligent systems", IEEE Trans. Systems Man Cybernet., Vol. SMC-19, No. 5, 1989, pp. 901-931.

[15] T. Myers, J. Laver and J. Anderson, eds., The Cognitive Representation of Speech, Elsevier (North-Holland), Amsterdam, 1981.

[16] H. Niemann, ed., Pattern Recognition Letters, Special Issue on Expert Systems in Medical Imaging, Vol. 8, No. 2, 1988.

[17] A.P. Pentland, "Introduction - from pixels to predicates", in: A. Pentland, ed., From Pixels to Predicates, Ablex, Norwood, NJ, 1986, pp. VIII-XVIII.

[18] A.P. Pentland, "Perceptual organisation and the representation of natural form", Artificial Intelligence, Vol. 28, 1986, pp. 293-331.

[19] L.R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition", Proc. IEEE, Vol. 77, No. 2, 1989, pp. 257-286.

[20] G. Sandini, ed., Proc. Second European Conf. on Computer Vision, Springer, Berlin, 1992.

[21] J.L.C. Sanz, ed., IEEE Trans. Pattern Anal. Machine Intell., Special Issue on Industrial Machine Vision and Computer Vision Technology; Part I, Vol. PAMI-10, No. 1, 1988; Part II, Vol. PAMI-10, No. 3, 1988.

[22] R.L. Simpson, Jr., "Computer vision", IEEE Expert, August 1991, pp. 11-15.

[23] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano and K. Lang, "Phoneme recognition using time-delay neural networks", IEEE Trans. Acoust. Speech Signal Process., Vol. 37, No. 3, 1989, pp. 328-339.
