World Wide Web: Internet and Web Information Systems, 5, 229–243, 2002. © 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.

Knowledge Elicitation and Semantic Representation for the Heterogeneous Web

H. LILIAN TANG [email protected]
Department of Computing, University of Surrey, Guildford, Surrey, GU2 7XH, UK

Abstract

This paper presents methods and principles for knowledge elicitation and semantics definitions for images and text, respectively, and furthermore introduces a semantic representation scheme that fuses the semantic information extracted from image and text to facilitate intelligent indexing and retrieval for multimedia collections, as well as media transformation through their semantic meanings. The method can be deployed for WWW applications such as telemedicine or a virtual gallery.

Keywords: knowledge elicitation, semantic representation, semantic labelling, content-based retrieval

1. Introduction

The fundamental problem in Web-based intelligent multimedia information systems such as telemedicine is to locate and identify the semantic meanings in heterogeneous data. This poses challenges that traditional information extraction and representation methods could not overcome. This paper presents the methodology for knowledge elicitation and, in particular, introduces a preliminary semantic representation scheme that bridges the information gap not only between different media but also between different levels of content in the media. This scheme will be further developed and used for semantic similarity measurement for browsing and retrieval of multimedia data. In this research, collections of images and full-text data were tested.

Semantics is closely tied to contextual information, which is normally hard to locate and detect. For text, despite its informal nature, different methods have been explored, using either statistical methods or traditional syntactic grammatical analysis, to extract and summarise the semantics in the text. Even if these methods are not perfectly accurate at locating semantic information, they have opened windows onto text semantics.

Compared with textual semantics extraction, visual semantics poses an even greater challenge, owing to the complexity of the data and the fact that the techniques of computational visual perception are far from mature. Traditionally, textual information processing techniques such as keywords were used to represent and manipulate visual data. However, apart from being subjective and inefficient for accessing visual data, the keyword method was regarded as inadequate for expressing visual similarity when performing retrieval. In recent years, more and more researchers have realised that content processing techniques are important. Access to visual information is requested not only at a conceptual level, using keywords as in the textual domain, but also at a perceptual level, using objective measurements of visual content [3]. In past years, many researchers mainly focused on techniques at the perceptual level, in other words, low-level syntactic retrieval based on primitive iconic features in the images, e.g., texture, colour, shapes and spatial relationships. However, high-level semantic analysis is more desirable, and this requires efforts to bridge the gap between low-level syntactic/iconic features that can be automatically detected by conventional image processing tools and high-level semantics that captures the meanings of images at different conceptual levels. Relevant work is shown in the I-Browse system [10] and in Corridoni and Del Bimbo's work [4]. Other related work can be found in [1] and [14].

Even though it has been demonstrated that text and images are subject to different specific cognitive processes [9], textual and pictorial information are regarded as complementary: each medium provides the context necessary to understand the other, presents the same information in a different format, and can guide the reader in processing the other medium [2,5,7]. Text and image are indeed closely related. The rest of this paper will overview the method of extracting semantics from both text and images, introduce the knowledge elicitation procedure (focusing mainly on image data in this paper), and finally discuss the scheme for representing the semantics of both media.

2. Processes of semantics extraction

In this research we aim to build a medical information system over the Web for telemedicine. The system contains patients' information such as examination images, doctors' diagnostic reports, etc. The system supports query by image example and query by natural language. It also allows textual annotations to be generated for unknown images. In order to achieve these, techniques of image processing and natural language processing are deployed and integrated. Since the system has to handle various image data on a large scale, especially complicated histological images, it has to rely on a suitable integration of a variety of image processing and feature detection methods. Details of these techniques are presented in our publications [11,12]. This paper will focus on the semantic extraction from both text and images, and on how they are fused through the intermediate representation scheme.
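
To make this flow concrete, the following minimal Python sketch (not the system's actual code; the Papillon class, the overlap-based similarity and the example case labels are invented for illustration) shows how a query expressed in either medium could be matched against stored items once both sides have been reduced to a common semantic representation:

from dataclasses import dataclass

@dataclass
class Papillon:
    entities: set      # semantic labels / concepts found in the medium
    relations: list    # (master, relation, servant) triples, unused in this toy example

def similarity(a: Papillon, b: Papillon) -> float:
    # Toy overlap measure over entity sets; the real system would compare
    # the master-servant structure described in Section 3.
    if not a.entities or not b.entities:
        return 0.0
    return len(a.entities & b.entities) / len(a.entities | b.entities)

def retrieve(query: Papillon, database: dict) -> list:
    # Rank stored items (images or reports) by similarity of their Papillons.
    ranked = sorted(database.items(), key=lambda kv: similarity(query, kv[1]), reverse=True)
    return [name for name, _ in ranked]

# A query made in either medium is first turned into a Papillon (by the image
# or text analysers of Section 2) and then matched against the database:
db = {"case-1": Papillon({"lumen", "mucosa", "submucosa"}, []),
      "case-2": Papillon({"adipose tissue", "blood vessel"}, [])}
query = Papillon({"mucosa", "lumen"}, [])
print(retrieve(query, db))    # -> ['case-1', 'case-2']

The design point is that retrieval never compares pixels with words directly; both media are compared through their semantic representations.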

2.1. Text semantic feature extraction

The text semantic analysis is based on static semantic representation at the morphological and syntactic levels, as well as on dynamic analysis through a text parser according to the contextual collocations within words and phrases. The morphological properties of words or phrases are mainly described in a large-scale comprehensive dictionary, and a concept space provides a hierarchical coding system that allocates each information processing unit, such as a word, phrase or sentence, to one of the defined concepts. The concept space is also used in image semantic analysis. The analysis procedure includes automatic segmentation, ambiguity processing, morphological and syntactic analysis, semantic analysis and complex text context processing. The result is a semantic network, Papillon, which will be discussed in Section 3. The textual analysis system here aims to be an independent tool. It can be used for general text analysis purposes or for any specific application by replacing the domain dictionary and changing the contents of the rules, which are separated from the main control programme.

Figure 1. From left to right respectively: colon glands in round, ovoid, and long shape.
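
As a rough illustration of the dictionary and concept space described above (the entries, concept codes and code format below are invented for the example and are not taken from the system's dictionary):

# Hypothetical fragment of a domain dictionary: lexical unit -> (category, concept code).
# The hierarchical code means, for example, that "an.04.02" is a child of "an.04".
DICTIONARY = {
    "mucosa":            ("n", "an.04"),
    "glandular element": ("n", "an.04.02"),
    "bluish-red":        ("a", "qc.03"),
}

def lookup(unit: str):
    """Return the morphological category and concept-space code of a unit, if known."""
    return DICTIONARY.get(unit.lower())

def same_branch(code_a: str, code_b: str) -> bool:
    """Two concepts are related if one code is a prefix of the other in the hierarchy."""
    return code_a.startswith(code_b) or code_b.startswith(code_a)

print(lookup("mucosa"))                    # ('n', 'an.04')
print(same_branch("an.04.02", "an.04"))    # True: same branch of the concept space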

2.2. Semantics extraction and semantic definition for images

The key technique for approaching visual semantics is to associate semantic meanings with visual properties and to integrate syntactic visual analysis and semantic reasoning in connection with the contextual knowledge. First we need to identify a set of meaningful semantic labels for the particular domain and then train an appropriate set of feature detectors to identify the semantic labels in images. Semantic analysis is then carried out according to the context information.

2.2.1. Semantic label definition. In order to associate semantic meanings with different image appearances, the image is initially partitioned into a number of subimages. The size and the shape of the subimages can be varied according to their suitability for capturing the image features. In this research, we use 64 × 64 squares. They form the basic units for image analysis, such as texture and colour analysis, as well as for semantic analysis.
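
A minimal sketch of this partitioning step, assuming the image is available as a NumPy array (the function name and the decision to drop ragged borders are choices made only for this illustration):

import numpy as np

def partition(image: np.ndarray, size: int = 64):
    """Yield (row, col, subimage) tiles of size x size pixels, dropping ragged borders."""
    h, w = image.shape[:2]
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            yield r, c, image[r:r + size, c:c + size]

# Example: a synthetic 256 x 256 RGB image yields 16 subimages of 64 x 64.
tiles = list(partition(np.zeros((256, 256, 3), dtype=np.uint8)))
print(len(tiles))    # 16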

In the semantic label set, all possible image visual features that may appear in the database of images have been defined, and none of the important features has been left out. The visual feature detectors will be trained on such an exhaustive set of feature labels, avoiding as far as possible any new semantic feature appearing later in an unknown image that would confuse the visual feature detectors and affect the recognition accuracy. When defining the label set, it is necessary to decide whether to assign labels to subimages according to their visual properties or to their domain meanings. Several principles can be summarised as a result of this research:

(a) Compromise between the depth of the semantic labels and the discrimination capability of the visual detectors. The same meaningful semantic object may have very different visual appearances due to, for instance in this research, the different angles or directions at which 2D images are produced from a 3D object. For example, colon glands could appear in different shapes, as shown in Figure 1. To distinguish such variation as well as lessen the confusion to the visual detectors, we treat them as three different visual features with different semantic labels, while in the knowledge base it is recorded that they belong to the same medical origin, i.e., the same histological semantic meaning.

Figure 2. The first two are small intestine intestinal glands, the last two are colon glands; both features are cross-cut so that the glands look round.

(b) Compromise between similar visual appearance and distinct semantic labels. Different semantic objects may have the same visual appearance because they have very similar visual structures. For example, "intestinal glands in small intestine" and "large intestine: mucosa" share very similar visual appearances (Figure 2). In this case we describe the features according to their domain origin and combine the features in the visual feature detector when necessary, to improve the clustering accuracy.

In summary, the key rule is to locate the discriminative semantic meanings for the visual features and, at the same time, create discriminative feasibility for the visual feature detectors and the semantic analysis. After defining the semantic label set, all the attributes of the labels that are used to represent the visual features in the images are described in the knowledge base.
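
The effect of these two principles can be pictured as a small mapping held in the knowledge base; the label names below are loosely modelled on Figures 1 and 2 and are illustrative only:

# Visual label -> histological origin recorded in the knowledge base.
# Principle (a): visually distinct shapes of the same object get distinct labels
# but share one medical origin.
LABEL_ORIGIN = {
    "colon_gland_round": "colon gland",
    "colon_gland_ovoid": "colon gland",
    "colon_gland_long":  "colon gland",
    "si_intestinal_gland_round": "small intestine intestinal gland",
}

# Principle (b): visually confusable labels that a detector may merge when clustering.
CONFUSABLE = {frozenset({"si_intestinal_gland_round", "colon_gland_round"})}

def same_origin(label_a: str, label_b: str) -> bool:
    """True if two visual labels share the same histological semantic meaning."""
    return LABEL_ORIGIN.get(label_a) == LABEL_ORIGIN.get(label_b)

print(same_origin("colon_gland_round", "colon_gland_long"))    # True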

2.2.2. Semantic interpretation of image content and the semantic label set. The interpretation of images relies on the image context and the content of the knowledge base. Multiple levels of semantic interpretation can be defined according to how the images are read. In this research, two levels of interpretation were defined: the coarse feature level and the fine feature level. The features in each level have their respective knowledge held in different knowledge bases, where such knowledge serves the purposes of reasoning, semantic description and generating text annotations. Any fine feature belongs to a coarse-level feature. A visual feature can be mapped to different levels of meanings or terms; here, each visual feature is mapped to two levels of terms, fine and coarse.
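
A sketch of the two-level term mapping, using fine-level label names that appear later in Figure 10 (the dictionary itself is hypothetical):

# Each fine-level visual feature maps to exactly one coarse-level feature.
FINE_TO_COARSE = {
    "l_nodule":        "Mucosa",
    "mus-f_m_mucosae": "Mucosa",
    "c_tissue":        "Submucosa",
    "b_vessel_empty":  "Submucosa",
}

def coarse_of(fine_label: str) -> str:
    """Look up the coarse-level feature that a fine-level label belongs to."""
    return FINE_TO_COARSE[fine_label]

print(coarse_of("l_nodule"))    # Mucosa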

The training data selection interface is shown in Figure 3. The user (at this stage normally a domain expert) randomly selects an image, which is then automatically partitioned into subimages (surrounded by fine lines, not shown in the figure). The user is only required to select some of the subimages corresponding to typical samples of different domain characteristics, and then label these selected subimages with the predefined semantic labels, or give a new term if necessary. The system will then mark the region with an X, and the labels (at two semantic levels) and coordinates of the associated subimage locations will be recorded in a training file as shown in Figure 4.
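
The record kept for each labelled subimage can be imagined as the two semantic labels plus the subimage coordinates; the CSV layout below is an assumption for illustration and is not the actual format of the training file in Figure 4:

import csv

def append_training_sample(path, image_id, row, col, fine_label, coarse_label):
    """Append one labelled subimage (both semantic levels plus location) to a training file."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([image_id, row, col, fine_label, coarse_label])

append_training_sample("training.csv", "B97-00661-01", 128, 192, "l_nodule", "Mucosa")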

The user will repeat the above procedure until sufficient training samples have been collected to cover most of the visual appearances in the database. The visual feature detector will be given these training samples so as to identify other, unknown images. For a large-scale image database, particularly when the images themselves are of a complicated nature, more feature detectors can be developed to identify the semantic labels associated with different visual properties. To detect the same set of semantic labels, more than one visual detector can be used, producing results in parallel. The semantic analyser, which incorporates domain knowledge, will then carry out further analysis based on the primitive results from the visual detectors, as well as the relative confidence of each detector.

Figure 3. A subsystem for collecting training samples for the association of visual appearances and their semantic labels.

Figure 4. On the left of the arrow: training file; on the right: samples of the mapping between visual features and their meanings at different levels.
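
One plausible way to combine the parallel detector outputs, weighting each label score by a detector's past reliability, is sketched below; this is an illustrative fusion rule, not the analyser's actual one:

from collections import defaultdict

def fuse(detector_outputs, detector_confidence):
    """
    detector_outputs: {detector_name: {label: score}} for one subimage.
    detector_confidence: {detector_name: weight in [0, 1]} from past performance.
    Returns labels ranked by confidence-weighted score.
    """
    combined = defaultdict(float)
    for name, scores in detector_outputs.items():
        weight = detector_confidence.get(name, 0.5)
        for label, score in scores.items():
            combined[label] += weight * score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

outputs = {"texture_detector": {"l_nodule": 0.7, "c_tissue": 0.3},
           "colour_detector":  {"l_nodule": 0.4, "b_vessel_empty": 0.6}}
print(fuse(outputs, {"texture_detector": 0.9, "colour_detector": 0.6}))
# -> [('l_nodule', 0.87), ('b_vessel_empty', 0.36), ('c_tissue', 0.27)]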

2.2.3. Training samples refinement. The following are some points that need to be considered when defining and optimising the label set.

(a) No overlapping among the definitions. This is to avoid cases where the appearance or elements of one feature appear in another feature.

(b) Sufficient. This is to avoid missing any rare but important features.

(c) Even. The division of the features should be as even as possible across all coarse areas; that is, when categorising the features, one coarse area should not be divided into more detailed fine features than another coarse area unless it is necessary.

(d) No redundancy.

(e) Even coverage. The training samples should evenly cover image data coming from different patients' specimens.

A fixed subwindow size such as 64 × 64 is not always appropriate for every feature. An inappropriate choice of processing window size may affect the accuracy of the detectors. To improve performance, the system allows the visual detectors to use a larger or a smaller window, e.g., 128 × 128 or 32 × 32, executing the appropriate visual detectors dynamically.
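
A sketch of selecting the window size per label (the per-label sizes are invented; the paper only states that 128 × 128 and 32 × 32 windows may be used alongside the 64 × 64 default):

# Preferred processing window per fine-level label; anything absent uses the default.
WINDOW_SIZE = {"a_tissue": 128, "l_nodule": 32}
DEFAULT_SIZE = 64

def window_for(label: str) -> int:
    """Pick the subwindow size that the detector for this label should analyse."""
    return WINDOW_SIZE.get(label, DEFAULT_SIZE)

print(window_for("l_nodule"), window_for("mus-f_m_mucosae"))    # 32 64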

2.3. Content of the knowledge base

With separate domain knowledge provided in the architecture, application independence as well as media independence can be realised. The knowledge base includes:

(a) domain attributes, such as a feature's logical or expected location, the region it should belong to, etc.;

(b) visual attributes, such as colour, shape, size, quantity, the similarity with any other features, and the various relationships, including spatial relationships, among them;

(c) measurement attributes, indicating which detector is best suited to these features;

(d) contextual attributes, e.g., the special attributes a feature may have when its semantic label is combined with certain situations or other semantic labels.

Examples of part of the two levels of knowledge content are given in Figure 5(a) and (b). Among the attributes in Figure 5(a), parents describes which organ this feature belongs to. Texture level tells how much this feature can rely on texture measurement; if the value is "low," this feature in the analysed image probably needs a fine detector to confirm it again. Coarse region indicates that this feature belongs to "X," which is one of the coarse features defined in the system. Similar gives the logical similarity of this feature to others from visual observation, and closed class describes which other features are similar to this feature according to the nature of the detectors. Detector tells which detectors may be used for specific fine detection, and in this field the name of the detector is given. Quantity describes how many feature elements appear in one subwindow; in this example, an estimated 5 to 10. Size tells roughly how large a single element of the feature is compared with the subwindow.

The knowledge system also contains information generated from a confusion matrix. Confusion matrices are computed from the test data and the past performance of the detectors. A confusion matrix tells the relative accuracy of the classified samples, and also which features are measured as close to one another, thus indicating the similarity of the features as seen by the detectors. Such information is useful for the semantic analyser.
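
A small sketch of deriving both kinds of information from a confusion matrix computed on test data (the helper below is hypothetical and uses only standard Python):

def confusion_summary(matrix, labels):
    """
    matrix[i][j] = number of test samples of labels[i] classified as labels[j].
    Returns per-label accuracy and, for each label, the label it is most often
    confused with (useful to the semantic analyser as a similarity hint).
    """
    accuracy, closest = {}, {}
    for i, label in enumerate(labels):
        total = sum(matrix[i])
        accuracy[label] = matrix[i][i] / total if total else 0.0
        off_diag = [(labels[j], matrix[i][j]) for j in range(len(labels)) if j != i]
        closest[label] = max(off_diag, key=lambda kv: kv[1])[0] if off_diag else None
    return accuracy, closest

labels = ["a_tissue", "lumen", "c_tissue"]
matrix = [[18, 1, 1],
          [2, 16, 2],
          [3, 0, 17]]
print(confusion_summary(matrix, labels))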

no           : 1
name         : a_tissue
natural name : adipose tissue
closed class : lumen, junction of lumen and foveolae, fine muscularis mucosae, connective tissue
parents      : common
texture level: low
coarse region: X
similar      : lumen, connective tissue
neighbour    : a_tissue, blood-vessel
detector     : adipose tissue
color        : white, light pink
quantity     : 5-10
shape        : free
size         : < 1 subwindow

(a)

no           : 5
name         : MuscularisExterna
Abbreviation : E
closed class : M+E
in-neighbour : Submucosa, J-SE, S+A, M+E
out-neighbour: Serosa, J-EO, S+A, M+E
color        : pinkish
quantity     :
shape        : free
size         :

(b)

Figure 5. (a) Knowledge frame and sample values of attributes for the fine level. (b) Knowledge frame and sample values of attributes for the coarse level.

3. Papillon – intermediate semantic representation

In this research we developed a preliminary semantic representation scheme, Papillon, to represent the semantic content of both full text and images. The content of a Papillon comes from either full text analysis or image analysis. It can be used as an intermediate semantic representation for the two media, so that in a database any query can be made in either medium and, through it, the retrieved information can be presented in either medium. In addition, a textual annotation for an unknown image can be automatically generated through Papillon.

3.1. Technical design of Papillon

Ever since the early work on automatic semantic analysis, there has been a strong inclination to use graphical representations of semantic structures [6]. Papillon consists of a semantic network, which is a graph where the nodes represent concepts and the arcs represent binary relationships between concepts [8]. The nodes, or entities, are objects or features and their descriptions in the image or text, and they contain all the information and attributes about the entities, including a semantic code. An entity is represented in a complex structure, and the arcs are the various semantic relationships between nodes. The semantic relationships express the inherent semantic relationships between concepts that have been derived from a particular medium. Here, in this application, the medium could be an image object or language words or phrases that in fact express certain objects or concepts. The following are some of the relationships used in Papillon.

• Agent: the object which takes an action and affects other objects. Example: "The hammer broke the table." We can regard this either as a text description or as image content. The relationship graph is like

    break --agent--> hammer,  break --object--> table.

In this example, the hammer is the agent of the action "break," and "table" is the object of break. Here object is another type of relationship. If this sentence is a semantic meaning in a picture about a hammer destroying a table, the hammer and the table are objects that are identified in the image, while the concept "break" or "hitting" is generated by the semantic analysis according to the spatial relationship between the hammer and the table, as well as the situation of the table, for example, broken pieces, etc.

• Analogue: this relationship indicates similarity. Example: "I am like a butterfly." The relationship graph is like

    I --analogue--> butterfly.

It is easier to obtain such a relationship from textual information, by identifying keywords such as "like," "as if," etc. This relationship is more abstract when applied to image understanding. However, such concepts are useful in image understanding when two objects are visually similar, or when one object is mistakenly identified as another object that may not appear in the image.

• Compose of: this relationship indicates the relationship between elements and their super-class. Example: "Mucosa contains many glands." The relationship graph is like

    Mucosa --compose of--> glands.

• Degree: to indicate the degree of deformation or differences of an object from a known object.

Figure 6. An example of a visualised Papillon.

• Direction: to indicate the direction of an action or a moving object. Example: "Walk to the river." The relationship graph is like

    Walk --direction--> river.

The rest of the semantic relationships are: duration, focus, frequency, goal, instrument, location, manner, modification, object, origin, parallel, possession, quantity, colour, size, shape, produce, reason, equal, reference, result, scope, attribute, time, value, capacity, condition, comparison, consequence.

The semantic relationships alone cannot represent all the semantics that may be found in an image or in a piece of text. In fact, the entities/nodes also carry a certain degree of semantics through the static semantic attribute representation of the entities, among which a systematic semantic code in the concept space is also given to each entity where applicable.
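
In data-structure terms, the network of entities and typed relationships described in this section can be sketched as follows (illustrative classes only, not the system's internal representation):

from dataclasses import dataclass, field

@dataclass
class Node:
    name: str                                        # object, feature or concept
    attributes: dict = field(default_factory=dict)   # e.g. colour, size, semantic code

@dataclass
class Arc:
    master: str       # entity the relationship belongs to
    relation: str     # one of the Papillon relationship types, e.g. "agent", "compose of"
    servant: str      # entity filling that role

# Tiny network for "The hammer broke the table" (see the Agent example above):
nodes = {n.name: n for n in (Node("hammer"), Node("break"), Node("table"))}
arcs = [Arc("break", "agent", "hammer"), Arc("break", "object", "table")]
print(arcs[0])    # Arc(master='break', relation='agent', servant='hammer')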

3.2. Samples of Papillon

The content of Papillon can be generated from both text semantic analysis and image semantic analysis. Figure 6 presents a visual example of a Papillon for the following text: "Viewing from the right to the left of the image, there is the lumen (greyish area), the mucosa which consists of numerous small roundish glandular elements (bluish-red layer), and the submucosa which is composed of loose connective tissue with many blood vessels."

view (0, v, progressing, direction/axis):
    S: from the right to the left (50, n, direction, object)
    S: image (9, n, the/of, modify)
lumen (14, n):
    S: area (17, n, location)
    S: greyish (16, a, colour)
mucosa (21, n, nn5, , , , , the):
    S: glandular element (58, n, compose-of)
        S: roundish (27, a, aa1, shape)
        S: small (26, a, size)
        S: layer (32, n, location)
            S: bluish-red (31, a, colour)
        S: numerous (25, a, quantity)
submucosa (37, n, the):
    S: loose connective tissue (62, n, compose-of)
    S: blood vessel (63, n, with, concurrence)
    S: many (46, m, quantity)

Figure 7. Example of the internal representation of a Papillon.

In Figure 6, the rectangular entities are word or phrase units, and they are also the object features that possibly correspond to the semantic labels defined in the system, while the circular entities are related attributes. The above semantic structure is represented internally as master–servant pairs, which is equivalent to the above graph but simpler for either index matching or database storage (see Figure 7, which shows the result produced by the text semantic analysis programme). One master could have more than one servant.

In Figure 7, "n," "a," etc. are the categories of the entities. The relationship between a master and a servant is put into one of the attributes of the corresponding servant. For example, the master "glandular element" has four servants, whose relationships are respectively "shape," "size," "location" and "quantity." A master can be a servant of another master; in this way the tree structure is formed. In this example, "glandular element" is a servant of "mucosa" with the relationship "compose-of," and "layer," the location servant of "glandular element," has a colour servant "bluish-red."
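
The master–servant structure just described can be sketched as a simple recursive data type; the fragment below reproduces part of the "mucosa" tree of Figure 7, with field names chosen only for this illustration:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Entity:
    name: str
    uid: int
    category: str                       # e.g. "n", "a", "m"
    relation: Optional[str] = None      # relation to its master, e.g. "shape", "compose-of"
    servants: List["Entity"] = field(default_factory=list)

# A master can itself be a servant: "glandular element" serves "mucosa".
layer = Entity("layer", 32, "n", "location",
               servants=[Entity("bluish-red", 31, "a", "colour")])
glandular = Entity("glandular element", 58, "n", "compose-of",
                   servants=[Entity("roundish", 27, "a", "shape"),
                             Entity("small", 26, "a", "size"),
                             layer,
                             Entity("numerous", 25, "a", "quantity")])
mucosa = Entity("mucosa", 21, "n", servants=[glandular])
print(mucosa.servants[0].name)    # glandular element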

A Papillon can be a forest of several trees. In this example the internal representation is a forest with "view," "lumen," "mucosa" and "submucosa" as roots. By linking these four roots with certain relationships, the forest could be transformed into one big tree. However, it is not necessary to make the extra effort to represent the content as a more complicated graph if the current forest representation is sufficient. The forest form has its own advantage, which benefits similarity measurement over the database.

Figure 9 shows the processing interface for the image shown in Figure 8, and the analysed Papillon result is shown in Figure 10. The information in Figure 10 clearly indicates that there are several major features in the image corresponding to coarse-level features such as "Lumen," "Mucosa" and "Submucosa." It also tells what composes these coarse features and what their relationships are. For example, in the Mucosa area there is mus-f_m_mucosae (No. 186), which has two servants that describe its colour and spatial attribute, and mus-f_m_mucosae (No. 186) itself is also a servant of Mucosa. Entities No. 159, 164, 168, 174, 178, 182, 186, 191, 197, 201, 205, 209 and 215 are all servants of Mucosa (No. 219). Further information about a servant is described under another forest root when it has servants of its own. All other entities can be read in a similar way. In summary, the Papillon generated from an image contains the semantic content of the image: the prevailing features and all the remaining features with their respective attributes and the semantic relationships. In Figure 9, two windows with partitioned subimages show the processing procedure of the semantic analysis, and the window at the lower left displays the generated textual annotation for the image, based on the information contained in the Papillon.

Figure 8. Sample analysed image.

Figure 9. Interface showing the analysing procedure.

Starting: B97-00661-01.2.s2.x5.4.jpg
Cambridge size = 34
oesophagus (294, n, where):
from lower left (292, n, axis):
Lumen (155, n, ):
    S: greyish area (157, n, color)
    S: lower left (158, n, spatial)
mus-f_m_mucosae (159, n, ):
    S: pinkish layer (161, n, color)
    S: lower right (162, n, spatial)
oe-j-epithelium.l_propria (164, n, ):
    S: lower right (166, n, spatial)
l_nodule (168, n, ):
    S: bluish-red layer (170, n, color)
    S: many (171, n, quantity)
    S: lower left (172, n, spatial)
oe-j-epithelium.l_propria (174, n, ):
    S: center middle (176, n, spatial)
oe-j-epithelium.l_propria (178, n, ):
    S: center left (180, n, spatial)
oe-j-epithelium.l_propria (182, n, ):
    S: center left (184, n, spatial)
mus-f_m_mucosae (186, n, ):
    S: pinkish layer (188, n, color)
    S: center left (189, n, spatial)
l_nodule (191, n, ):
    S: bluish-red layer (193, n, color)
    S: many (194, n, quantity)
    S: center left (195, n, spatial)
l_propria (197, n, ):
    S: upper right (199, n, spatial)
st-foveolae_surface (201, n, ):
    S: upper right (203, n, spatial)
oe-j-epithelium.l_propria (205, n, ):
    S: upper left (207, n, spatial)
l_nodule (209, n, ):
    S: bluish-red layer (211, n, color)
    S: many (212, n, quantity)
    S: upper left (213, n, spatial)
oe-epithelium (215, n, ):
    S: center middle (217, n, spatial)

Figure 10. A part of the Papillon generated in the system for the image shown in Figure 8.

Mucosa (219, n, ):
    S: prevalent (221, n, size)
    S: center middle (222, n, spatial)
    S: mus-f_m_mucosae (159, n, is composed of)
    S: oe-j-epithelium.l_propria (164, n, is composed of)
    S: l_nodule (168, n, is composed of)
    S: oe-j-epithelium.l_propria (174, n, is composed of)
    S: oe-j-epithelium.l_propria (178, n, is composed of)
    S: oe-j-epithelium.l_propria (182, n, is composed of)
    S: mus-f_m_mucosae (186, n, is composed of)
    S: l_nodule (191, n, is composed of)
    S: l_propria (197, n, is composed of)
    S: st-foveolae_surface (201, n, is composed of)
    S: oe-j-epithelium.l_propria (205, n, is composed of)
    S: l_nodule (209, n, is composed of)
    S: oe-epithelium (215, n, is composed of)
mus-l_smooth_in_mass (223, n, ):
    S: pinkish layer (225, n, color)
    S: upper left (226, n, spatial)
MuscularisExterna (228, n, ):
    S: pinkish layer (230, n, color)
    S: upper left (231, n, spatial)
    S: mus-l_smooth_in_mass (223, n, is composed of)
ap-j-mexterna.serosa (232, n, ):
    S: upper middle (234, n, spatial)
Serosa (236, n, ):
    S: upper middle (238, n, spatial)
    S: ap-j-mexterna.serosa (232, n, is composed of)
b_vessel_empty (239, n, ):
    S: some (241, n, quantity)
    S: upper right (242, n, spatial)
js-smucosa.f_m_mucosae (244, n, ):
    S: upper right (246, n, spatial)
js-smucosa.t_mus (248, n, ):
    S: upper middle (250, n, spatial)
b_vessel_not_empty (252, n, ):
    S: some (254, n, quantity)
    S: upper middle (255, n, spatial)
js-smucosa.f_m_mucosae (257, n, ):
    S: upper middle (259, n, spatial)
c_tissue_l (261, n, ):
    S: upper middle (263, n, spatial)
c_tissue (265, n, ):
    S: upper right (267, n, spatial)
c_tissue_l (269, n, ):
    S: upper middle (271, n, spatial)
c_tissue (273, n, ):
    S: upper middle (275, n, spatial)
l_vessel (277, n, ):
    S: upper middle (279, n, spatial)
c_tissue (281, n, ):
    S: upper middle (283, n, spatial)
oe-glands (285, n, ):
    S: upper left (287, n, spatial)
Submucosa (289, n, ):
    S: upper middle (291, n, spatial)
    S: b_vessel_empty (239, n, is composed of)
    S: js-smucosa.f_m_mucosae (244, n, is composed of)
    S: js-smucosa.t_mus (248, n, is composed of)
    S: b_vessel_not_empty (252, n, is composed of)
    S: js-smucosa.f_m_mucosae (257, n, is composed of)
    S: c_tissue_l (261, n, is composed of)
    S: c_tissue (265, n, is composed of)
    S: c_tissue_l (269, n, is composed of)
    S: c_tissue (273, n, is composed of)
    S: l_vessel (277, n, is composed of)
    S: c_tissue (281, n, is composed of)
    S: oe-glands (285, n, is composed of)

Figure 10. (Continued.)
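
To give a feel for the final step, turning a Papillon such as the one in Figure 10 into a textual annotation like the one displayed in Figure 9, here is a deliberately simplified, hypothetical generator; the system's actual sentence planning is not described at this level of detail in the paper:

def describe(master, servants):
    """
    Compose one sentence for a coarse feature from its servants.
    servants: list of (name, relation) pairs, e.g. [("center middle", "spatial"), ...].
    """
    spatial = [n for n, r in servants if r == "spatial"]
    parts = [n for n, r in servants if r == "is composed of"]
    sentence = f"The {master}"
    if spatial:
        sentence += f", located at the {spatial[0]} of the image,"
    if parts:
        sentence += " is composed of " + ", ".join(sorted(set(parts)))
    return sentence + "."

print(describe("Submucosa", [("upper middle", "spatial"),
                             ("b_vessel_empty", "is composed of"),
                             ("c_tissue", "is composed of"),
                             ("l_vessel", "is composed of")]))
# -> The Submucosa, located at the upper middle of the image, is composed of
#    b_vessel_empty, c_tissue, l_vessel.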

4. Conclusion

The method of semantic analysis for image and text in this research sheds light on intelligent indexing and retrieval of multimedia information based on semantic content. Many potential applications can be explored once the different levels of semantic content have been defined and extracted using the methodology presented here. In this research, in particular, we have implemented an automatic textual annotation generator for unknown images, demonstrating the semantic interpretation capability of the system. It also shows that transformation between different media through Papillon, based on their meanings, is possible. The research prototype of these techniques has demonstrated the value of such intelligent capabilities and the potential benefits they may bring to the medical community and to other domains such as art, scientific archiving, and multimedia World Wide Web (WWW) documents.

Acknowledgement

This work is based on the author's PhD thesis carried out at the University of Cambridge, UK. The author would like to thank her supervisor Dr. R. Hanka, and the medical collaborators: Dr. K. C. Lee, Consultant Pathologist, Department of Pathology, Princess Margaret Hospital, Dr. Ewen Sims, Consultant Pathologist, Royal Bolton Hospital, Bolton, UK, and Dr. Jeremy Rashbass, Consultant Pathologist, Addenbrooke's Hospital, Cambridge, UK.

References

[1] P. B. Berra and A. Ghafoor, "Data and knowledge management in multimedia systems," IEEE Transactions on Knowledge and Data Engineering 10(6), November–December 1998, 686–671.

[2] M. Betrancourt and A. Bisseret, "Integrating textual and pictorial information via pop-up windows: An experimental study," Behaviour and Information Technology 17(5), 1998, 263–273.

[3] C. Colombo, A. Del Bimbo, and P. Pala, "Semantics in visual information retrieval," IEEE Multimedia, July–September 1999, 38–53.

[4] J. M. Corridoni, A. Del Bimbo, and E. Vicario, "Image retrieval by color semantics with incomplete knowledge," Journal of the American Society for Information Science 49(3), March 1998, 267–282.

[5] A. M. Glenberg and P. Kruley, "Pictures and anaphora: Evidence for independent processes," Memory and Cognition 20(5), 1992, 461–471.

[6] R. Grishman, Computational Linguistics, Cambridge University Press, 1986.

[7] M. Hegarty and M. A. Just, "Constructing mental models of machines from text and diagrams," Journal of Memory and Language 32, 1993, 717–742.

[8] http://www.cee.hw.ac.uk/~alison/ai3notes/subsection2_4_2_1.html

[9] J. L. Santa, "Spatial transformation of words and pictures," Journal of Experimental Psychology: Human Learning and Memory 3, 1977, 418–427.

[10] L. H. Tang, "Semantic analysis of image content for intelligent retrieval and automatic annotation of medical images," PhD Dissertation, University of Cambridge, England, 2000.

[11] L. H. Y. Tang, R. Hanka, H. H. S. Ip, K. K. T. Cheung, and R. Lam, "Integration of intelligent engines for a large scale medical image database," in Proceedings of the IEEE Conference on Computer Based Medical Systems, CBMS 2000, Texas Medical Center, Houston, TX, June 23–24, 2000.

[12] L. H. Y. Tang, H. H. S. Ip, R. Hanka, K. K. T. Cheung, and R. Lam, "Semantic query processing and annotation generation for content-based retrieval of histological images," in Proceedings of SPIE Medical Imaging, San Diego, CA, 20–26 February 2000 (Cum Laude Award).

[13] H. Y. Tang and T. S. Yao, "The lexical semantic driving algorithm based on collocation dictionary," Journal of Software 6, Supplement, 1995, 78–85.

[14] A. Vailaya, A. Jain, and H. J. Zhang, "On image classification: City images vs. landscapes," Pattern Recognition 31(12), 1998, 1921–1935.