

500 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 18, NO. 3, JUNE 2010

Granular Knowledge Representation and Inference Using Labels and Label Expressions

Jonathan Lawry and Yongchuan Tang

Abstract—This paper is a review of the label semantics framework as an epistemic approach to modeling granular information represented by linguistic labels and label expressions. The focus of label semantics is on the decision-making process that a rational communicating agent must undertake in order to establish which available labels can be appropriately used to describe their perceptual information in such a way as to be consistent with the linguistic conventions of the population. As such, it provides an approach to characterizing the relationship between labels and the underlying perceptual domain which, we propose, lies at the heart of what is meant by information granules. Furthermore, it is then shown that there is an intuitive relationship between label semantics and prototype theory, which provides a clear link with Zadeh's original conception of information granularity. For information propagation, linguistic mappings are introduced, which provide a mechanism to infer labeling information about a decision variable from the available labeling information about a set of input variables. Finally, a decision-making process is outlined whereby from linguistic descriptions of input variables, we can infer a linguistic description of the decision variable and, where required, select a single expression describing that variable or a single estimated value.

Index Terms—Appropriateness measure, label semantics, linguistic mapping, mass function.

I. INTRODUCTION

THE ABILITY to effectively describe the continuous domain of sensory perception in terms of a finite set of description labels is fundamental to human communication. It is this process of granular modeling which permits us to process and transmit information efficiently at a suitable level of detail, to express similarity and difference between perceptual experiences, and to generalize from current knowledge to new situations. Furthermore, it allows us to express information and knowledge in a way that is robust to small variations, noise, and sensory aggregations in a complex multidimensional and evolving perceptual environment.

Manuscript received April 16, 2009; revised January 29, 2010; accepted March 29, 2010. Date of publication April 15, 2010; date of current version May 25, 2010. The work of Y. Tang was supported in part by the National Natural Science Foundation of China (NSFC) under Grant 60604034, in part by the joint funding of NSFC and MSRA under Grant 60776798, and in part by the Science and Technology Program of Zhejiang Province under Grant 2007C23061.

J. Lawry is with the Department of Engineering Mathematics, University of Bristol, Bristol, BS8 1TR, U.K. (e-mail: [email protected]).

Y. Tang is with the College of Computer Science, Zhejiang University, Hangzhou 310027, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TFUZZ.2010.2048218

Given these advantages, the formalization of granular models within a mathematical theory can allow the effective modeling of complex multidimensional systems in such a way as to be understandable to practitioners who are not necessarily experts in formal mathematics.

The use of labels as a means to discretize information plays a central role in granular modeling. Indeed, one possible definition of an information granule could be in terms of the mapping between labels and domain values as follows:

An information granule is a characterization of the relationship between a discrete label or expression and elements of the underlying (often continuous) perceptual domain which it describes.

From this perspective, crisp sets, fuzzy sets [33], random sets [19], and rough sets [20] can all correspond to information granules in the way that they can be used to characterize just such a relationship between a label and the elements of the underlying domain. A typical form of information granule is as the extension of the concept symbolically represented by a given label. For a label L, the extension of L identifies the set of domain elements to which L can be truthfully or appropriately applied. Fuzzy sets, random sets, and rough sets are then mechanisms according to which gradedness, uncertainty, and imprecision can be introduced into the definition of concept extensions, respectively.

A related definition of information granule was originally proposed by Zadeh [34], who explains granularity in terms of (possibly fuzzy) clusters of points as follows:

A granule is a clump of objects (points) which are drawn together by indistinguishability, similarity, proximity and functionality.

The aforementioned two views of information granule can be related through the notion of conceptual space proposed by Gärdenfors [5]. Conceptual spaces are metric spaces of sensory inputs in which the extensions of concepts correspond to convex regions. Certainly, from this perspective, elements within the extension of a concept are indeed likely to be linked in terms of their similarity and proximity to one another. In addition, the functionality of an object can directly influence the way that it is labeled or classified. For example, the labeled parts of the face, such as nose, mouth, ear, etc., as noted by Zadeh [34], are significantly dependent on their respective functions.

Central to Zadeh's [35] formalism for information granules are fuzzy sets of similar elements, often generated by clustering algorithms such as fuzzy c-means (e.g., the collaborative information granules proposed in [22]). In this context, membership in granules is gradual and often proportional to similarity to a prototypical case. From the latter perspective, we shall see a clear link to the label-semantics framework, which we will outline shortly.


In contrast, Pawlak [21] defines information granules as equivalence classes of indistinguishable elements from a database. In rough set theory, an equivalence relation is defined by a set of finite-valued attributes so that two elements are deemed to be equivalent (or indistinguishable) if they have identical attribute values. According to Pawlak, granules are therefore defined by the limitations of the feature information that, in practice, we are able to record about elements of the underlying universe. Based on these equivalence classes, any subset of data elements S is then represented by a lower and upper set of granules (a rough set), where the former consists of those equivalence classes which are subsets of S, and the latter of those classes which overlap with S. In the sequel, we shall demonstrate a clear link between the proposed framework and Pawlak's theory in the form of random rough sets.

Bargiela and Pedrycz [2] consider the problem of interpreting granules as sets when granules may themselves include information granules as members. This is clearly a very natural idea, since many labels describe objects which are themselves compound. For example, the label forest groups together particular forests into an information granule, while each forest is itself a particular group of different trees. This possibility then leaves the theory of granular information vulnerable to Russell's paradox. To avoid this problem, Bargiela and Pedrycz [2] suggest that granules can be interpreted as classes within the von Neumann–Bernays–Gödel axiomatization of set theory. In the framework proposed in this paper, we have only considered information granules corresponding to sets of basic elements from an underlying universe (i.e., sets at the lowest level in Russell's theory of types). However, future work on extending the approach to include cases where the referents for the labels are also information granules may require its integration within Bargiela and Pedrycz's abstract framework.

Label semantics [9], [10] is a representation framework to encode the conventions for the allocation of labels, and of compound expressions generated from labels, as descriptions of elements from the underlying domain. As such, it provides a useful tool for granular modeling when formulated as above, with an emphasis on the association of perceptual information and labels. The notion of vagueness is also closely related to that of information granularity in that, for most examples of information processing in natural language, the information granules are not precisely defined. Indeed, this semantic imprecision can often result in more flexible and robust granular models. Label semantics is based on an epistemic theory of vagueness [32], according to which the individual agents involved in communication believe in the existence of language conventions shared across the population of communicators but are (typically) uncertain as to which of the available labels can be appropriately used to describe any given instance.

An outline of the paper is as follows. Section II describes the underlying decision-making philosophy of label semantics in terms of the epistemic stance. Section III introduces the basic concepts and definitions in label semantics. Section IV links this framework with prototype theory and Zadeh's original conception of information granularity, as well as with Pawlak's rough set theory. Section V shows how labeling information can be inferred given imprecise or uncertain domain knowledge. Section VI introduces defuzzification methods, whereby single values can be estimated from general linguistic information. In a similar way, Section VII describes the selection of a single label expression on the basis of more general linguistic information. Linguistic models are introduced in Section VIII as a means to propagate labeling information in decision problems, with the whole decision process being summarized in Section IX. Finally, conclusions are given in Section X.

II. UNDERLYING PHILOSOPHY: EPISTEMIC STANCE

In label semantics, the main focus is on the decision-making process an intelligent agent must go through in order to identify which labels or expressions can actually be used to describe an object or value. In other words, in order to make an assertion to describe an object in terms of some set of linguistic labels, an agent must first identify which of these labels are appropriate or assertible in this context. Given the way that individuals learn language through an ongoing process of interaction with other communicating agents and with the environment, we can expect there to be considerable uncertainty associated with any decisions of this kind. Furthermore, there is a subtle assumption central to the label semantics model that such decisions regarding appropriateness or assertibility are meaningful. For instance, the fuzzy logic view is that vague descriptions like "John is tall" are generally only partially true, and hence, it is not meaningful to consider which of a set of given labels can truthfully be used to describe John's height. However, we contend that the efficacy of natural language as a means to convey information between members of a population lies in shared conventions governing the appropriate use of words, which are at least loosely adhered to by individuals within the population.

In our everyday use of language, we are continually faced with decisions about the best way to describe objects and instances in order to convey the information we intend. For example, let us suppose you are a witness to a robbery. How should you describe the robber so that police on patrol in the streets will have the best chance to spot him? You will have certain labels that can be applied, for example, tall, short, medium, fat, thin, blonde, etc., some of which you may view as inappropriate for the robber, others you perhaps think are definitely appropriate, while for some labels you are uncertain whether they are appropriate or not. On the other hand, perhaps you have some ordered preferences between labels, so that tall is more appropriate than medium, which is, in turn, more appropriate than short. Your choice of words to describe the robber should surely then be based on these judgments about the appropriateness of labels. However, where does this knowledge come from, and, more fundamentally, what does it actually mean to say that a label is or is not appropriate? Label semantics proposes an interpretation of vague description labels based on a particular notion of appropriateness and suggests a measure of subjective uncertainty that results from an agent's partial knowledge about what labels are appropriate to assert. Furthermore, it is suggested that the vagueness of these description labels lies fundamentally in the uncertainty about if and when they are appropriate, as governed by the rules and conventions of language use. The underlying assumption here is that some things can be correctly asserted, while others cannot. Exactly where the dividing line lies between those labels that are and those that are not appropriate to use may be uncertain, but the assumption that such a division exists would be a natural precursor to any decision-making process of the kind just described.

The aforementioned argument is very close to the epistemic view of vagueness as expounded by Williamson [32]. Williamson assumes that for the extension of a vague concept, there is a precise, but unknown, dividing boundary between it and the extension of the negation of this concept. However, while there are marked similarities between the epistemic theory and the label semantics view, there are also some subtle differences. For instance, the epistemic view would seem to assume the existence of some objectively correct, but unknown, definition of a vague concept. Instead, we argue that individuals, when faced with decision problems regarding assertions, find it useful as part of a decision-making strategy to assume that there is a clear dividing line between those labels which are and those which are not appropriate to describe a given instance. We refer to this strategic assumption across a population of communicating agents as the epistemic stance [11], a concise statement of which is as follows:

Each individual agent in the population assumes the existence of a set of labeling conventions, valid across the whole population, governing what linguistic labels and expressions can be appropriately used to describe particular instances.

In practice, these rules and conventions underlying the appropriate use of labels would not be imposed by some outside authority. In fact, they may not exist at all in a formal sense. Rather, they are represented as a distributed body of knowledge concerning the assertability of predicates in various cases, shared across a population of agents, and emerging as a result of interactions and communications between individual agents, all adopting the epistemic stance. The idea is that the learning processes of individual agents, all of which share the fundamental aim of understanding how words can be appropriately used to communicate information, will eventually converge to some degree on a set of shared conventions. The very process of convergence then to some extent vindicates the epistemic stance from the perspective of individual agents. Of course, this is not to suggest complete or even extensive agreement between individuals as to these appropriateness conventions. However, the overlap between agents should be sufficient to ensure the effective transfer of useful information.

III. LABEL SEMANTICS

Label semantics proposes two fundamental and interrelated measures of the appropriateness of labels as descriptions of an object or value. We begin with the assumption that for all agents, there is a fixed shared vocabulary in the form of a finite set of basic labels LA to describe elements from the underlying universe Ω. These are building blocks for more complex compound expressions, which can then also be used as descriptors as follows. A countably infinite set of expressions LE can be generated through recursive application of logical connectives to the basic labels in LA. Therefore, for example, if Ω is the set of all possible RGB values, and LA is the set of basic color labels, such as red, yellow, green, orange, etc., then LE contains compound expressions such as red and yellow, neither blue nor orange, etc. The measure of appropriateness of an expression θ ∈ LE as a description of instance x is denoted by µθ(x) and quantifies the agent's subjective probability that θ can be appropriately used to describe x. From an alternative perspective, when faced with describing instance x, an agent may consider each label in LA and attempt to identify the subset of labels that are appropriate to use. This is a totally meaningful endeavor for agents who adopt the epistemic stance. Let this complete set of appropriate labels for x be denoted by Dx. In the face of their uncertainty regarding labeling conventions, agents will also be uncertain as to the composition of Dx, and we represent this uncertainty with a probability mass function mx : 2^LA → [0, 1] defined on subsets of labels. Hence, for the subset of labels {red, orange, yellow} and RGB value x, mx({red, orange, yellow}) denotes the subjective probability that Dx = {red, orange, yellow} or, in other words, that {red, orange, yellow} is the complete set of basic color labels with which it is appropriate to describe x. We now provide formal definitions for the set of expressions LE and for mass functions mx, following which we will propose a link between the two measures µθ(x) and mx for expressions θ ∈ LE.

Definition 1 (Label expressions): The set of label expressions LE generated from LA is defined recursively as follows:
1) If L ∈ LA, then L ∈ LE.
2) If θ, ϕ ∈ LE, then ¬θ, θ ∧ ϕ, θ ∨ ϕ ∈ LE.

Definition 2 (Mass function on labels): ∀x ∈ Ω, a mass function on labels is a function mx : 2^LA → [0, 1] such that ∑_{S⊆LA} mx(S) = 1.

Note that there is no requirement for the mass associated with the empty set to be zero. Instead, mx(∅) quantifies the agent's belief that none of the labels are appropriate to describe x. We might observe that this phenomenon occurs frequently in natural language, especially when labeling perceptions generated along some continuum. For example, we occasionally encounter colors for which none of our available color descriptors seem appropriate. Hence, the value mx(∅) is an indicator of the describability of x in terms of the labels LA.

Now, depending on the labeling conventions, there may be certain combinations of labels which cannot be appropriate to describe any object. For example, small and large cannot both be appropriate. This restricts the possible values of Dx to the following set of focal elements.

Definition 3 (Set of focal elements): Given labels LA together with the associated mass assignments mx, ∀x ∈ Ω, the set of focal elements for LA is given by F = {S ⊆ LA : ∃x ∈ Ω, mx(S) > 0}.

Definition 4 (Set of mass functions): Let M denote the set of mass functions with focal sets contained in F, i.e., M = {m : 2^LA → [0, 1] : ∑_{F∈F} m(F) = 1}.

The link between the mass function mx and the appropriateness measures µθ(x) is motivated by the intuition that the assertion "x is θ" directly provides information, dependent on θ, as to what are the possible values for Dx. For example, the assertion "x is blue" would mean that blue is an appropriate label for x, from which we can infer that blue ∈ Dx. Similarly, the assertion "x is green and not blue" would mean that green is an appropriate label for x while blue is not, so that we can infer green ∈ Dx and blue ∉ Dx. Another way to express this information is to say that Dx must be a member of the set of label sets which contain green but do not contain blue, i.e., Dx ∈ {S ⊆ LA : green ∈ S, blue ∉ S}. More generally, we can define a functional mapping λ from LE into 2^(2^LA) (i.e., the set which contains all possible sets of label sets) for which the assertion "x is θ" enables us to infer that Dx ∈ λ(θ). This mapping is defined recursively as follows.

Definition 5 (λ-mapping): λ : LE → 2^F is defined recursively as follows: ∀θ, ϕ ∈ LE
1) ∀L ∈ LA, λ(L) = {S ∈ F : L ∈ S};
2) λ(θ ∧ ϕ) = λ(θ) ∩ λ(ϕ);
3) λ(θ ∨ ϕ) = λ(θ) ∪ λ(ϕ);
4) λ(¬θ) = λ(θ)^c.

The λ-mapping then provides us with a means to evaluate the appropriateness measure of an expression θ directly from mx, as corresponding to the subjective probability that Dx ∈ λ(θ), as follows.

Definition 6 (Appropriateness measures): For any expression θ ∈ LE and x ∈ Ω, the appropriateness measure µθ(x) can be determined from the mass function mx according to

∀θ ∈ LE,  µθ(x) = ∑_{S∈λ(θ)} mx(S).

From this relationship, the following general properties hold for expressions θ and ϕ in LE [9].

Theorem 7 (Lawry [9], [10]):
1) If θ |= ϕ, then ∀x ∈ Ω, µθ(x) ≤ µϕ(x).
2) If θ ≡ ϕ, then ∀x ∈ Ω, µθ(x) = µϕ(x).
3) If θ is a tautology, then ∀x ∈ Ω, µθ(x) = 1.
4) If θ is a contradiction, then ∀x ∈ Ω, µθ(x) = 0.
5) ∀x ∈ Ω, µ¬θ(x) = 1 − µθ(x).

Note that the laws of excluded middle and noncontradiction are preserved, since for any expression θ, λ(θ ∨ ¬θ) = λ(θ) ∪ λ(θ)^c = F, and λ(θ ∧ ¬θ) = λ(θ) ∩ λ(θ)^c = ∅. In addition, the idempotent condition holds, since λ(θ ∧ θ) = λ(θ) ∩ λ(θ) = λ(θ).

The λ-mapping provides us with a clear formal representation for linguistic constraints, where the imprecise constraint "x is θ" on x is interpreted as the precise constraint Dx ∈ λ(θ) on Dx.
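To make Definitions 1–6 concrete, the following is a minimal sketch (in Python, which is not used in the paper) of how an agent might represent a mass function mx and evaluate appropriateness measures via the λ-mapping. The labels, focal sets, and mass values below are illustrative assumptions only.

```python
from itertools import chain, combinations

# Basic labels LA and an illustrative mass function m_x on subsets of LA
# (these particular focal sets and mass values are hypothetical).
LA = ["red", "orange", "yellow"]
m_x = {
    frozenset(["red"]): 0.3,
    frozenset(["red", "orange"]): 0.5,
    frozenset(["orange", "yellow"]): 0.1,
    frozenset(): 0.1,          # m_x(empty): no label is appropriate for x
}

def power_set(labels):
    """All subsets of LA, i.e. the candidate values of D_x."""
    return [frozenset(s) for s in chain.from_iterable(
        combinations(labels, r) for r in range(len(labels) + 1))]

# Label expressions (Definition 1) are encoded as nested tuples, e.g.
# ("and", "red", ("not", "yellow")) stands for "red and not yellow".
def lam(theta, universe):
    """λ-mapping of Definition 5: the set of label sets consistent with theta."""
    if isinstance(theta, str):                       # a basic label L
        return {S for S in universe if theta in S}
    op = theta[0]
    if op == "not":
        return set(universe) - lam(theta[1], universe)
    if op == "and":
        return lam(theta[1], universe) & lam(theta[2], universe)
    if op == "or":
        return lam(theta[1], universe) | lam(theta[2], universe)
    raise ValueError(op)

def appropriateness(theta, m, labels):
    """Definition 6: mu_theta(x) = sum of m_x(S) over S in lambda(theta)."""
    universe = power_set(labels)
    return sum(m.get(S, 0.0) for S in lam(theta, universe))

# "red and not yellow": the probability that D_x contains red but not yellow,
# which for the mass function above is 0.3 + 0.5 = 0.8.
print(appropriateness(("and", "red", ("not", "yellow")), m_x, LA))
```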

A. Ordering Labels

As discussed earlier, an agent's estimation of both mx and µθ(x) should depend on their experience of language use involving examples similar to x. Clearly, the form of this knowledge is likely to be both varied and complex. However, one natural type of assessment for an agent to make would be to order or rank labels in terms of their estimated appropriateness for x. This order information could then be combined with estimates of appropriateness measure values for the basic labels (i.e., elements of LA) in order to provide estimates of values for compound expressions (i.e., elements of LE). Hence, we assume that

An agent's knowledge of label appropriateness for an instance x can be represented by an ordering on the basic labels LA and an allocation of uncertainty values to the labels consistent with this ordering.

Effectively, we assume that, through a process of extrapolation from experience, agents are, for a given instance, able to (at least partially) rank labels in terms of their appropriateness and then, consistent with this ranking, to estimate a subjective probability that each label is appropriate. On the basis of both the ordering and the probability assignment to basic labels, the agent should then be able to evaluate the appropriateness measure of more complex compound expressions. The ranking of available labels would seem to be an intuitive first step for an agent to take when faced with the decision problem about what to assert. In addition, the direct allocation of probabilities to a range of complex compound expressions, so that the values are internally consistent, is a fundamentally difficult task. Hence, a restriction on such evaluations to only the basic labels would have significant practical advantages in terms of computational complexity.

Definition 8 (Ordering on labels): For x ∈ Ω, let ⪯x be an ordering on LA such that for L, L′ ∈ LA, L′ ⪯x L means that L is at least as appropriate as a label for x as L′.

The identification by an agent of an ordering on labels ⪯x for a particular x ∈ Ω (as in Definition 8) restricts the possible label sets which they can then consistently allocate to Dx. For instance, if L′ ⪯x L, then this implies that if L′ ∈ Dx, then so is L ∈ Dx, since L is at least as appropriate a description for x as L′. Hence, given ⪯x for which L′ ⪯x L, it must hold that mx(S) = 0 for all S ⊆ LA where L′ ∈ S and L ∉ S. Trivially, from Definition 6, this also means that µL′(x) ≤ µL(x). Given these observations, an important question is whether the information provided by the ordering ⪯x, together with a set of appropriateness values {µL(x) : L ∈ LA} for the basic labels consistent with ⪯x, is sufficient to specify a unique mass function mx. Note that in the label semantics framework, the identification of a unique mass function mx in this way immediately enables the agent to apply Definition 6 in order to evaluate the appropriateness µθ(x) of any compound expression θ from the appropriateness measure values for the basic labels. In fact, in the case that ⪯x is a total (linear) ordering, it is not difficult to see that such a unique mapping does indeed exist between the mass function and the appropriateness measures of basic labels. To see this, suppose that we index the labels in LA so that Ln ⪯x Ln−1 ⪯x · · · ⪯x L1 with corresponding appropriateness measures µLn(x) = an ≤ µLn−1(x) = an−1 ≤ · · · ≤ µL1(x) = a1. Now, from the earlier discussion, we have that in this case, the only possible values for Dx are from the nested sequence of sets ∅, {L1}, {L1, L2}, . . . , {L1, . . . , Li}, . . . , {L1, . . . , Ln}. This, together with the constraints imposed by Definition 6 that for each label ai = µLi(x) = ∑_{S : Li∈S} mx(S), results in the following unique mass function:

mx := {L1, . . . , Ln} : an, . . . , {L1, . . . , Li} : ai − ai+1, . . . , {L1} : a1 − a2, ∅ : 1 − a1.

Hence, for ⪯x a total ordering, we see that µθ(x) can be determined as a function of the appropriateness measure values {µL(x) : L ∈ LA} on the basic labels. For an expression θ ∈ LE, this function is a composition of the aforementioned mapping, used to determine a unique mass function, and the consequent summation of mass function values across λ(θ), as given in Definition 6, to evaluate µθ(x). Although functional in this case, the calculus for appropriateness measures cannot be truth functional in the sense of fuzzy logic, since appropriateness measures satisfy all the classical Boolean laws, and a well-known result due to Dubois and Prade [4] shows that no truth-functional calculus can, in general, preserve all such laws. For a more detailed discussion of the difference between functionality and truth functionality, see Lawry [10]. The following theorem shows that in the case where ⪯x is a total ordering, the max and min combination rules can be applied in certain restricted cases.

Theorem 9 ([9], [30]): Let LE∧,∨ ⊂ LE denote those expressions generated recursively from LA by the use of only the connectives ∧ and ∨. If the appropriateness measures on basic labels are consistent with a total ordering ⪯x on LA, then ∀θ, ϕ ∈ LE∧,∨, it holds that

µθ∧ϕ(x) = min(µθ(x), µϕ(x))
µθ∨ϕ(x) = max(µθ(x), µϕ(x)).

In the case that ⪯x is only a partial ordering on LA, then, in general, this does not provide the agent with sufficient information to determine a unique mass function from the appropriateness measure values on the basic labels. Instead, further information is required for the agent to evaluate a mass function and, consequently, the appropriateness of compound label expressions. In Lawry [11], it is proposed that this additional information takes the form of conditional independence constraints imposed by a Bayesian network generated by ⪯x. These additional assumptions are then sufficient to determine mx uniquely. Details of this approach, however, are beyond the scope of this paper. Instead, in the examples presented in the sequel, we will assume that the ordering ⪯x is total.
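As an illustration of the total-ordering construction described above, the following hypothetical Python fragment (not part of the paper) builds the unique nested mass function from appropriateness values of totally ordered basic labels and checks the min rule of Theorem 9 on a conjunction of two labels. The label names and values are assumptions chosen only for the example.

```python
def mass_from_total_ordering(appropriateness):
    """Given appropriateness values {label: mu_L(x)} consistent with a total
    ordering, return the unique nested mass function of Section III-A."""
    # Sort labels so that L1 is the most appropriate (a1 >= a2 >= ... >= an).
    ordered = sorted(appropriateness.items(), key=lambda kv: kv[1], reverse=True)
    labels = [L for L, _ in ordered]
    a = [v for _, v in ordered] + [0.0]          # append a_{n+1} = 0
    m = {frozenset(): 1.0 - a[0]}                # m_x(empty) = 1 - a1
    for i in range(len(labels)):
        # m_x({L1,...,Li}) = a_i - a_{i+1}
        m[frozenset(labels[: i + 1])] = a[i] - a[i + 1]
    return {F: v for F, v in m.items() if v > 0}

# Hypothetical appropriateness values at some x: mu_low=0.2, mu_medium=1.0, mu_high=0.6.
mu = {"low": 0.2, "medium": 1.0, "high": 0.6}
m_x = mass_from_total_ordering(mu)
# -> {medium}: 0.4, {medium, high}: 0.4, {medium, high, low}: 0.2

# Theorem 9 (min rule): for conjunctions of basic labels under a total ordering,
# mu_{L and L'}(x) = min(mu_L(x), mu_L'(x)). Here mu_{medium and high}(x):
mu_med_and_high = sum(v for F, v in m_x.items() if {"medium", "high"} <= F)
assert abs(mu_med_and_high - min(mu["medium"], mu["high"])) < 1e-12
print(m_x, mu_med_and_high)
```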

B. Appropriateness Measures and Mass Functions as Information Granules

From Definition 6, we see that the values of all appropriateness measures for expressions in LE at a given element x ∈ Ω can be determined from the mass function mx. From this perspective, the mass values mx(F) for F ∈ F, as x varies, completely determine the relationship between labels and expressions and the elements of Ω. Consequently, viewed as functions of x, the set of mappings {mx(F) : F ∈ F} form natural information granules. Furthermore, if the ordering on labels ⪯x is assumed to be a total ordering for all x ∈ Ω, then, as outlined earlier, the information granules mx(F) can be determined directly from the appropriateness measures for the basic labels. Hence, in this case, the mappings µLi : Ω → [0, 1] for Li ∈ LA form the primitive information granules for characterizing all relationships between labels and expressions and the underlying domain Ω. For example, in Fig. 1, we have primitive information granules defined by appropriateness measures for the labels LA = {low, medium, high}. Here, for each label Li ∈ LA, µLi : Ω → [0, 1] is defined as a trapezoidal function of x ∈ Ω = [0, 30]. Assuming a total ordering ⪯x for all x ∈ Ω results in mass functions for the focal sets F = {{l}, {l, m}, {m}, {m, h}, {h}}, which are shown as triangular functions in Fig. 2. These triangular functions then correspond to the information granules for the focal sets in F.

Fig. 1. Appropriateness measure values for labels low, medium, and high viewed as a function of x, as x varies across Ω = [0, 30].

Fig. 2. Mass function values for the sets {low}, {low, medium}, {medium}, {medium, high}, and {high} viewed as a function of x, as x varies across Ω = [0, 30].

The direct use of focal sets as information granules in linguistic models can, in some cases, allow for more straightforward information processing. In particular, note that the mass function mx defines a probability distribution on Dx, which can in turn make it relatively straightforward to evaluate probability values from a granular model based on such functions. In the following section, we outline a prototype theory interpretation of label semantics, which clearly links appropriateness measures to Zadeh's original description of information granules as "clumps of (similar) objects" [34].

IV. PROTOTYPES AND INFORMATION GRANULES

A prototype theory interpretation of label semantics has recently been proposed [13], [16] in which the basic labels LA correspond to natural categories, each with an associated set of prototypes. A label Li is then deemed to be an appropriate description of an element x ∈ Ω, provided x is sufficiently similar to the prototypes of Li. The requirement of being "sufficiently similar" is clearly imprecise and is modeled here by introducing an uncertain threshold on distance from prototypes.

A distance function d is defined on Ω such that d : Ω² → [0,∞) and satisfies d(x, x) = 0 and d(x, y) = d(y, x) for all elements x, y ∈ Ω. This function is then extended to sets of elements such that for S, T ⊆ Ω, d(S, T) = inf{d(x, y) : x ∈ S and y ∈ T}. For each label Li ∈ LA, let there be a set Pi ⊆ Ω corresponding to prototypical elements for which Li is certainly an appropriate description. Within this framework, Li is deemed to be appropriate to describe an element x ∈ Ω, provided x is sufficiently close or similar to a prototypical element in Pi. This is formalized by the requirement that x is within a maximal distance threshold ε of Pi, i.e., Li is appropriate to describe x if d(x, Pi) ≤ ε, where ε ≥ 0. From this perspective, an agent's uncertainty regarding the appropriateness of a label to describe a value x is characterized by his or her uncertainty regarding the distance threshold ε. Here, we assume that ε is a random variable and that the uncertainty is represented by a probability density function δ for ε defined on [0,∞). Within this interpretation, a natural definition of the complete description of an element, Dx, and the associated mass function mx can be given as follows.

Definition 10 (Prototype interpretations of Dx and mx): For ε ∈ [0,∞), Dx^ε = {Li ∈ LA : d(x, Pi) ≤ ε} and mx(F) = δ({ε : Dx^ε = F}).¹

Appropriateness measures can then be evaluated according to Definition 6. Alternatively, we can define a random set neighborhood for each expression θ ∈ LE corresponding to those elements of Ω which can be appropriately described as θ, and then define µθ(x) as the single-point coverage function of this random set as follows.

Definition 11 (Random set neighborhood of an expression): For θ ∈ LE and ε ∈ [0,∞), Nθ^ε ⊆ Ω is defined recursively as follows: ∀Li ∈ LA, NLi^ε = {x ∈ Ω : d(x, Pi) ≤ ε}; ∀θ, ϕ ∈ LE, Nθ∧ϕ^ε = Nθ^ε ∩ Nϕ^ε, Nθ∨ϕ^ε = Nθ^ε ∪ Nϕ^ε, and N¬θ^ε = (Nθ^ε)^c.

Theorem 12 (Random neighborhood representation theorem [16]):

∀θ ∈ LE, ∀x ∈ Ω,  µθ(x) = δ({ε : x ∈ Nθ^ε})

where µθ(x) is determined from the mass function mx, as given in Definition 10, according to the equation µθ(x) = ∑_{F∈λ(θ)} mx(F), as given in Definition 6. For example, for Li ∈ LA, NLi^ε = {x : d(x, Pi) ≤ ε}. Hence, µLi(x) = ∆(d(x, Pi)), where ∆(ε) = δ([ε,∞)).

Theorem 12 shows a clear link between appropriateness measures and Goodman and Nguyen's characterization of fuzzy set membership functions as single-point coverage functions of random sets [1], [6], [7], [19]. Furthermore, Theorem 12 shows that Nθ^ε characterizes the relationship between the expression θ and the elements of the underlying universe Ω and, hence, can be viewed as an information granule in this important sense. In addition, neighborhoods of labels NLi^ε can clearly be viewed as information granules in the sense of Zadeh [34], since they correspond to sets of similar points, all of which lie within a threshold ε of Pi (see Fig. 3). Again, such granules might be viewed as primitives, since according to Definition 11, all other neighborhoods can be generated recursively from them.

¹For a Lebesgue measurable set I ⊆ [0,∞), we denote δ(I) = ∫_I δ(ε)dε, i.e., we also use δ to denote the probability measure induced by the density function δ.

Fig. 3. Neighborhood of label Li.

It is also of interest to note that there is a relationship between the above prototype theory model and rough set theory [20]. Let us take {NLi^ε : Li ∈ LA} to be the primitive information granules for Ω. Then, following rough set theory, any set S ⊆ Ω can be described, at threshold level ε, by the rough set (DS^ε, D̄S^ε) consisting of the lower and upper approximations of S, where

DS^ε = {Li ∈ LA : NLi^ε ⊆ S} and
D̄S^ε = {Li ∈ LA : NLi^ε ∩ S ≠ ∅}.

Furthermore, since the threshold ε is uncertain, (DS^ε, D̄S^ε) is a random rough set with mass function mS given by, for F ⊆ G ⊆ LA,

mS(F, G) = δ({ε : DS^ε = F and D̄S^ε = G}).

Note, in the case where S = {x} is a singleton set, that ∀ε, DS^ε = D̄S^ε = Dx^ε, and for all F ⊆ LA, mS(F, F) = mx(F).

Example 13: Let Ω = R and LA = {L1, L2}, where the two labels have prototypes P1 = {1} and P2 = {2}, respectively. In addition, let δ be the uniform distribution on the interval [0, 1]. Then for S = [0.5, 3], we have that

NL1^ε = [1 − ε, 1 + ε] and NL2^ε = [2 − ε, 2 + ε]
NL1^ε ⊆ S iff ε ≤ 0.5, but ∀ε, NL1^ε ∩ S ≠ ∅
NL2^ε ⊆ S iff ε ≤ 1, but ∀ε, NL2^ε ∩ S ≠ ∅.

Hence

(DS^ε, D̄S^ε) = ({L1, L2}, {L1, L2})  for ε ≤ 0.5
             = ({L2}, {L1, L2})      for 0.5 < ε ≤ 1
             = (∅, {L1, L2})         for ε > 1.

Furthermore

mS({L1, L2}, {L1, L2}) = ∫_0^0.5 δ(ε)dε = 0.5
mS({L2}, {L1, L2}) = ∫_0.5^1 δ(ε)dε = 0.5
mS(∅, {L1, L2}) = ∫_1^∞ δ(ε)dε = 0.
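The prototype computations in Example 13 can be checked with a few lines of code. The sketch below (Python, illustrative only and not from the paper) uses the prototypes P1 = {1}, P2 = {2} and the uniform density δ on [0, 1] to evaluate the appropriateness measure µLi(x) = ∆(d(x, Pi)) and to estimate the random rough set masses for S = [0.5, 3] by sampling ε.

```python
import random

# Prototype model of Section IV with Omega = R, LA = {L1, L2},
# prototypes P1 = {1}, P2 = {2}, and epsilon ~ Uniform[0, 1] (Example 13).
PROTOTYPES = {"L1": 1.0, "L2": 2.0}

def appropriateness(label, x):
    """mu_L(x) = Delta(d(x, P)) = delta([d, inf)), which is max(0, 1 - d)
    for the uniform density on [0, 1]."""
    d = abs(x - PROTOTYPES[label])
    return max(0.0, 1.0 - d)

def rough_set_masses(s_low, s_high, n_samples=200_000, seed=0):
    """Estimate m_S for the random rough set (lower, upper) of S = [s_low, s_high]."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_samples):
        eps = rng.random()                      # epsilon drawn from delta
        lower, upper = [], []
        for label, p in PROTOTYPES.items():
            # neighborhood N^eps_L = [p - eps, p + eps]
            if p - eps >= s_low and p + eps <= s_high:
                lower.append(label)             # N^eps_L is contained in S
            if p + eps >= s_low and p - eps <= s_high:
                upper.append(label)             # N^eps_L overlaps S
        key = (frozenset(lower), frozenset(upper))
        counts[key] = counts.get(key, 0) + 1
    return {k: v / n_samples for k, v in counts.items()}

print(appropriateness("L1", 1.3))               # 0.7
print(rough_set_masses(0.5, 3.0))
# roughly ({L1,L2}, {L1,L2}): 0.5 and ({L2}, {L1,L2}): 0.5, matching Example 13
```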


V. GRANULAR INFERENCE FROM IMPRECISE AND UNCERTAIN INFORMATION

Let us suppose that an agent is aiming to describe, using labels from LA, a value from Ω about which there is uncertainty. In this case, they need to identify the labels from LA which are appropriate to describe a variable X that takes values in Ω, based on their partial knowledge of linguistic conventions and their partial knowledge of X. In such cases, the agent will aim to identify the set of labels appropriate to describe X, denoted DX. The associated mass function, which quantifies the agent's belief that DX = F, will be denoted by mX.

Definition 14 (Appropriateness measure for X):

∀θ ∈ LE,  µθ(X) = ∑_{F∈λ(θ)} mX(F).

A. Imprecise Information

Let us suppose the agent learns that X ∈ S for S ⊆ Ω. This naturally generates a sample of possible mass functions as candidates for mX, given by (MS, n), where MS = {mx : x ∈ S} is the set of possible mass functions, and n : MS → N≥1 is the count function n(m) = |{x ∈ S : mx = m}|. Hence, one approach would be for the agent to identify a representative mass function by taking the mean of this sample so that

∀F ∈ F,  mX(F|S) = ∑_{m∈MS} n(m)m(F) / ∑_{m∈MS} n(m) = ∑_{x∈S} mx(F) / |S|.

B. Uncertain Information

Let us suppose the agent learns that X has probability distribution p. This naturally generates a probability distribution q on the set of possible mass functions for mX, given by MΩ = {mx : x ∈ Ω}, where for m ∈ MΩ, q(m) = p({x ∈ Ω : mx = m}). A representative mass function would then correspond to the expected value under q as follows:

∀F ∈ F,  mX(F|p) = ∑_{m∈MΩ} q(m)m(F) = ∑_{x∈Ω} p(x)mx(F).

Alternatively, the agent may not know p precisely, but instead may have information in the form of a sample of elements drawn from p of the form S = (S, n′), where S ⊆ Ω and n′ : S → N≥1. This naturally generates a sample of mass functions (MS, n), where n : MS → N≥1 is defined by n(m) = ∑_{x∈S : mx=m} n′(x). The mean mass function is then given by

∀F ∈ F,  mX(F|S) = ∑_{m∈MS} n(m)m(F) / ∑_{m∈MS} n(m) = ∑_{x∈S} n′(x)mx(F) / ∑_{x∈S} n′(x).

Example 15: Let us consider a sample S = (S, n′), where S = {x1, x2, x3, x4}, and n′(x1) = 5, n′(x2) = 15, n′(x3) = 20, and n′(x4) = 10. In addition, let us suppose

LA = {low (l), medium (md), high (h)}
F = {{l}, {l, md}, {md}, {md, h}, {h}}

and mx1 = mx2 = m1 = {l, md} : 0.5, {md} : 0.5, and mx3 = mx4 = m2 = {md, h} : 0.3, {h} : 0.7. In this case, MS = {m1, m2}, where n(m1) = 5 + 15 = 20, and n(m2) = 20 + 10 = 30. Hence, taking the mean mass function, we have

mX = (20m1 + 30m2)/50 = {l, md} : 0.2, {md} : 0.2, {md, h} : 0.18, {h} : 0.42.
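Example 15 can be reproduced directly: the mean mass function is just the count-weighted average of the sampled mass functions. A short illustrative Python fragment (not part of the paper):

```python
# Sample of Example 15: each entry is (count n'(x_i), mass function m_{x_i}).
m1 = {frozenset(["l", "md"]): 0.5, frozenset(["md"]): 0.5}
m2 = {frozenset(["md", "h"]): 0.3, frozenset(["h"]): 0.7}
sample = [(5, m1), (15, m1), (20, m2), (10, m2)]

def mean_mass_function(sample):
    """m_X(F | S) = sum_x n'(x) m_x(F) / sum_x n'(x)  (Section V-B)."""
    total = sum(n for n, _ in sample)
    m_X = {}
    for n, m in sample:
        for F, v in m.items():
            m_X[F] = m_X.get(F, 0.0) + n * v / total
    return m_X

print(mean_mass_function(sample))
# {l,md}: 0.2, {md}: 0.2, {md,h}: 0.18, {h}: 0.42, as in Example 15
```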

VI. DEFUZZIFYING FROM MASS FUNCTIONS

Given a mass function mX for variable X taking values in Ω, an agent can determine a posterior probability distribution on Ω conditional on mX and, consequently, identify a single-value estimate for X by the application of natural defuzzification methods. Now, suppose a prior probability distribution p is defined on Ω; then, for F ∈ F, we have that

p(x|DX = F) = P(DX = F|X = x)p(x) / ∑_{x∈Ω} P(DX = F|X = x)p(x) = mx(F)p(x) / ∑_{x∈Ω} mx(F)p(x).

Note that ∑_{x∈Ω} mx(F)p(x) is also a mass function, effectively corresponding to the prior mass function for X given that X is distributed according to p. Let this be denoted by pmX(F), so that

p(x|DX = F) = mx(F)p(x) / pmX(F).

Furthermore, if it is then learnt a posteriori that DX has mass function mX, then by Jeffrey's rule of updating [8], we have that

p(x|mX) = ∑_{F∈F} P(x|DX = F)mX(F) = p(x) ∑_{F∈F} mx(F)mX(F) / pmX(F).

From this, a single value for X can be estimated in the standard way by taking the expected value or the mode.
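The following is a minimal sketch of this defuzzification step, assuming a finite domain Ω with a prior p and point mass functions mx; the domain, labels, and numbers are hypothetical and the code is not from the paper.

```python
def posterior_given_mass_function(prior, m_x_of, m_X):
    """Jeffrey's rule: p(x | m_X) = p(x) * sum_F m_x(F) m_X(F) / pm_X(F),
    where pm_X(F) = sum_x m_x(F) p(x) is the prior mass function on D_X."""
    # prior: {x: p(x)}, m_x_of: {x: {F: m_x(F)}}, m_X: {F: m_X(F)}
    pm_X = {}
    for x, p in prior.items():
        for F, v in m_x_of[x].items():
            pm_X[F] = pm_X.get(F, 0.0) + v * p
    post = {}
    for x, p in prior.items():
        post[x] = p * sum(m_x_of[x].get(F, 0.0) * w / pm_X[F]
                          for F, w in m_X.items() if pm_X.get(F, 0.0) > 0)
    return post

# Tiny hypothetical domain Omega = {1, 2, 3} with labels low (l) and high (h).
prior = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}
m_x_of = {1: {frozenset(["l"]): 1.0},
          2: {frozenset(["l"]): 0.5, frozenset(["h"]): 0.5},
          3: {frozenset(["h"]): 1.0}}
m_X = {frozenset(["l"]): 0.2, frozenset(["h"]): 0.8}
post = posterior_given_mass_function(prior, m_x_of, m_X)
estimate = sum(x * p for x, p in post.items())   # expected value as point estimate
print(post, estimate)                            # posterior sums to 1; estimate = 2.4
```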

VII. SELECTING ASSERTIONS

In the previous sections, we have introduced appropriateness measures and their underlying mass assignments as a mechanism by which an agent can assess the level of appropriateness of label expressions to describe some instance x ∈ Ω. However, when faced with an object x about which an agent wishes to convey information to other individuals in the population, he/she has a much more specific decision problem. What expression θ ∈ LE do they choose to assert in order to describe x, and how do they use their appropriateness measure to guide this choice?

In principle, an agent can assert any appropriate expression in LE. However, in practice, there is only a small subset of expressions which are really assertible. In particular, an agent may tend not to assert expressions which are logically equivalent to simpler expressions (i.e., those involving fewer connectives). For example, neither ¬¬L nor (L ∧ L′) ∨ (L ∧ ¬L′) is likely to be asserted instead of L. In addition, there is some evidence to suggest that as humans we tend not to use negative statements as descriptions if at all possible. One is much more likely to describe the color of this paper as white rather than not red, even though both expressions are appropriate. However, we may use purely negative statements in situations where none of our label descriptors are appropriate. This can certainly occur if we are labeling elements of a continuum. For example, we may encounter colors for which none of our color descriptors are appropriate. Overall, this suggests that while purely negative expressions may be assertible, there is likely to be an a priori propensity to use positive expressions.

We now introduce a model of the decision process by which an agent identifies a particular assertion to describe an object, taking account of the measure of appropriateness as defined in the previous section. Let AS ⊂ LE denote the finite set of permitted assertions. Let AX be the assertion selected by the agent to describe variable X. Now, each set of appropriate labels DX = F identifies a set of possible values for AX, corresponding to those expressions θ ∈ AS for which F ∈ λ(θ) (i.e., those expressions consistent with F). Hence, the mass assignment mX on 2^LA naturally generates a mass assignment on sets of possible assertions (2^AS) as follows.

Definition 16 (Mass assignment on assertions [15]): mX^A : 2^AS → [0, 1] is defined such that ∀G ⊆ AS

mX^A(G) = ∑_{F∈F : C(F)=G} mX(F)

where C : F → 2^AS, and

∀F ∈ F,  C(F) = {θ ∈ AS : F ∈ λ(θ)}.

Hence, C(F) identifies those assertions which are consistent with the knowledge that F is the set of appropriate labels.

Let us suppose AS is sufficiently large so that ∀F ∈ F, ∃θ ∈ AS such that F ∈ λ(θ) (i.e., so that mX^A(∅) = 0); then, from Definition 16, we can define a belief and a plausibility measure on AX in the normal manner so that

∀S ⊆ AS,  Bel(AX ∈ S) = BelX^A(S) = ∑_{G⊆S} mX^A(G)
Pl(AX ∈ S) = PlX^A(S) = ∑_{G : G∩S ≠ ∅} mX^A(G).

This plausibility measure on assertions can be related directly to appropriateness measures according to the following theorem.

Theorem 17 (see, e.g., [15]):

∀θ ∈ AS,  PlX^A(θ) = µθ(X).

Note that BelX^A(θ) = mX^A({θ}), which corresponds to the level of belief that θ is the only assertible expression which is appropriate to describe X. Now, given the mass assignment mX^A and a prior distribution P on the assertible expressions AS, we can define a probability distribution for AX in accordance with Shafer [28] as follows.

Definition 18 (Probability of an assertion [15]):

∀θ ∈ AS,  P(AX = θ) = PX^A(θ) = P(θ) ∑_{S⊆AS : θ∈S} mX^A(S) / P(S) = P(θ) ∑_{F∈F : F∈λ(θ)} mX(F) / P(C(F)).

Note that in the case that the prior probability P on AS is uniform, PX^A is the pignistic distribution of BelX^A, as defined by Smets [29]. In addition, from Shafer [28] and Theorem 17, we know that

∀θ ∈ AS,  mX^A({θ}) ≤ PX^A(θ) ≤ µθ(X).

Example 19: Let us consider the following mass function, as given in Example 15:

mX = {l, md} : 0.2, {md} : 0.2, {md, h} : 0.18, {h} : 0.42.

Now, suppose AS = {l, md, h, ¬l, ¬md, ¬h}; then, we have

C({l, md}) = {l, md, ¬h},  C({md}) = {md, ¬l, ¬h}
C({md, h}) = {md, h, ¬l},  C({h}) = {h, ¬l, ¬md}

and hence

mX^A = {l, md, ¬h} : 0.2, {md, ¬l, ¬h} : 0.2, {md, h, ¬l} : 0.18, {h, ¬l, ¬md} : 0.42.

Now, assuming a uniform distribution on AS, PX^A corresponds to the pignistic distribution of the earlier mass function as follows:

PX^A(l) = 0.2/3 = 0.06667
PX^A(md) = 0.2/3 + 0.2/3 + 0.18/3 = 0.19333
PX^A(h) = 0.18/3 + 0.42/3 = 0.2
PX^A(¬l) = 0.2/3 + 0.18/3 + 0.42/3 = 0.26667
PX^A(¬md) = 0.42/3 = 0.14
PX^A(¬h) = 0.2/3 + 0.2/3 = 0.13333.
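The numbers in Example 19 follow from Definitions 16 and 18 with a uniform prior on AS. A short illustrative check in Python (not part of the paper), with assertions encoded as ("pos", L) and ("neg", L) for L and ¬L:

```python
from fractions import Fraction

LA = ["l", "md", "h"]
AS = [("pos", L) for L in LA] + [("neg", L) for L in LA]   # l, md, h, not-l, not-md, not-h

m_X = {frozenset(["l", "md"]): Fraction(20, 100),
       frozenset(["md"]): Fraction(20, 100),
       frozenset(["md", "h"]): Fraction(18, 100),
       frozenset(["h"]): Fraction(42, 100)}

def consistent_assertions(F):
    """C(F): assertions theta in AS with F in lambda(theta) (Definition 16)."""
    return frozenset(a for a in AS
                     if (a[0] == "pos" and a[1] in F) or (a[0] == "neg" and a[1] not in F))

# Mass assignment on sets of assertions.
m_A = {}
for F, v in m_X.items():
    G = consistent_assertions(F)
    m_A[G] = m_A.get(G, Fraction(0)) + v

# Pignistic probability of each assertion (Definition 18 with a uniform prior on AS).
P_A = {a: sum(v / len(G) for G, v in m_A.items() if a in G) for a in AS}
for a in AS:
    print(a, float(P_A[a]))
# l: 0.0667, md: 0.1933, h: 0.2, not-l: 0.2667, not-md: 0.14, not-h: 0.1333
```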

VIII. LINGUISTIC MODELS

Let us consider a decision problem with attributes X1, . . . , Xk and decision variable Y with underlying universes Ωi, i = 1, . . . , k, and ΩY, respectively. In addition, let LAi, LEi, and Fi denote the set of labels, label expressions, and focal sets, respectively, defined for attribute Xi for i = 1, . . . , k. Similarly, let LAY, LEY, and FY denote the set of labels, label expressions, and focal sets, respectively, defined for the decision variable Y. Now, given linguistic information about X1, . . . , Xk in the form of mass functions m1, . . . , mk, a linguistic mapping provides a mechanism to infer a mass function mY, providing linguistic information concerning the value of Y.

TABLE I. FOCAL SET FUNCTION f′ : F1 × F2 → FY

TABLE II. LINGUISTIC MAPPING CALCULATION

Definition 20 (Linguistic mapping): A linguistic mapping is a function f : M1 × · · · × Mk → MY, where Mi = {m : 2^LAi → [0, 1] : ∑_{F∈Fi} m(F) = 1} and MY = {m : 2^LAY → [0, 1] : ∑_{F∈FY} m(F) = 1}.

A simple form of linguistic mapping can be generated from a focal set mapping f′ : F1 × · · · × Fk → FY as follows.

Definition 21 (Generalized focal set mapping): Let f′ : F1 × · · · × Fk → FY; then we can define a linguistic mapping based on f′ according to

f(m1, . . . , mk) = mY, where
∀G ∈ FY,  mY(G) = ∑_{F1,...,Fk : f′(F1,...,Fk)=G} ∏_{i=1}^{k} mi(Fi).

Example 22: Let k = 2 so that f : M1 × M2 → MY. Let us suppose that LA1 = {low (l), medium (md), high (h)}, LA2 = {cold (cd), normal (nm), hot (ht)}, and LAY = {slow (sl), standard (sd), fast (ft)}. In addition, let us suppose that F1 = {{l}, {l, md}, {md}, {md, h}, {h}}, F2 = {{cd}, {cd, nm}, {nm}, {nm, ht}, {ht}}, and FY = {{sl}, {sl, sd}, {sd}, {sd, ft}, {ft}}.

Let f′ : F1 × F2 → FY be defined as in Table I. Now, let us suppose that we have information about X1 and X2 in the form of the following mass functions:

mX1 = m1 = {l} : 0.5, {l, md} : 0.1, {md} : 0.4 and
mX2 = m2 = {cd} : 0.3, {cd, nm} : 0.3, {ht} : 0.4.

Then, applying Definition 21, we can now determine the outcome of the linguistic mapping f(m1, m2). The process is illustrated in Table II, where the cell corresponding to focal sets F1 and F2 contains the information f′(F1, F2) : m1(F1) × m2(F2).

Now, for each focal set G ∈ FY that appears in Table II, we aggregate the mass values of G across the table to determine the corresponding mass value for G in f(m1, m2), so that

f(m1, m2) = {sl} : (0.15), {sl, sd} : (0.03 + 0.12), {sd} : (0.15 + 0.03), {ft} : (0.12 + 0.2 + 0.04 + 0.16)
= {sl} : 0.15, {sl, sd} : 0.15, {sd} : 0.18, {ft} : 0.52.
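The aggregation of Example 22 can be reproduced with a short Python sketch (not part of the paper). Since Table I is not reproduced in this extraction, the focal-set mapping f′ below is a hypothetical assignment chosen to be consistent with the worked numbers of the example; only the entries needed for the focal sets of m1 and m2 are listed.

```python
from itertools import product

# Mass functions on X1 and X2 from Example 22.
m1 = {frozenset(["l"]): 0.5, frozenset(["l", "md"]): 0.1, frozenset(["md"]): 0.4}
m2 = {frozenset(["cd"]): 0.3, frozenset(["cd", "nm"]): 0.3, frozenset(["ht"]): 0.4}

# A focal-set mapping f' (Definition 21); entries are assumed, consistent with
# the aggregation in Example 22 (Table I itself is not reproduced here).
f_prime = {
    (frozenset(["l"]), frozenset(["cd"])): frozenset(["sl"]),
    (frozenset(["l"]), frozenset(["cd", "nm"])): frozenset(["sd"]),
    (frozenset(["l"]), frozenset(["ht"])): frozenset(["ft"]),
    (frozenset(["l", "md"]), frozenset(["cd"])): frozenset(["sl", "sd"]),
    (frozenset(["l", "md"]), frozenset(["cd", "nm"])): frozenset(["sd"]),
    (frozenset(["l", "md"]), frozenset(["ht"])): frozenset(["ft"]),
    (frozenset(["md"]), frozenset(["cd"])): frozenset(["sl", "sd"]),
    (frozenset(["md"]), frozenset(["cd", "nm"])): frozenset(["ft"]),
    (frozenset(["md"]), frozenset(["ht"])): frozenset(["ft"]),
}

def linguistic_mapping(f_prime, m1, m2):
    """Definition 21: m_Y(G) = sum over (F1, F2) with f'(F1, F2) = G of m1(F1) m2(F2)."""
    m_Y = {}
    for (F1, v1), (F2, v2) in product(m1.items(), m2.items()):
        G = f_prime[(F1, F2)]
        m_Y[G] = m_Y.get(G, 0.0) + v1 * v2
    return m_Y

print(linguistic_mapping(f_prime, m1, m2))
# {sl}: 0.15, {sl,sd}: 0.15, {sd}: 0.18, {ft}: 0.52, as in Example 22
```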

Generalized focal set mappings do not allow for the representation of uncertainty within the mapping between descriptions of the input and descriptions of the decision variable. Such uncertainty can be encoded by means of a mass relation defined as follows.

Definition 23 (Joint mass relation): A joint mass relation on 2^LA1 × · · · × 2^LAk is a multidimensional joint mass function such that

m : 2^LA1 × · · · × 2^LAk → [0, 1] and ∑_{F1∈F1} · · · ∑_{Fk∈Fk} m(F1, . . . , Fk) = 1.

Definition 24 (Relational linguistic mapping): Let m be a joint mass relation on 2^LA1 × · · · × 2^LAk × 2^LAY. Then, for mi ∈ Mi : i = 1, . . . , k, let

f(m1, . . . , mk) = ∑_{F1∈F1} · · · ∑_{Fk∈Fk} m(•|F1, . . . , Fk) ∏_{i=1}^{k} mi(Fi), where

∀G ∈ FY,  m(G|F1, . . . , Fk) = m(F1, . . . , Fk, G) / ∑_{G∈FY} m(F1, . . . , Fk, G).

From Definition 24, we can see that the linguistic mapping f can be determined directly from a conditional mass function on FY|F1 × · · · × Fk, as we will now illustrate in the following example.

Example 25: Let k = 2 so that f : M1 × M2 → MY. Let us suppose that LA1, LA2, LAY, F1, F2, and FY are defined as in Example 22. A conditional mass relation for FY|F1 × F2 that represents the linguistic mapping f is given in Table III.

Now, suppose we have information about X1 and X2 in the form of the following mass functions:

mX1 = m1 = {l} : 0.5, {l, md} : 0.1, {md} : 0.4 and
mX2 = m2 = {cd} : 0.3, {cd, nm} : 0.3, {ht} : 0.4.

Applying Definition 24, we now determine f(m1, m2) by taking the product of the mass values m1(F1) and m2(F2) for each pair of focal sets (F1, F2) ∈ F1 × F2 with the conditional masses given in Table III and then summing the results for each focal set G ∈ FY. This process is illustrated in Table IV, where the mass values on FY in each cell are given by the product of the mass values from the conditional mass relation for this cell and the respective mass values from m1 and m2. For example, in the top left-most cell of the table, identified by the pair ({l}, {cd}), the mass value for focal set {sl} is given by the mass value for {sl} in the corresponding cell of Table III (i.e., 0.6) multiplied by the product of m1({l}) and m2({cd}) (i.e., 0.3 × 0.5). Hence

f(m1, m2) = {sl} : (0.09 + 0.006), {sl, sd} : (0.06 + 0.024 + 0.06), {sd, ft} : (0.045 + 0.003 + 0.024 + 0.05 + 0.012 + 0.032), {ft} : (0.096 + 0.15 + 0.028 + 0.128)
= {sl} : 0.096, {sl, sd} : 0.144, {sd} : 0.192, {sd, ft} : 0.166, {ft} : 0.402.
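Because Table III is not reproduced in this extraction, Example 25 cannot be replayed exactly, but the mechanics of Definition 24 can still be sketched. The following Python fragment (hypothetical, with a single input variable and invented conditional masses, not taken from Table III) computes a relational linguistic mapping as the conditional-mass-weighted mixture of Definition 24.

```python
from itertools import product

def relational_mapping(cond, input_masses):
    """Definition 24: m_Y(G) = sum over (F1,...,Fk) of m(G | F1,...,Fk) * prod_i m_i(F_i).
    `cond` maps a tuple of input focal sets to a conditional mass function on F_Y."""
    m_Y = {}
    for combo in product(*[m.items() for m in input_masses]):
        focal_sets = tuple(F for F, _ in combo)
        weight = 1.0
        for _, v in combo:
            weight *= v
        for G, c in cond[focal_sets].items():
            m_Y[G] = m_Y.get(G, 0.0) + c * weight
    return m_Y

# Tiny hypothetical example: one input variable, two output focal sets.
m1 = {frozenset(["l"]): 0.7, frozenset(["h"]): 0.3}
cond = {(frozenset(["l"]),): {frozenset(["sl"]): 0.8, frozenset(["ft"]): 0.2},
        (frozenset(["h"]),): {frozenset(["sl"]): 0.1, frozenset(["ft"]): 0.9}}
print(relational_mapping(cond, [m1]))
# {sl}: 0.7*0.8 + 0.3*0.1 = 0.59, {ft}: 0.7*0.2 + 0.3*0.9 = 0.41
```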

TABLE III. CONDITIONAL MASS FUNCTION

TABLE IV. MASS CALCULATION

A. Independence Assumptions for Mass Relations

In order to directly define a conditional mass relation m : FY|F1 × · · · × Fk → [0, 1], we are required to explicitly identify ∏_{i=1}^{k} |Fi| distinct mass functions in MY (i.e., |FY| ∏_{i=1}^{k} |Fi| mass values). For large k, this may not be feasible in practice. Alternatively, it may be possible to make certain independence assumptions between the DXi for i = 1, . . . , k. One of the strongest assumptions of this kind is a version of Naive Bayes applied to conditional mass relations. In this case, it is assumed that the descriptions DX1, . . . , DXk are conditionally independent given the value of DY.

Let the relationship between $X_1, \ldots, X_k$ and $Y$ be represented by the joint mass relation $m : \mathcal{F}_1 \times \cdots \times \mathcal{F}_k \times \mathcal{F}_Y \to [0, 1]$. Now, since $m$ is a joint probability measure on $D_{X_1}, \ldots, D_{X_k}$ and $D_Y$, we have by Bayes' theorem that
$$\forall G \in \mathcal{F}_Y,\ \forall F_i \in \mathcal{F}_i : i = 1, \ldots, k$$
$$m(G \mid F_1, \ldots, F_k) = \frac{m(F_1, \ldots, F_k \mid G)\, m_Y(G)}{\sum_{G' \in \mathcal{F}_Y} m(F_1, \ldots, F_k \mid G')\, m_Y(G')}$$
where $m_Y(G) = \sum_{F_1 \in \mathcal{F}_1} \cdots \sum_{F_k \in \mathcal{F}_k} m(F_1, \ldots, F_k, G)$.

Now, on the assumption of conditional independence between $D_{X_1}, \ldots, D_{X_k}$ given $D_Y$, we have that
$$\forall G \in \mathcal{F}_Y,\ \forall F_i \in \mathcal{F}_i : i = 1, \ldots, k \qquad m(F_1, \ldots, F_k \mid G) = \prod_{i=1}^{k} m(F_i \mid G).$$

In this case, the conditional mass relation $m : \mathcal{F}_Y \mid \mathcal{F}_1 \times \cdots \times \mathcal{F}_k \to [0, 1]$ can be completely determined by the one-dimensional (1-D) conditional relations $m : \mathcal{F}_i \mid \mathcal{F}_Y \to [0, 1]$ for $i = 1, \ldots, k$ and $m_Y$. Hence, the number of mass values which must be determined is of order $|\mathcal{F}_Y|\left(1 + \sum_{i=1}^{k} |\mathcal{F}_i|\right)$. Even for focal sets of moderately large cardinality, this will be significantly less than $|\mathcal{F}_Y| \prod_{i=1}^{k} |\mathcal{F}_i|$ for large $k$.
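Under this Naive-Bayes style assumption, the full conditional mass relation can be reassembled on demand from the 1-D conditionals and $m_Y$. A minimal Python sketch follows; the dictionaries standing in for $m(F_i \mid G)$ and $m_Y$ in the usage example are hypothetical.

```python
def naive_bayes_conditional(focal_inputs, cond_1d, m_Y):
    """Compute m(G | F1, ..., Fk) proportional to mY(G) * prod_i m(Fi | G),
    then normalize over the output focal sets G.
    cond_1d[i] is a dict {(Fi, G): m(Fi | G)}; m_Y is a dict {G: mY(G)}."""
    scores = {}
    for G, prior in m_Y.items():
        s = prior
        for i, Fi in enumerate(focal_inputs):
            s *= cond_1d[i].get((Fi, G), 0.0)
        scores[G] = s
    total = sum(scores.values())
    return {G: v / total for G, v in scores.items()} if total > 0 else scores

# Hypothetical usage with two inputs and two output focal sets.
G1, G2 = frozenset({"sl"}), frozenset({"ft"})
Fa, Fb = frozenset({"l"}), frozenset({"cd"})
cond_1d = [
    {(Fa, G1): 0.8, (Fa, G2): 0.3},   # m(F1 | G) for F1 = {l}
    {(Fb, G1): 0.6, (Fb, G2): 0.5},   # m(F2 | G) for F2 = {cd}
]
m_Y = {G1: 0.4, G2: 0.6}
print(naive_bayes_conditional((Fa, Fb), cond_1d, m_Y))
```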

In the case when an assumption of complete conditional independence is inappropriate, semi-independent models may be defined. In this case, we identify subsets of variables $\mathcal{V}_1, \ldots, \mathcal{V}_r$ to form a partition of $X_1, \ldots, X_k$. Conditional dependence is then assumed within each variable grouping $\mathcal{V}_i$, but independence is assumed between groups. Specifically, we define conditional mass functions for each variable set $\mathcal{V}_i$ of the form
$$\forall G \in \mathcal{F}_Y,\ \forall F_j \in \mathcal{F}_j : j = 1, \ldots, k \qquad m(F_j : X_j \in \mathcal{V}_i \mid G) : i = 1, \ldots, r.$$

According to the semi-independence assumption, we then have that
$$m(F_1, \ldots, F_k \mid G) = \prod_{i=1}^{r} m(F_j : X_j \in \mathcal{V}_i \mid G)$$

and hence
$$\forall G \in \mathcal{F}_Y,\ \forall F_i \in \mathcal{F}_i : i = 1, \ldots, k$$
$$m(G \mid F_1, \ldots, F_k) = \frac{\left( \prod_{i=1}^{r} m(F_j : X_j \in \mathcal{V}_i \mid G) \right) m_Y(G)}{\sum_{G' \in \mathcal{F}_Y} \left( \prod_{i=1}^{r} m(F_j : X_j \in \mathcal{V}_i \mid G') \right) m_Y(G')}.$$

For semi-independent models, the total number of mass values to be defined is of order $|\mathcal{F}_Y| \sum_{i=1}^{r} \prod_{X_j \in \mathcal{V}_i} |\mathcal{F}_j|$. Provided that the average cardinality of the variable sets $|\mathcal{V}_i|$ is relatively low, this will again tend to be significantly less than $|\mathcal{F}_Y| \prod_{i=1}^{k} |\mathcal{F}_i|$ for large $k$.

B. Tree Structured Mass Relations

Mass relations can also be encoded by linguistic decision trees (LDTs). These are decision trees with descriptions of variables $D_{X_i}$ as nodes and the corresponding focal sets $F_i \in \mathcal{F}_i$ as branches. Consequently, a branch $B_j$ of an LDT is a conjunction of the following form:
$$(D_{X_{j_1}} = F_{j_1}) \wedge (D_{X_{j_2}} = F_{j_2}) \wedge \cdots \wedge (D_{X_{j_d}} = F_{j_d}).$$

Associated with each branch $B_j$ is a conditional mass relation of the following form:
$$\forall G \in \mathcal{F}_Y \qquad m(G \mid B_j) = m(G \mid F_{j_1}, \ldots, F_{j_d}).$$


Fig. 4. Linguistic decision tree.

Definition 26 (Linguistic decision tree mapping): Let us consider an LDT with branches $B_j : j = 1, \ldots, T$, each with associated conditional mass relation $m(G \mid B_j)$ as earlier. This tree defines a linguistic mapping of the following form:
$$f(m_1, \ldots, m_k) = \sum_{j=1}^{T} m(\cdot \mid B_j)\, P(B_j \mid m_1, \ldots, m_k), \quad \text{where} \quad P(B_j \mid m_1, \ldots, m_k) = \prod_{i=1}^{d} m_{j_i}(F_{j_i}).$$

In the case where the focal sets $\mathcal{F}_i : i = 1, \ldots, k$ and $\mathcal{F}_Y$ form crisp partitions of $\Omega_i : i = 1, \ldots, k$ and $\Omega_Y$, respectively, the linguistic mapping given in Definition 26 corresponds to the output of a classical probabilistic decision tree [25].

Example 27: Let $k = 2$ so that $f : \mathbb{M}_1 \times \mathbb{M}_2 \to \mathbb{M}_Y$. Let us suppose that $LA_1$, $LA_2$, $LA_Y$, $\mathcal{F}_1$, $\mathcal{F}_2$, and $\mathcal{F}_Y$ are defined as in Example 22. An example of an LDT representing $f$ is given in Fig. 4. Now, suppose we have information about $X_1$ and $X_2$ in the form of the following mass functions:
$$m_{X_1} = m_1 = \{l\}: 0.5,\ \{l, md\}: 0.1,\ \{md\}: 0.4 \quad \text{and}$$
$$m_{X_2} = m_2 = \{cd\}: 0.3,\ \{cd, nm\}: 0.3,\ \{ht\}: 0.4.$$

From this, we can determine the probabilities of the branches given $m_1$ and $m_2$ as follows:

$$P(B_1 \mid m_1, m_2) = m_1(\{l\}) = 0.5$$
$$P(B_2 \mid m_1, m_2) = m_1(\{l, md\}) \times m_2(\{cd\}) = 0.1 \times 0.3 = 0.03$$
$$P(B_3 \mid m_1, m_2) = m_1(\{l, md\}) \times m_2(\{cd, nm\}) = 0.1 \times 0.3 = 0.03$$
$$P(B_4 \mid m_1, m_2) = P(B_5 \mid m_1, m_2) = 0$$
$$P(B_6 \mid m_1, m_2) = m_1(\{l, md\}) \times m_2(\{ht\}) = 0.1 \times 0.4 = 0.04$$
$$P(B_7 \mid m_1, m_2) = m_1(\{md\}) = 0.4$$
$$P(B_8 \mid m_1, m_2) = \cdots = P(B_{13} \mid m_1, m_2) = 0.$$

We can then determine the linguistic mapping value according to
$$f(m_1, m_2) = 0.5\,(\{sl\}: 0.8,\ \{sl, sd\}: 0.2) + 0.03\,(\{sl\}: 0.2,\ \{sl, sd\}: 0.8) + 0.03\,(\{sd\}: 0.9,\ \{sd, ft\}: 0.1)$$
$$+\ 0.04\,(\{sd\}: 0.7,\ \{sl, sd\}: 0.3) + 0.4\,(\{sl, sd\}: 0.25,\ \{sd\}: 0.75)$$
$$= \{sl\}: 0.406,\ \{sl, sd\}: 0.236,\ \{sd\}: 0.355,\ \{sd, ft\}: 0.003.$$
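Evaluating an LDT mapping therefore amounts to weighting each branch's conditional mass function by the product of the input masses along that branch and summing. A minimal Python sketch of Definition 26 follows, using a small hypothetical three-branch tree rather than the thirteen-branch tree of Fig. 4.

```python
def ldt_mapping(branches, input_masses):
    """Definition 26: f(m1, ..., mk) = sum_j m(. | Bj) * P(Bj | m1, ..., mk),
    where P(Bj | m1, ..., mk) is the product of the input masses of the
    focal sets appearing along branch Bj."""
    out = {}
    for conditions, cond_mass in branches:
        p = 1.0
        for var_idx, F in conditions:
            p *= input_masses[var_idx].get(F, 0.0)   # P(Bj | m1, ..., mk)
        for G, v in cond_mass.items():
            out[G] = out.get(G, 0.0) + p * v
    return out

# Hypothetical focal sets and a complete three-branch tree (not the tree of Fig. 4).
l, md = frozenset({"l"}), frozenset({"md"})
cd, ht = frozenset({"cd"}), frozenset({"ht"})
sl, ft = frozenset({"sl"}), frozenset({"ft"})
branches = [
    ([(0, l)],           {sl: 0.8, ft: 0.2}),   # B1: D_X1 = {l}
    ([(0, md), (1, cd)], {sl: 0.3, ft: 0.7}),   # B2: D_X1 = {md}, D_X2 = {cd}
    ([(0, md), (1, ht)], {ft: 1.0}),            # B3: D_X1 = {md}, D_X2 = {ht}
]
m1 = {l: 0.5, md: 0.5}
m2 = {cd: 0.6, ht: 0.4}
print(ldt_mapping(branches, [m1, m2]))   # mass function on the focal sets of Y
```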

C. Computational Costs

For the three types of linguistic mappings considered earlier, the computational cost of evaluating $f(m_1, \ldots, m_k)$ is dominated by the cost of evaluating the product of the input mass assignments $m_1 \times \cdots \times m_k$. The latter is $O(\prod_{i=1}^{k} r_i)$, where $r_i = |\{F \in \mathcal{F}_i : m_i(F) > 0\}|$ is the number of possible focal-set values for $D_{X_i}$, given the available input information. For LDTs, the computational cost is typically lower, being $O(T)$, where $T \le \prod_{i=1}^{k} r_i$ is the number of branches in the tree. In practice, $r_i$ will tend to be relatively low for each variable $X_i$. For example, in the case where the value of $X_i$ is known to be $x_i \in \Omega_i$, then $m_i = m_{x_i}$, which is determined according to the consonance mapping, as outlined in Section III. If, in this case, the underlying labels are defined by trapezoidal appropriateness measures, as in Fig. 1, so that the associated mass function granules are triangular functions, as in Fig. 2, then $m_{x_i}$ is nonzero for at most two focal sets, i.e., $r_i \le 2$.

D. Determining Mass Relations From Data

Let us suppose we have a dataset $\mathcal{S} = (S, n)$, which provides partial (and potentially noisy) information about an underlying functional mapping between $X_1, \ldots, X_k$ and $Y$. In this case, $S \subseteq \Omega_1 \times \cdots \times \Omega_k \times \Omega_Y$ and $n : S \to \mathbb{N}_{\ge 1}$. From this, we can define a conditional mass relation to represent the underlying mapping as follows:


$$m(G \mid F_1, \ldots, F_k) = \frac{\sum_{(x, y) \in S} n(x, y)\, m_y(G) \prod_{i=1}^{k} m_{x_i}(F_i)}{\sum_{(x, y) \in S} n(x, y) \prod_{i=1}^{k} m_{x_i}(F_i)}.$$

This approach allows for the calculation of all the special forms of mass relations described earlier. For example, for the independent and semi-independent models, we can determine

$$\forall F_i \in \mathcal{F}_i,\ \forall G \in \mathcal{F}_Y \qquad m(F_i \mid G) = \frac{\sum_{(x, y) \in S} n(x, y)\, m_{x_i}(F_i)\, m_y(G)}{\sum_{(x, y) \in S} n(x, y)\, m_y(G)}$$

and similarly
$$m(F_j : X_j \in \mathcal{V}_i \mid G) = \frac{\sum_{(x, y) \in S} n(x, y) \left( \prod_{X_j \in \mathcal{V}_i} m_{x_j}(F_j) \right) m_y(G)}{\sum_{(x, y) \in S} n(x, y)\, m_y(G)}.$$

Furthermore, in LDTs, for branch
$$B_j = (D_{X_{j_1}} = F_{j_1}) \wedge (D_{X_{j_2}} = F_{j_2}) \wedge \cdots \wedge (D_{X_{j_d}} = F_{j_d})$$
we can determine
$$m(G \mid B_j) = \frac{\sum_{(x, y) \in S} n(x, y) \left( \prod_{i=1}^{d} m_{x_{j_i}}(F_{j_i}) \right) m_y(G)}{\sum_{(x, y) \in S} n(x, y) \left( \prod_{i=1}^{d} m_{x_{j_i}}(F_{j_i}) \right)}.$$
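These data-driven estimates are weighted counting exercises over the examples $(x, y)$. The following Python sketch computes the general conditional mass relation $m(G \mid F_1, \ldots, F_k)$ from data; it assumes each data point has already been converted into per-variable mass functions (e.g., via the consonance mapping mentioned in Section III), which is an assumption of this illustration rather than code from the paper.

```python
from itertools import product
from collections import defaultdict

def mass_relation_from_data(examples):
    """Estimate m(G | F1, ..., Fk) from a dataset.
    Each example is a triple (n, mxs, my), where n is the count n(x, y),
    mxs is a list of dicts [mx_1, ..., mx_k] with mx_i = {F_i: mx_i(F_i)},
    and my is a dict {G: my(G)}."""
    num = defaultdict(lambda: defaultdict(float))   # sums of n * prod_i mx_i(F_i) * my(G)
    den = defaultdict(float)                        # sums of n * prod_i mx_i(F_i)
    for n, mxs, my in examples:
        for combo in product(*[mx.items() for mx in mxs]):
            key = tuple(F for F, _ in combo)        # the tuple (F1, ..., Fk)
            w = float(n)
            for _, v in combo:
                w *= v
            den[key] += w
            for G, vy in my.items():
                num[key][G] += w * vy
    return {key: {G: v / den[key] for G, v in gs.items()}
            for key, gs in num.items() if den[key] > 0}
```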

The aforementioned methods for the calculation of mass relations from data have been applied in a range of machine-learning and data-mining algorithms. For example, Randon and Lawry [26] proposed an algorithm for learning mass relational mappings from data, which automatically searches for dependency groupings among the variables of the form described in Section VIII-A. This algorithm has been employed extensively in flood prediction, in order to learn time-dependent linguistic models to predict future water levels on the basis of historical data (see Randon et al. [24] and [27]).

The Linguistic ID3 (LID3) algorithm was developed by Qin and Lawry [23] to learn LDTs from data. This extends the well-known ID3 algorithm by incorporating mass relations into the entropy search heuristics, applying the methods described in this section. Qin and Lawry [23] provide an extensive study of the performance of LID3 when applied to well-known benchmark classification problems from the University of California at Irvine (UCI) repository. LID3 has been widely applied across a number of application domains. For example, Turnbull et al. [31] used LID3 to learn LDTs for threat avoidance in an online controller for unmanned air vehicles. In addition, McCulloch et al. [17] applied LDTs to the classification of weather radar images as a means to detect noisy regions. Also in the environmental domain, McCulloch et al. [18] devised an online updateable version of LID3 and applied it to flood forecasting on the River Severn in the U.K.

Example 28: This example of granular modeling by the use of LDTs is based on an application of the LID3 algorithm to identify a region of weather radar images known as the bright band [17]. The quantitative use of radar-based precipitation estimates in hydrological modeling for flood forecasting has been limited due to different sources of uncertainty in the rainfall estimation process. An important source of uncertainty is the vertical reflectivity of precipitation (VPR). This is largely due to factors such as the growth or evaporation of precipitation, the thermodynamic phase of the hydrometeors, or melting and wind effects. As the range from the radar increases, the radar beam is at some height above the ground, while the radar sampling volume increases and is unlikely to be homogeneously filled by hydrometeors. As an example, the lower part of the volume could be in rain, whereas the upper part of the same volume could be filled with snow or even be without an echo. This variability affects reflectivity measurements, and the estimate of precipitation may not represent the rainfall rate at the ground. Snowflakes are generally low-density aggregates, and when they start to melt, they appear as large raindrops to the radar, which results in larger values of reflectivity compared with the expected reflectivity below the melting layer. This phenomenon is called the "bright band," and the interception of the radar beam with melting snowflakes can cause significant overestimates of precipitation by up to a factor of 5. When the radar beam is above the bright band, this can cause underestimates of precipitation by up to a factor of 4 per kilometer above the bright band. Therefore, the bright band needs to be detected and corrected for.

McCulloch et al. [17] applied the LID3 learning algorithm to range height indicator (RHI) scans from an S-band (9.75-cm wavelength) weather radar based in Chilbolton, U.K. The objective was to obtain a set of rules, in the form of an LDT, to classify pixels of vertical reflectivity profile images as being either snow, rain, or bright band. The input variables for the linguistic mapping were the reflectivity factor (Zh), the differential reflectivity (Zdr), the linear depolarization ratio (Ldr), and the height measurement (H0). The data for the experiments were generated from 1354 images, resulting in 191 235 labeled data vectors. Examples of rules corresponding to branches of the resulting LDT are as follows.

1) $(D_{Ldr} = \{low, medium\}) \wedge (D_{H} = \{med., high\}) \wedge (D_{Zh} = \{high\}) \wedge (D_{Zdr} = \{med., high\}) \to$ Rain: 0.998, Snow: 0.002;

2) $(D_{Ldr} = \{medium\}) \wedge (D_{H} = \{med., high\}) \wedge (D_{Zh} = \{med., high\}) \wedge (D_{Zdr} = \{low\}) \to$ Rain: 0.03, Snow: 0.97;

3) $(D_{Ldr} = \{high\}) \wedge (D_{H} = \{med.\}) \wedge (D_{Zh} = \{high\}) \wedge (D_{Zdr} = \{high\}) \to$ Rain: 0.02, Bright band: 0.98;

where, for each input domain, the labels low, medium, and high were defined by uniformly spaced trapezoidal appropriateness measures, and Rain, Snow, and Bright band are nonoverlapping classes. Table V shows a comparison of the results of LID3 with a number of other machine-learning algorithms, including Naive Bayes, neural networks, support vector machines (SVMs), and k-nearest neighbors (KNN). The results refer to average percentage accuracy in a tenfold cross-validation experiment, where the algorithms are repeatedly trained on a sample of 9/10ths of the data and then tested on the remaining 1/10th.


TABLE V: Comparison of Results with LID3 and Machine-Learning Algorithms

Fig. 5. RHI scan from the Chilbolton Radar dataset.

Fig. 6. Classification of the scan in Fig. 5 by the use of the LID3 algorithm. Light blue indicates rain, green indicates snow, and red indicates bright band.

Fig. 5 shows a particular RHI scan, and Fig. 6 shows the same scan after each pixel has been classified as either rain, snow, or bright band by the use of the LDT.

E. Hierarchical Linguistic Mappings

Let us suppose that we have a functional mapping $g : \Omega_1 \times \cdots \times \Omega_k \to \Omega_Y$ between $X_1, \ldots, X_k$ and $Y$, which is too complex to define directly. Attribute hierarchies [3] are a well-known approach to this problem and involve breaking down the function $g$ into a hierarchy of subfunctions, each representing a new intermediate attribute. A bottom-up description of this process is as follows: the set of original attributes $X_1, \ldots, X_k$ is partitioned into attribute subsets $\mathcal{V}_1, \ldots, \mathcal{V}_r$, and new attributes $Z_1, \ldots, Z_r$ are defined as functions of each partition set, respectively, so that $z_i = G_i(\mathcal{V}_i)$ for $i = 1, \ldots, r$. The function $g$ is then defined as a new function $F$ of the new attributes $Z_1, \ldots, Z_r$ so that $Y = g(X_1, \ldots, X_k) = F(Z_1, \ldots, Z_r) = F(G_1(\mathcal{V}_1), \ldots, G_r(\mathcal{V}_r))$. The same process can then be repeated recursively for each partition set $\mathcal{V}_i$ to generate a new layer of new variables, as required (see Fig. 7).

Fig. 7. Attribute hierarchy showing partition of attributes.

Fig. 8. Example of a simple linguistic attribute hierarchy.

The identification of attribute hierarchies and their associated functional mappings is often a highly subjective process that involves significant uncertainty and imprecision. Hence, the relationship between certain levels in the hierarchy can best be described in terms of linguistic rules and relations. This allows judgements and rankings to be made at a level of granularity appropriate to the level of precision at which the functional mappings can realistically be defined. In linguistic attribute hierarchies, the functional mappings between parent and child attribute nodes in the hierarchy are defined in terms of linguistic mappings (see Definition 20), which explicitly model both the uncertainty and the vagueness that often characterize our knowledge of such aggregation functions.

In linguistic attribute hierarchies [14], the functional relationships between child and parent nodes are not defined precisely. Instead, the labels for a parent attribute are defined in terms of the labels which describe the attributes corresponding to its child nodes, by means of a linguistic mapping. To illustrate this idea, consider the simple linguistic attribute hierarchy shown in Fig. 8. Here, we have four input attributes $X_1, \ldots, X_4$ and output attribute $Y$, these being described by label sets $LA_1, \ldots, LA_4$ and $LA_Y$ with focal sets $\mathcal{F}_1, \ldots, \mathcal{F}_4$ and $\mathcal{F}_Y$, respectively. The labels for $Y$ are defined in terms of the labels for two intermediate-level attributes $Z_1$ and $Z_2$ by a linguistic mapping $f_1$. Let $LA_{Z_1}$, $LA_{Z_2}$ and $\mathcal{F}_{Z_1}$, $\mathcal{F}_{Z_2}$ be the labels and focal sets for $Z_1$ and $Z_2$, respectively. Furthermore, the labels for $Z_1$ are defined in terms of those for $X_1$ and $X_2$ according to linguistic mapping $f_2$, and the labels for $Z_2$ are defined in terms of those for $X_3$ and $X_4$ according to linguistic mapping $f_3$. The information is then propagated up through the hierarchy as mass functions on the relevant focal sets. Specifically, $f_2$ combines mass functions on $\mathcal{F}_1$ and $\mathcal{F}_2$ in order to generate a mass function on $\mathcal{F}_{Z_1}$.


Fig. 9. Flowchart showing the decision-making process using a linguistic mapping.

Similarly, mass functions on $\mathcal{F}_3$ and $\mathcal{F}_4$ are combined using $f_3$ to generate a mass function on $\mathcal{F}_{Z_2}$. These two mass functions on $\mathcal{F}_{Z_1}$ and $\mathcal{F}_{Z_2}$, respectively, are then combined according to $f_1$ in order to obtain a mass assignment on the output focal sets $\mathcal{F}_Y$ conditional on the inputs. At the mass function level, given $m_i \in \mathbb{M}_i : i = 1, \ldots, 4$, the complete mapping is as follows:

$$m_Y = f_1(m_{Z_1}, m_{Z_2}) = f_1(f_2(m_1, m_2), f_3(m_3, m_4)).$$
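With the linguistic mappings implemented as functions that consume and return mass functions (as in the earlier sketches), the hierarchy of Fig. 8 reduces to ordinary function composition. The sketch below is purely illustrative; $f_1$, $f_2$, and $f_3$ stand for whichever mapping type (generalized focal set, mass relational, or LDT) is used at each node.

```python
def hierarchy_mapping(f1, f2, f3):
    """Compose the linguistic attribute hierarchy of Fig. 8:
    Z1 is described from X1, X2 via f2, Z2 from X3, X4 via f3,
    and Y from Z1, Z2 via f1.  Each f takes and returns mass functions."""
    def mapping(m1, m2, m3, m4):
        m_z1 = f2(m1, m2)       # mass function on the focal sets of Z1
        m_z2 = f3(m3, m4)       # mass function on the focal sets of Z2
        return f1(m_z1, m_z2)   # mass function on the focal sets of Y
    return mapping
```

Any of the earlier sketches (e.g., relational_mapping or ldt_mapping with a fixed conditional table or tree) could be plugged in for $f_1$, $f_2$, and $f_3$.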

IX. DECISION MAKING USING GRANULAR LINGUISTIC MODELS

In this paper, we have introduced label semantics as a reasoning framework according to which rational communicating agents can make decisions about which labels from $LA$ can be appropriately used to describe their perceptual information in such a way as to be consistent with the underlying linguistic conventions of the population. Given mass functions on label sets for the elements of $\Omega$, appropriateness measures of any expression $\theta \in LE$ can be determined. Furthermore, if agents have uncertainty regarding their perceptual information, this can also be expressed in terms of a mass function on $2^{LA}$. For decision problems, where a decision variable $Y$ is dependent on a set of input variables $X_1, \ldots, X_k$, linguistic mappings can be defined which allow the inference of labeling information about $Y$ from the available labeling information about $X_1, \ldots, X_k$. Such linguistic mappings can take many forms, including generalized focal set mappings, mass relational mappings, and LDTs. In the case where $k$ is large and the relationship between decision and input variables is complex, the overall linguistic mapping may be defined as a composite of simpler (lower dimensional) mappings to form a linguistic attribute hierarchy.

The result of a linguistic mapping is a mass assignment $m_Y$ on sets of labels for $Y$. When required for decision making or communication, this can be transformed either into a single linguistic expression $\theta \in LE_Y$ or into a single value estimate $y \in \Omega_Y$ for $Y$. The former requires the identification of a set of permissible assertions $AS$ and a prior distribution on $AS$ expressing prior preferences among these assertions. From this and $m_Y$, a posterior distribution $P^A_Y$ on assertions can be defined, and a single assertion $\theta$ can then be selected by, for example, picking the modal element. The latter requires the definition of a prior distribution on $\Omega_Y$, which may correspond to the uninformative uniform distribution if no other information is available. From this and $m_Y$, a posterior distribution $p(y \mid m_Y)$ can be defined and a single representative value $y$ selected as the mode or expected value. Fig. 9 shows a flowchart summarizing information propagation and decision making involving linguistic mappings.

X. CONCLUSION

The label semantics framework has been presented as an epistemic approach to modeling granular information as represented by linguistic labels and label expressions. Links between this model and prototype theory have been explored to provide a clear relationship with Zadeh's [34] original conception of information granularity and with Pawlak's rough set theory [20]. Within this framework, decision models can be expressed as linguistic mappings, which provide a mechanism to infer labeling information about a decision variable from labeling information about a set of input variables. A number of different types of linguistic mappings have been proposed, and it has been shown how these can be learnt from data. In addition, given labeling information about a variable, we have defined intuitive processes according to which an agent can select either a single value for that variable or a single label expression to describe it. Taken together, this provides a powerful framework for the propagation of granular linguistic information as part of multivariate decision processes.

REFERENCES

[1] J. F. Baldwin, J. Lawry, and T. P. Martin, "A mass assignment theory of the probability of fuzzy events," Fuzzy Sets Syst., vol. 83, no. 3, pp. 353–368, 1996.

[2] A. Bargiela, "Towards a theory of granular computing for human-centered information processing," IEEE Trans. Fuzzy Syst., vol. 16, no. 2, pp. 320–330, Apr. 2008.

[3] M. Bohanec and B. Zupan, "A function-decomposition method for development of hierarchical multi-attribute decision models," Decis. Support Syst., vol. 36, pp. 215–223, 2004.

[4] D. Dubois and H. Prade, "An introduction to possibility and fuzzy logics," in Non-Standard Logics for Automated Reasoning, P. Smets et al., Eds. San Francisco, CA: Academic, 1988, pp. 742–755.


[5] P. Gardenfors, Conceptual Spaces: The Geometry of Thought. Cambridge, MA: MIT Press, 2000.

[6] I. R. Goodman and H. T. Nguyen, Uncertainty Model for Knowledge-Based Systems. Amsterdam, The Netherlands: North-Holland, 1985.

[7] I. R. Goodman, "Fuzzy sets as equivalence classes of random sets," in Fuzzy Set and Possibility Theory, R. Yager, Ed. New York: Pergamon, 1982, pp. 327–342.

[8] R. C. Jeffrey, The Logic of Decision. New York: Gordon and Breach, 1965.

[9] J. Lawry, "A framework for linguistic modelling," Artif. Intell., vol. 155, pp. 1–39, 2004.

[10] J. Lawry, Modelling and Reasoning with Vague Concepts. New York: Springer-Verlag, 2006.

[11] J. Lawry, "Appropriateness measures: An uncertainty model for vague concepts," Synthese, vol. 161, pp. 255–269, 2008.

[12] J. Lawry and H. He, "Linguistic attribute hierarchies for multiple-attribute decision making," in Proc. IEEE Int. Conf. Fuzzy Syst., 2007, pp. 1–6.

[13] J. Lawry and Y. Tang, "Relating prototype theory and label semantics," in Soft Methods for Handling Variability and Imprecision, D. Dubois, M. A. Lubiano, H. Prade, M. A. Gil, P. Grzegorzewski, and O. Hryniewicz, Eds. New York: Springer-Verlag, 2008, pp. 35–42.

[14] J. Lawry and H. He, "Multi-attribute decision making based on label semantics," Int. J. Uncertainty, Fuzziness Knowl.-Based Syst., vol. 16, no. Supp. 2, pp. 69–86, 2008.

[15] J. Lawry, "An overview of computing with words using label semantics," in Fuzzy Sets and Their Extensions: Regression, Aggregation and Models, H. Bustince, F. Herrera, and J. Montero, Eds. New York: Springer-Verlag, 2008, pp. 85–87.

[16] J. Lawry and Y. Tang, "Uncertainty modelling for vague concepts: A prototype theory approach," Artif. Intell., vol. 173, pp. 1539–1558, 2009.

[17] D. R. McCulloch, J. Lawry, M. A. Rico-Ramirez, and I. D. Cluckie, "Classification of weather radar images using linguistic decision trees with conditional labelling," in Proc. IEEE Int. Conf. Fuzzy Syst., London, U.K., 2007.

[18] D. R. McCulloch, J. Lawry, and I. D. Cluckie, "Real-time flood forecasting using updateable linguistic decision trees," in Proc. IEEE Int. Conf. Fuzzy Syst., 2008, pp. 1935–1942.

[19] H. T. Nguyen, "On modeling of linguistic information using random sets," Inf. Sci., vol. 34, pp. 265–274, 1984.

[20] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data. Norwell, MA: Kluwer, 1991.

[21] Z. Pawlak, "Granularity of knowledge, indiscernibility and rough sets," in Proc. IEEE World Congr. Comput. Intell., 1998, vol. 1, pp. 106–110.

[22] W. Pedrycz, "Relations of granular worlds," Int. J. Appl. Math. Comput. Sci., vol. 12, no. 3, pp. 347–357, 2002.

[23] Z. Qin and J. Lawry, "Decision tree learning with fuzzy labels," Inf. Sci., vol. 172, pp. 91–129, 2005.

[24] N. Randon, J. Lawry, D. Han, and I. D. Cluckie, "River flow modelling based on fuzzy labels," in Proc. Inf. Process. Manage. Uncertainty, Jul. 2004.

[25] J. R. Quinlan, "Decision trees as probabilistic classifiers," in Proc. 4th Int. Workshop Mach. Learn., San Mateo, CA: Morgan Kaufmann, 1987, pp. 31–37.

[26] N. Randon and J. Lawry, "Classification and query evaluation using modelling with words," Inf. Sci., vol. 176, pp. 438–464, 2006.

[27] N. Randon, J. Lawry, K. Horsburgh, and I. D. Cluckie, "Fuzzy Bayesian modelling of sea-level along the east coast of Britain," IEEE Trans. Fuzzy Syst., vol. 16, no. 3, pp. 725–738, Jun. 2008.

[28] G. Shafer, A Mathematical Theory of Evidence. Princeton, NJ: Princeton Univ. Press, 1976.

[29] P. Smets, "Constructing the pignistic probability function in a context of uncertainty," in Uncertainty in Artificial Intelligence 5, M. Henrion, Ed. Amsterdam, The Netherlands: North-Holland, 1990, pp. 29–39.

[30] Y. Tang and J. Zheng, "Linguistic modelling based on semantic similarity relation amongst linguistic labels," Fuzzy Sets Syst., vol. 157, pp. 1662–1673, 2006.

[31] O. Turnbull, A. Richards, J. Lawry, and M. Lowenburg, "Fuzzy decision tree cloning of flight trajectory optimisation for rapid path planning," in Proc. 45th IEEE Conf. Decis. Control, 2006, pp. 6361–6366.

[32] T. Williamson, Vagueness. Evanston, IL: Routledge, 1994.

[33] L. A. Zadeh, "Fuzzy sets," Inf. Control, vol. 8, no. 3, pp. 338–353, 1965.

[34] L. A. Zadeh, "Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic," Fuzzy Sets Syst., vol. 90, pp. 111–127, 1997.

[35] L. A. Zadeh, "Some reflections on soft computing, granular computing and their roles in the conception, design and utilization of information/intelligent systems," Soft Comput., vol. 2, pp. 23–25, 1998.

Jonathan Lawry received the B.Sc. degree in mathematics from Plymouth Polytechnic, Plymouth, U.K., in 1990 and the Ph.D. degree from the University of Manchester, Manchester, U.K., in 1995.

He is currently a Professor of artificial intelligence with the University of Bristol, Bristol, U.K. He has authored or coauthored more than 90 refereed articles in the area of approximate reasoning, as well as five edited volumes and one book. His research interests include random set approaches to modeling vagueness and linguistic uncertainty in complex systems and, in particular, the label semantics framework.

Yongchuan Tang was born in Hubei, China, on December 5, 1974. He received the M.Sc. degree in applied mathematics and the Ph.D. degree from the Southwest Jiaotong University, Chengdu, China, in 2000 and 2003, respectively.

He is currently an Associate Research Fellow with the College of Computer Science, Zhejiang University, Hangzhou, China. His research interests include the mathematical representation of uncertainty, fuzzy computing, affective computing, and the study of uncertainty in complex systems.