

REPRESENTING CAUSALITY USING FUZZY COGNITIVE MAPS

Lawrence J. Mazlack

Applied Computational Intelligence Laboratory University of Cincinnati Cincinnati, Ohio 45221

ABSTRACT: Causal reasoning occupies a central position in human reasoning. In order to algorithmically consider causal relations, the relations must be placed into a representation that supports manipulation. The most widespread causal representation is directed acyclic graphs (DAGs). However, DAGs are severely limited in what portion of the common sense world they can represent. This paper considers the needs of commonsense causality and suggests Fuzzy Cognitive Maps as an alternative to DAGs. In many ways, causality is granular. Commonsense reasoning recognizes granularization and that objects may be made up of granules. Knowledge of at least some causal effects is imprecise. Perhaps complete knowledge of all possible factors might lead to a crisp description of whether an effect will occur. However, in the commonsense world, it is unlikely that all possible factors can be known. Commonsense understanding of the world deals with imprecision, uncertainty and imperfect knowledge. In commonsense, everyday reasoning, we use approaches that do not require complete knowledge. Even if the precise elements of the complex are unknown, people recognize that a complex collection of elements can cause a particular effect. They may not know what events are in the complex, or what constraints and laws the complex is subject to. Sometimes the details underlying an event can be known to a fine level of detail, sometimes not. An algorithmic way of handling causal imprecision is needed.

KEYWORDS: causality, fuzzy, commonsense, complexes, imprecision, cognitive maps, DAGs

1. INTRODUCTION

Causal reasoning occupies a central position in human reasoning. It plays an essential role in human decision-making. Of particular interest to this paper are areas where the analysis is observational (non-experimental). The world is taken as it is and not subject to experimentation. Data mining is of particular interest.

Data mining analyzes data previously collected; it is non-experimental. There are several different data mining products. The most common are association rules.

Customers who buy beer and sausage also tend to buy hamburger with {confidence = 0.7} in {support = 0.2}

Customers who buy strawberries also tend to buy whipped cream with {confidence = 0.8} in {support = 0.15}

Figure 1. Association rules.

At first glance, association rules seem to imply a causal or cause-effect relationship. That is:

A customer's purchase of both sausage and beer causes the customer to also buy hamburger.

But all that is discovered is the existence of a statistical relationship between the items. They have a degree of joint occurrence. The nature of the relationship is not specified. We do not know whether the presence of an item or set of items causes the presence of another item or set of items, or the converse, or whether some other phenomenon causes them to occur together.

Purely accidental relationships do not have the same decision value as causal relationships do. For example,

IF it is true that buying both beer and sausage somehow causes someone to buy hamburger,
• then: A merchant might profitably put beer (or the likewise associated sausage) on sale
• and at the same time: Increase the price of hamburger to compensate for the sale price.

On the other hand, knowing that

Bread and milk are often purchased together.

may not be useful information, as both products are commonly purchased on every store visit.

When typically developed, association rules do not necessarily describe causality. The confidence measure is simply an estimate of conditional probability. Support indicates how often the joint occurrence happens (the joint probability over the entire data set). The joint occurrence count is symmetric; that is, it does not matter which item we count first. Also, the strength of any causal dependency may be very different from that of a possibly related association value. In all cases,

confidence ≥ causal dependence

All that can be said is that associations describe the strength of joint co-occurrences.
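As a concrete illustration (ours, not the paper's; the tiny transaction set and item names are invented), the following Python sketch computes support and confidence for the rule {beer, sausage} → {hamburger} and shows that the joint count is the same whichever side is counted first:

# Sketch: support/confidence for {beer, sausage} -> {hamburger}.
# The transactions below are invented for illustration.
transactions = [
    {"beer", "sausage", "hamburger"},
    {"beer", "sausage"},
    {"beer", "sausage", "hamburger"},
    {"strawberries", "whipped cream"},
    {"bread", "milk"},
]
antecedent = {"beer", "sausage"}
consequent = {"hamburger"}

n = len(transactions)
joint = sum(1 for t in transactions if (antecedent | consequent) <= t)
ante = sum(1 for t in transactions if antecedent <= t)

support = joint / n        # joint probability over the entire data set
confidence = joint / ante  # estimate of the conditional probability

# Symmetry: counting the consequent first gives the same joint count,
# so support carries no information about causal direction.
print(f"support = {support:.2f}, confidence = {confidence:.2f}")

Note that confidence is directional only in the sense of conditioning; nothing in the computation distinguishes cause from effect.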

Association rules can be used to aid in making retail marketing decisions. However, simple association rules may lead to errors. Errors might occur either if causality is recognized where there is no causality, or if the direction of the causal relationship is wrong [1] [2]. For example, if

A study of past customers shows that 94% are sick.

• Is it the following rule? Our customers are sick, so they buy from us.
• Is it the complementary rule? If people use our products, they are likely to become sick.

From a decision-making viewpoint, it is not enough to know that

People both buy our products and are sick.

What is needed is knowledge of what causes what, if anything causes anything at all.

People do things in the world by exploiting commonsense perceptions of cause and effect. Manipulating perceptions has been explored [3] but is not this paper’s focus. The interest here is how perceptions affect commonsense causal reasoning, granularity, and the need for precision.

When trying to precisely reason about causality, complete knowledge of all of the relevant events and circumstances is needed. In commonsense, everyday reasoning, approaches are used that do not require complete knowledge. Often, approaches follow what is essentially a satisficing [4] paradigm.

Commonsense understanding of the world tells us that we have to deal with imprecision, uncertainty and imperfect knowledge. A way of handling imprecision is needed to computationally handle causality. Models are needed to consider causes and effects; they must accommodate imprecision. These models may be symbolic or graphic. A difficulty is striking a good balance between precise formalism and commonsense imprecise reality.

2. CAUSAL COMPLEXES

In many ways, causality is granular. This is true for commonsense reasoning as well as for more formal mathematical and scientific theory. At a very fine-grained level, the physical world itself may be granular. Our commonsense perception of causality is often large grained, while the underlying causal structures may be described in a more fine-grained manner. A comprehensive causal representation structure should be able to accommodate changes in grain size.

In applying commonsense causal reasoning, a complex collection of elements can be involved causally in a particular effect, even if the precise elements of the complex are unknown. It may not be known what events are in the complex, or what constraints and laws the complex is subject to. Sometimes the details underlying an event are known to a fine level of detail, sometimes not. Events are rarely known to the finest possible grain size.

When events happen, there are usually other related events. The entire collection of events can be called a complex. A "mechanism" [5] or a "causal complex" [6] [7] is a collection of events whose occurrence or non-occurrence results in a consequent event happening. Hobbs suggests that human causal reasoning that makes use of a causal complex does not require precise, complete knowledge of the complex.

Each complex, taken as a whole, can be considered to be a granule if the grain size is increased. Larger complexes can be decomposed into smaller complexes, going from large grained to small grained. For example, when describing starting an automobile, a large-grained to small-grained, nested causal view would start with

When an automobile's ignition switch is turned on, this causes the engine to start.

But it would not happen if a large system of other nested conditions were not in place:

There has to be available fuel. The battery has to be good. The switch has to be connected to the battery so electricity can flow through it. The wiring has to connect the switch to the starter and ignition system (spark plugs, etc.). The engine has to be in good working order; and so forth.

Turning the ignition switch on is one action in a complex of conditions required to start the engine.

Sometimes it is enough to know what happens at a large-grained level; at other times it is necessary to know the fine-grained result. For example, if

Bill believes that turning the ignition key of his automobile causes the automobile to start.

it is enough if Bill engages an automobile mechanic when his automobile does not start when he turns the key on, as the automobile mechanic knows a finer-grained view of an automobile's causal complex than Bill does.

Nested granularity may be applied to causal complexes. A complex may consist of several larger-grained elements. In turn, each of the larger-grained elements may be a complex of more fine-grained elements, and these elements may in turn be made up of still finer-grained elements. Any representation must be able to accommodate causal complexes and shifts between grain sizes.

3. DEFINING CAUSALITY

Coming to a precise description of what is meant by causality is difficult. There are multiple and sometimes conflicting definitions. Zadeh [8] suggested that a precise, formal definition may not be possible. Regardless, we have a commonsense belief that there are causal relationships. Satisfactorily and explicitly specifying them is difficult. This paper approaches the issues from a commonsense view. For a discussion of various aspects of causality definition, see Mazlack [9].

Perhaps, complete knowledge of all possible factors might lead to a crisp description of whether an effect will occur. However, it is unlikely that all possible factors can or will be known.

3.1 Positive Causation

Commonsense understanding of causation focuses on positive causation. Causal relationships exist in the commonsense world; for example,

When a glass is pushed off a table and breaks on the floor,

it might be said that

Being pushed from the table caused the glass to break.

Although,

Being pushed from a table is not a certain cause of breakage; sometimes the glass bounces and no break occurs, or someone catches the glass before it hits the floor.

Counterfactually, usually (but not always),

Not falling to the floor prevents breakage.

Sometimes,

A glass breaks when an errant object hits it, even though it does not fall from the table.

Positive causal relationships can be described as: if α then β (or, α → β). For example:

When an automobile driver fails to stop at a red light and there is an accident, it can be said that the failure to stop was the accident's cause.

3.2 Negation

Simple negation does not work, both because an effect can be overdetermined and because negative statements are weaker than positive statements, as negative statements can become overextended. It cannot be said that ¬α → ¬β; for example:

Not stopping at a red light is not a certain cause of an accident occurring; sometimes an accident still does not occur: nothing hits the non-stopped car as it goes through the intersection, and the non-stopped car itself hits nothing.

Similarly,

Stopping at a red light is not a certain cause of an accident not occurring; sometimes an accident still occurs: another car can hit the stopped car because the other car's brakes failed.

Negation or counterfactuals in causal reasoning (¬α → ¬β) have a place, although they may result in reasoning errors. For example, the rule:

If a person drinks wine, they may become inebriated.

cannot be simply negated to

If a person does not drink wine, they will not become inebriated.

The effects can be overdetermined; that is, more than one item can cause an effect. If so, eliminating one cause does not necessarily eliminate the effect. For example:

A person may drink beer or whiskey to excess and become inebriated.

Figure 2. Overdetermined effect (inebriation)

Events that do not happen can similarly be overdetermined. For example, Ortiz [10] states that it is not true that

His closing the barn door caused the horse not to escape.

because the horse might not have attempted to escape even if the door was open. Therefore, a false counterfactual is:

If he had not closed the barn door, the horse would have escaped.

Similarly, the rule:

If a person smokes, they will get cancer.

cannot be simply negated to

If a person does not smoke, they will not get cancer.

Again, effects can be overdetermined. In this case,

People who do not smoke may still get cancer from other causes.

Other ideas are sometimes involved in causal reasoning, such as causal uncorrelatedness [11]: if two variables have no common cause, they are causally uncorrelated. Similarly, Dawid [12] focuses on the negative, i.e., when α does not affect β. Dawid speaks in terms of unresponsiveness and insensitivity: β is unresponsive to α if, whatever value α might be set to, the value of β is unchanged; in parallel, β is insensitive to α if, whatever value α may be set to, the uncertainty about β is unaffected.
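As a gloss in interventionist notation (our paraphrase, not Dawid's own formalism): β is unresponsive to α when the realized value of β is the same under every setting of α, while β is insensitive to α when

P(β | do(α = a)) = P(β | do(α = a'))   for all settings a, a',

that is, the uncertainty about β is unchanged by how α is set.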

Some describe events in terms of enablement and use counterfactual implication whose negation is implicit; for example [10]:

Not picking up the ticket enabled him to miss the train.

In the same vein, Shoham [13] [14] distinguishes between causing, enabling, and preventing. The enabling factor is often considered to be a causal factor. Shoham distinguished between background (enabling) conditions and foreground conditions. The background (enabling) conditions are inferred by default. For example [14]:

"If information is present that the key was turned and nothing is mentioned about the state of the battery, then it is inferred that the motor will start, because the battery is assumed, by default, to be alive."

As noted above, two variables that share no common cause are causally uncorrelated [11]; this occurs when there is no single event that causes both of them to change.

3.3 Inherently Uncertain Recognition

Recognizing many things with absolute certainty is problematic. As this is the case, our causal understanding is based on a foundation of inherent uncertainty and incompleteness. Consequently, causal reasoning models must accommodate inherent ambiguity. For a more detailed discussion of inherently uncertain recognition, see [15].

It may well be that precise and complete knowledge of causal events is not possible, or is at least uncertain. On the other hand, we have a commonsense belief that causal effects exist in the real world. If we can develop models tolerant of imprecision, they would be useful.

4. GRAPHS AND CAUSALITY

Different aspects of causality have been examined. The idea of "positive" causation (α → β) is at the core of commonsense causal reasoning. Often a positive causal relationship is represented as a network of nodes and branches [15].

[Figure: α → β]

Figure 3. Diagram indicating that β is causally dependent on α.

4.1 Directed Acyclic Graphs (DAGs)

Various graph-based Bayesian methods have been suggested to describe causality. Probably the best known is the class of methods based on Directed Acyclic Graphs (DAGs). The most fully developed approach is Pearl's [16]. Silverstein [17] [18] followed a similar approach.

Pearl [19] and Spirtes [20] claim that it is possible to infer causal relationships between two variables from associations found in observational (nonexperimental) data without substantial domain knowledge. Spirtes claims that directed acyclic graphs can be used if (a) the sample size is large and (b) the distribution of random values is faithful to the causal graph. Robins [21] argues that their argument is incorrect. Lastly, Scheines [22] claims that only in some situations will it be possible to determine causality. Their discussion is tangential to the focus of this paper; going deeply into it is outside this paper's scope. It is enough to note that these methods are possibly the most thoroughly developed methods of computational causal analysis.

From the commonsense causal reasoning view, the various directed graph methods have similar liabilities. They are discussed in the remainder of this section.

4.1.1 Liability: Discrete or continuous data must be reduced to Boolean values

This is an early technique that was and is used in data mining when analyzing market basket data. However, it is essentially flawed. Quantities do matter; some data co-occurrences are conditioned on there being a sufficiency of a co-occurring attribute. Also, some relationships may be non-linear based on quantity [2].

4.1.2 Liability: There can be no missing data

This is at variance with day-to-day experience. There is almost always missing data of some sort. Data collection is rarely fully representative and complete. Incremental data is often acquired that is at variance with previously acquired data. What is needed is a methodology that is not brittle in the face of incompleteness.

4.1.3 Liability: Causal relationships are not cyclic, either directly or indirectly (through another attribute).

This is at variance with our commonsense understanding of the world. Within cyclic dependencies, there are several variants.

4.1.3.1 Cycles with time lag

There are many commonsense examples where cycles are needed.

Figure 4. Positive feedback cycle: I tell Jane that I love her. Then, she tells me that she loves me. Then, I tell Jane that I love her more than before. Then, she … and so forth and so forth. Clearly, the cyclic reinforcement would be substantial.

Some cycles can be reasonably collapsed, some cannot be.

Figure 5. Cyclic relationship that can be easily collapsed

Figure 6. Cyclic relationship that cannot be collapsed

4.1.3.2 Concurrent cycles

Another form of a cycle is joint mutual dependency. It is possible that there might be mutual dependencies [15]; i.e., α → β as well as β → α. It seems possible that they do so with different strengths. They can be described as shown in the following figure, where Si,j represents the strength of the causal relationship from i to j. It would seem that the strengths would be best represented by an approximate belief function, either quantitatively or verbally.

[Figure: α and β joined by opposing arrows with strengths Sα,β and Sβ,α]

Figure 7. Cyclic relationship: mutual unequal dependency. [15]

There would appear to be two variations: differing causal strengths for the same activity, and differing causal strengths for symmetric activities occurring at different times.

4.1.3.2.1 Different causal strengths for the same activity, occurring at the same time

Some argue that causality should be completely asymmetric and that, if items appear to have mutual influences, it is because some other cause causes both. A problem with this is that it can lead to eventual regression to a first cause. Whether this is true or not, it is not useful for commonsense representation. In contrast, Simon [5] and Shoham [14] identify cases where causality is simultaneous.

It is also our commonsense experience. For example, in the preceding figure, α could be short men and β could be tall women. If Sα,β meant the strength of desire for a social meeting that was caused in short men by the sight of tall women, it might be that Sα,β > Sβ,α.

4.1.3.2.2 Different causal strengths for symmetric activities, occurring at different times

It would seem that if there were causal relationships in market basket data, there would often be imbalanced dependencies. For example, if

A customer first buys strawberries, there may be a reasonably good chance that they will then buy whipped cream.

Conversely, if

They first buy whipped cream, the subsequent purchase of strawberries may be less likely.

4.1.4 Liability: Markov Stationary Condition holds: Probabilities are time independent

This does not correspond to our commonsense understanding of the world. If one event is dependent on two other causal events, and one causal event happens much earlier (or later) than the other, there may well be a different result.

Figure 8. Case where differing times in causal events affect the probability of the causal result.

4.1.5 Liability: The Markov Condition holds: Memoryless States

The Markov Condition is defined as follows: let A be a node in a causal Bayesian network, and let B be any node that is not a descendant of A in the network. The Markov (Markoff) condition holds if A and B are independent, conditioned on the parents of A. The intuition of this condition is: if A and B are dependent, then B must either be a (possibly indirect) cause of A or be (possibly indirectly) caused by A. In the second case, B is a descendant of A, while in the first, B is an ancestor of A and has no effect on A once A's immediate parents are fixed.
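Written as a conditional-independence statement in the notation above, the condition is:

P(A | pa(A), B) = P(A | pa(A))

for every node B that is not a descendant of A, where pa(A) denotes the set of A's parents in the graph.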

This makes sense in the example shown in the following figure.

Figure 9. “Memoryless” Markov condition holds

However, not all of our commonsense perceptions of causality work this way. Often, we believe that history matters, as in the example shown in the following figure.


Figure 10. Causality where memory plays a part.

4.2 Fuzzy Cognitive Maps

There are several needs of a causal model. Many cannot be met by a directed graph. The needs are:

• Represent imprecision
• Accommodate changes in grain size
• Describe complexes
• Support cycle models of all times
• Be time varying
• Not be restricted by Markov conditions

A potentially useful form for representing causal structures that overcomes some of the deficiencies of directed acyclic graphs and similar representations is Fuzzy Cognitive Maps (FCMs). However, they bring their own liabilities, in that autonomously forming FCMs through data mining remains unresolved.

Fuzzy Cognitive Maps are fuzzy structures that resemble neural networks. They have powerful and far-reaching consequences as a mathematical tool for modeling complex systems. Axelrod [23] first used cognitive maps as a formal way of representing social scientific knowledge and modeling decision making in social and political systems [24]. Kosko then enhanced cognitive maps by allowing fuzzy values for them [25] [26] [27] [28].

Indeterminacy and unpredictability affect causal descriptions almost as much as the determined factors. It is a major handicap in mathematical modeling if we are only able to give precise values for known concepts and relationships between concepts, thereby presenting ourselves with a skewed view. Fuzzy theory can offer more choices. Mendel [29] and Kandasamy [30] have introduced indeterminacy into fuzzy theory; Mendel uses Type-2 Fuzzy Logic and Kandasamy uses Neutrosophic structures.

A FCM describes the behavior of a system in terms of concepts; each concept represents a state or a characteristic of the system. A FCM illustrates the system by a graph showing the cause and effect relations among concepts. FCMs model the world as a collection of classes and causal relations between classes. FCMs are fuzzy signed directed graphs with feedback. The directed edge Wij from causal concept Ci to concept Cj measures how much Ci causes Cj. The time-varying concept function Ci(t) measures the non-negative occurrence of some fuzzy event.

Causal feedback loops can occur in FCMs. FCM feedback allows causal adaptation. FCM feedback forces the abandonment of classic graph search and of forward and especially backward chaining. Instead, the FCM is viewed as a dynamical system and its equilibrium behavior is taken as a forward-evolved inference. A FCM has the ability to specify a complex model, show linear or non-linear relations and allow causal propagation [31].

FCMs can be used to reason. State vectors C are repeatedly passed through the FCM connection matrix W, thresholding or non-linearly transforming the result after each pass. The limit cycle or fixed-point inference summarizes the joint effects of all the interacting fuzzy knowledge.

A Fuzzy Cognitive Map consists of nodes that represent concepts and arcs (edges) between concepts [32]. Each concept represents a characteristic of the system; in general it stands for events, actions, goals, values, or trends of the system that is modeled as an FCM. Each concept is characterized by a number Ai that represents its value; it results from the transformation of the value of the system's variable for which this concept stands into the interval [-1, 1]. The graph's edges are the causal influences between the concepts.

Between concepts, there are three possible types of causal relationships that express the type of influence from one concept to the others. The edge weights Wij take values in the fuzzy causal interval [-1, 1]. The weight of the arc between concept Ci and concept Cj can be positive (Wij > 0), which means that an increase in the value of concept Ci leads to an increase in the value of concept Cj, and a decrease in the value of concept Ci leads to a decrease in the value of concept Cj. Or there can be negative causality (Wij < 0), which means that an increase in the value of concept Ci leads to a decrease in the value of concept Cj, and vice versa. Wij = 0 indicates no causality.

Figure 11. A simple Fuzzy Cognitive Map

Figure 12. A urinary bladder tumor grading model [33]

The previous figure shows a medical FCM application, where Cn, n = 1…8, represent histological features using five positive linguistic variables depending on the characteristics of each particular concept. C9 represents the target, the tumor grade.

The value of a node reflects the degree to which the concept is active in the system at a particular time. This value is a function of the sum of all incoming edge weights, each multiplied by the value of the originating concept at the immediately preceding state. The threshold function applied to the weighted sums can be fuzzy in nature. Concept values are expressed on a normalized range, denoting a degree of activation rather than an exact quantitative value. The threshold function serves to reduce unbounded inputs to a strict range; this eliminates the possibility of quantitative results, but it gives us a basis for comparing nodes – on or off, active or inactive. This mapping is a variation of the "fuzzification" process.
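A common concrete choice in the FCM literature (the paper itself does not fix one) is the logistic squashing function

f(x) = 1 / (1 + e^(-λx)),

which maps the unbounded weighted sum into (0, 1); the parameter λ controls the steepness, and a tanh-style function is used instead when activations must range over [-1, 1].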

Beyond the graphical representation of the FCM there is its mathematical model. It consists of a 1 × n state vector A that includes the values of the n concepts, and an n × n weight matrix W that gathers the weights Wij of the interconnections between the n concepts of the FCM. The matrix W has n rows and n columns, where n equals the total number of distinct concepts of the FCM, and the matrix diagonal is zero, since it is assumed that no concept causes itself.

The value of each concept is influenced by the values of the connected weighted concepts and by its previous value. The value Ai for each concept Ci is calculated by:

A_i = f\Big( \sum_{j=1,\, j \neq i}^{n} A_j W_{ji} \Big) + A_i^{old}    (1)

where Ai is the activation level of concept Ci at time t+1, Aj is the activation level of concept Cj at time t, Wji is the weight of the interconnection between Cj and Ci, and f is a threshold function.

A^{new} = f\big( A^{old} \cdot W \big) + A^{old}    (2)

So the new state vector Anew is computed by multiplying the previous state vector Aold by the weight matrix W. The new vector shows the effect of the change in the value of one concept on the entire Fuzzy Cognitive Map. But the previous equation also includes the old value of each concept, so the FCM possesses memory capabilities and there is a smooth change after each new cycling of the FCM.
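To make the update concrete, here is a minimal Python sketch of equations (1)-(2); the three-concept map, its weights, and the stopping tolerance are invented for illustration, and the logistic threshold and the clipping step are our choices rather than anything mandated by the paper:

import numpy as np

# W[j][i] is the weight W_ji from concept Cj to concept Ci; zero diagonal,
# since no concept is assumed to cause itself.
W = np.array([[ 0.0,  0.6,  0.0],   # C1 positively causes C2
              [ 0.0,  0.0,  0.8],   # C2 positively causes C3
              [-0.4,  0.0,  0.0]])  # C3 negatively causes C1 (feedback)

def f(x, lam=1.0):
    # Logistic threshold; one common choice of squashing function.
    return 1.0 / (1.0 + np.exp(-lam * x))

A = np.array([0.8, 0.1, 0.1])        # initial concept activations

for _ in range(100):
    A_new = f(A @ W) + A             # equation (2): A_new = f(A_old . W) + A_old
    A_new = np.clip(A_new, 0.0, 1.0) # keep activations normalized (our addition)
    if np.allclose(A_new, A, atol=1e-4):
        break                        # fixed point: equilibrium reached
    A = A_new

print(A)  # equilibrium activations, or the last state if a limit cycle occurs

Because the old activation is added back in, the map shows the memory behavior described above; the iteration ends at a fixed point, in a limit cycle, or, for some maps, in chaotic wandering.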

In general, a FCM functions like an associative neural network. A FCM describes a system in a one-layer network that is used in unsupervised mode, whose neurons are assigned concept meanings and whose interconnection weights represent relationships between these concepts. FCMs are often composed of concepts that can be represented as fuzzy sets, and the causal relations between the concepts can be fuzzy implications, conditional probabilities, etc. A directed edge Wij from concept Ci to concept Cj measures how much Ci causes Cj. In general, the edge weights Wij can take values in the fuzzy causal interval [-1, 1], allowing degrees of causality to be represented:

- Wjk > 0 indicates direct (positive) causality between concepts Cj and Ck; that is, an increase (decrease) in the value of Cj leads to an increase (decrease) in the value of Ck.
- Wjk < 0 indicates inverse (negative) causality between concepts Cj and Ck; that is, an increase (decrease) in the value of Cj leads to a decrease (increase) in the value of Ck.
- Wjk = 0 indicates no relationship between Cj and Ck.

In a FCM, clamping variables and using an iterative vector-matrix multiplication procedure to assess the effects of these perturbations on the state of the model reveals the model's implications. A model implication converges to a global stability, an equilibrium in the state of the system. During the inference process, the sequence of patterns reveals the inference model. A FCM that consists of n concepts is represented mathematically by an n-dimensional state vector A, which gathers the values of the n concepts, and by an n × n weight matrix W. Each element Wij of the matrix indicates the value of the weight between concepts Ci and Cj. The activation level Ai for each concept Ci is calculated by:

A_i^{new} = f\Big( \sum_{j=1,\, j \neq i}^{n} A_j^{old} W_{ji} \Big) + A_i^{old}    (3)

where A_i^{new} is the activation level of concept Ci at time t+1 (or iteration k), A_j^{old} is the activation level of concept Cj at time t (or iteration k-1), and f is a threshold function. The new state vector A, computed by multiplying the previous state vector A by the edge matrix W, shows the effect of the change in the activation level of one concept on the other concepts.

There are several unsupervised learning algorithms that might be used to reach convergence, or at least convergence within an acceptable tolerance [34] [35] [36] [37] [38] [39] [40]; the algorithms can be comparatively analyzed [41]; however, neither particular methods nor a comparative analysis of the methods is within the scope of this paper. The FCM system model takes the initial values of the concepts from the real system and is then free to interact. The interaction happens because of a change in the value of one or more concepts [42]. The interaction continues until the model:

• Reaches equilibrium at a fixed point, with the output concept values stabilizing.
• Exhibits limit cycle behavior, with the concept values falling into a loop.
• Exhibits chaotic behavior in a non-deterministic, random way.

The simplest FCMs act as asymmetrical networks of threshold or continuous concepts and converge to an equilibrium point or to limit cycles. FCMs have a nonlinear structure of their concepts and differ in their global feedback dynamics [43] [28].

The development of a FCM often occurs within a group context. Experts can pool or fuse their knowledge by adding the underlying FCM causal matrices [34]. The obvious advantage is that a causal relationship can be described in stages. If some form of data mining discovers the causal relationships, this could be a highly desirable incremental method. Another claimed advantage [24] is that combining the incomplete, conflicting opinions of different experts may cancel out the effects of oversight, ignorance and prejudice; although this would seem to require a fairly large number of overlapping relationships so that a useful average relationship would be developed. Ndousse [44] provides a simple illustrative example of FCM merging.
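A minimal sketch of this matrix-addition style of knowledge fusion (the two expert matrices are invented; simple averaging is one convention for keeping the combined weights inside [-1, 1]):

import numpy as np

# Causal matrices elicited from two hypothetical experts over the same
# three concepts; entries lie in the fuzzy causal interval [-1, 1].
W_expert1 = np.array([[0.0,  0.7,  0.0],
                      [0.0,  0.0,  0.5],
                      [0.0, -0.3,  0.0]])
W_expert2 = np.array([[0.0,  0.5,  0.2],
                      [0.0,  0.0,  0.9],
                      [0.0, -0.1,  0.0]])

# Pool the knowledge by combining the underlying causal matrices;
# averaging keeps every combined weight inside [-1, 1].
W_combined = (W_expert1 + W_expert2) / 2.0
print(W_combined)

Per-expert credibility weights, as discussed in the combination literature [34], would turn the plain average into a weighted sum.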

5. EPILOGUE

Causal reasoning occupies a central position in human reasoning. In order to algorithmically consider causal relations, the relations must be placed into a representation that supports manipulation. The most widespread causal representation is directed acyclic graphs (DAGs). However, DAGs are severely limited in what portion of the common sense world they can represent. This paper considers the needs of commonsense causality and suggests Fuzzy Cognitive Maps as an alternative to DAGs. In many ways, causality is granular. Commonsense reasoning recognizes granularization and that objects may be made up of granules. Knowledge of at least some causal effects is imprecise. Perhaps complete knowledge of all possible factors might lead to a crisp description of whether an effect will occur. However, in the commonsense world, it is unlikely that all possible factors can be known. Commonsense understanding of the world deals with imprecision, uncertainty and imperfect knowledge. In commonsense, everyday reasoning, we use approaches that do not require complete knowledge. Even if the precise elements of the complex are unknown, people recognize that a complex collection of elements can cause a particular effect. They may not know what events are in the complex, or what constraints and laws the complex is subject to. Sometimes the details underlying an event can be known to a fine level of detail, sometimes not. An algorithmic way of handling causal imprecision is needed.

There are several needs of a causal model. Many cannot be met by a directed graph. The needs are:

• Represent imprecision
• Accommodate changes in grain size
• Describe complexes
• Support cycle models of all times
• Be time varying
• Not be restricted by Markov conditions

Fuzzy Cognitive Maps appear to be a good first approximation of complex causal complexes. They appear to be a comprehensive representation. Their primary difficulty lies with establishing their initial values.

REFERENCES

1. Brin, S., R. Motwani, and C. Silverstein. Beyond Market Baskets: Generalizing Association Rules To Correlations. In ACM SIGMOD International Conference, SIGMOD '97. 1997. SIGMOD Record: ACM Special Interest Group On Management Of Data.
2. Mazlack, L.J. Causality Recognition For Data Mining In An Inherently Ill Defined World. In 2003 BISC FLINT-CIBI International Joint Workshop On Soft Computing For Internet And Bioinformatics. 2003. Berkeley, California.
3. Zadeh, L.A., From Computing With Numbers To Computing With Words - From Manipulation Of Measurements To Manipulation Of Perceptions. IEEE Transactions on Circuits and Systems, 1999. 45(1): p. 108-119.
4. Simon, H.A., A Behavioral Model Of Rational Choice. Quarterly Journal of Economics, 1955. 69: p. 99-118.
5. Simon, H.A., Nonmonotonic Reasoning and Causation: Comment. Cognitive Science, 1991. 15: p. 293-300.
6. Hobbs, J.R. Causality. In Common Sense 2001, Fifth Symposium on Logical Formalizations of Commonsense Reasoning. 2001. New York University, New York.
7. Hobbs, J.R., Causality And Modality: The Case Of 'Would'. Journal of Semantics, 2003.
8. Zadeh, L.A. Causality Is Undefinable. Abstract of a lecture presented at the BISC Seminar, University of California, Berkeley; reported to: Fuzzy Distribution List, [email protected], January 16, 2001 (online only); also available at: http://www.cs.berkeley.edu/~nikraves/zadeh/Zadeh2.doc. 2001.
9. Mazlack, L.J. Discovering Mined Granular Causal Complexes. In IEEE International Conference on Data Mining (ICDM). 2004. Brighton, United Kingdom.
10. Ortiz, C.L., A Commonsense Language For Reasoning About Causation And Rational Action. Artificial Intelligence, 1999. 108(1-2): p. 125-178.
11. Shafer, G., Causal Conjecture. In Causal Models and Intelligent Data Management, A. Gammerman, Editor. 1999, Springer-Verlag: Berlin.
12. Dawid, A., Who Needs Counterfactuals? In Causal Models and Intelligent Data Management, A. Gammerman, Editor. 1999, Springer-Verlag: Berlin.
13. Shoham, Y., Nonmonotonic Reasoning And Causality. Cognitive Science, 1990. 14: p. 213-252.
14. Shoham, Y., Remarks on Simon's Comments. Cognitive Science, 1991. 15: p. 301-303.
15. Mazlack, L.J. Commonsense Causal Modeling In The Data Mining Context. In IEEE International Conference on Data Mining (ICDM). 2003. Melbourne, Florida.
16. Pearl, J., Causality. 2000, New York, NY: Cambridge University Press.
17. Silverstein, C., S. Brin, and R. Motwani, Beyond Market Baskets: Generalizing Association Rules To Dependence Rules. Data Mining And Knowledge Discovery, 1998. 2(1): p. 39-68.
18. Silverstein, C., et al. Scalable Techniques For Mining Causal Structures. In International Conference on Very Large Data Bases. 1998. New York, NY: Kluwer Academic Publishers.
19. Pearl, J. and T. Verma. A Theory Of Inferred Causation. In Principles Of Knowledge Representation And Reasoning: Proceedings Of The Second International Conference. 1991: Morgan Kaufmann.
20. Spirtes, P., C. Glymour, and R. Scheines, Causation, Prediction, and Search. 1993, New York: Springer-Verlag.
21. Robins, J.M. and L. Wasserman, On The Impossibility Of Inferring Causation From Association Without Background Knowledge. In Computation, Causation, and Discovery, C. Glymour and G.F. Cooper, Editors. 1999, AAAI Press/MIT Press: Menlo Park. p. 305-321.
22. Scheines, R., et al., TETRAD II: Tools For Causal Modeling. 1994, Hillsdale, NJ: Lawrence Erlbaum.
23. Axelrod, R., Structure Of Decision: The Cognitive Maps Of Political Elites. 1976, Princeton, New Jersey: Princeton University Press.
24. Aguilar, J., A Survey About Fuzzy Cognitive Maps Papers. International Journal Of Computational Cognition, 2005. 3(2): p. 27-33.
25. Kosko, B., Fuzzy Cognitive Maps. International Journal of Man-Machine Studies, 1986. 24: p. 65-75.
26. Kosko, B., Neural Networks And Fuzzy Systems: A Dynamical Systems Approach To Machine Intelligence. 1992, New Jersey: Prentice Hall.
27. Kosko, B. and J. Dickerson, Fuzzy Virtual Worlds. AI Expert, 1994: p. 25-31.
28. Kosko, B., Fuzzy Engineering. 1997, New Jersey: Prentice Hall.
29. Mendel, J., Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions. 2000, New Jersey: Prentice Hall.
30. Kandasamy, W.B.V. and F. Smarandache, Fuzzy Cognitive Maps And Neutrosophic Cognitive Maps. 2003, Phoenix: Xiquan.
31. Stylios, C.D. and P.P. Groumpos, The Challenge Of Modeling Supervisory Systems Using Fuzzy Cognitive Maps. Journal of Intelligent Manufacturing, 1998. 9(4): p. 339-345.
32. Stylios, C.D., V. Georgopoulos, and P.P. Groumpos. The Use Of Fuzzy Cognitive Maps In Modeling Systems. In The 5th IEEE Mediterranean Conference on Control and Systems (MED5). 1997. Paphos, Cyprus.
33. Papageorgiou, E.I., et al. A Fuzzy Cognitive Map Model For Grading Urinary Bladder Tumors. In 5th International Conference on Neural Networks & Expert Systems in Medicine & Healthcare / 1st International Conference on Computational Intelligence in Medicine & Healthcare (NNESMED/CIMED). 2003.
34. Taber, R., R.R. Yager, and C.M. Helgason, Quantization Effects on the Equilibrium Behavior of Combined Fuzzy Cognitive Maps. International Journal of Intelligent Systems, 2007. 22: p. 181-202.
35. Papageorgiou, E.I., C. Stylios, and P.P. Groumpos, Active Hebbian Learning Algorithm To Train Fuzzy Cognitive Maps. International Journal of Approximate Reasoning, 2004. 37(3): p. 219-249.
36. Cai, Y., et al. Context Modeling With Evolutionary Fuzzy Cognitive Map In Interactive Storytelling. In FUZZ-IEEE. 2008. Hong Kong, China: IEEE.
37. Konar, A. and U.K. Chakraborty, Reasoning And Unsupervised Learning In A Fuzzy Cognitive Map. Information Sciences, 2005. 170(2-4): p. 419-441.
38. Li, S.-J. and R.-M. Shen. Fuzzy Cognitive Map Learning Based On Improved Nonlinear Hebbian Rule. In Proceedings Of the International Conference On Machine Learning And Cybernetics. 2004.
39. Papageorgiou, E.I. and P.P. Groumpos, A Weight Adaptation Method For Fuzzy Cognitive Maps To A Process Control Problem. In Lecture Notes In Computer Science. 2004, Springer: Berlin.
40. Stach, W., et al., Genetic Learning Of Fuzzy Cognitive Maps. Fuzzy Sets And Systems, 2005. 143(3): p. 371-401.
41. Tsadiras, A.K., Comparing The Inference Capabilities Of Binary, Trivalent And Sigmoid Fuzzy Cognitive Maps. Information Sciences, 2008. 178(20): p. 3880-3894.
42. Papageorgiou, E.I., C. Stylios, and P.P. Groumpos, Unsupervised Learning Techniques For Fine-Tuning Fuzzy Cognitive Map Causal Links. International Journal of Human-Computer Studies, 2006. 64: p. 727-743.
43. Kosko, B., Fuzzy Associative Memory Systems. In Fuzzy Expert Systems, A. Kandel, Editor. 1992: Boca Raton, Florida. p. 135-162.
44. Ndousse, T.D. and T. Okuda. Computational Intelligence For Distributed Fault Management In Networks Using Fuzzy Cognitive Maps. In IEEE International Conference on Communications (ICC '96): Converging Technologies for Tomorrow's Applications. 1996.