[ieee 2006 5th ieee international conference on cognitive informatics - beijing, china...

User-centered Interactive Data Mining

Yan Zhao, Yaohua Chen and Yiyu Yao
Department of Computer Science, University of Regina
Regina, Saskatchewan, Canada S4S 0A2
{yanzhao, chen115y, yyao}@cs.uregina.ca

Abstract

While many data mining models concentrate on automation and efficiency, interactive data mining models focus on adaptive and effective communication between human users and computer systems. User views, preferences, strategies and judgements play the most important roles in human-machine interaction, and guide the selection of target knowledge representations, operations, and measurements. In practice, user views, preferences and judgements also decide the strategies for abnormal situation handling and the explanations of mined patterns. In this paper, we discuss these fundamental issues.

1. Introduction

Exploring and extracting knowledge from data is one of the fundamental problems in science. Many methods have been proposed and extensively studied, such as database management, statistics, machine learning, etc. In particular, data mining takes up many important tasks, such as description, prediction and explanation of data.

Data mining is characterized by applying computer technologies to carry out nontrivial calculations. Computer systems can maintain precise operations under heavy information load, and maintain steady performance. Without the aid of computer systems, it is very difficult for people to be aware of, extract, memorize, search and retrieve knowledge in large and separate datasets, to interpret and evaluate data and information that are constantly changing, and to make recommendations or predictions in the face of inconsistent and incomplete data.

It is true that computer technologies have freed humans from many time-consuming and labor-intensive activities. However, cognitive functions such as decision making, planning, and creative thinking cannot be fully automated and remain a human's job. Implementations and applications of computer systems reflect requests and preferences of human users, and contain certain human heuristics. Computer systems must rely on human users to set goals, select alternatives if the original approach fails, participate in unanticipated emergencies and novel situations, and develop innovations in order to preserve safety, avoid expensive failure, or increase product quality [11, 16, 29].

According to the above observations, we believe that interactive systems are required for data mining tasks. Though human-machine interaction has been emphasized in many disciplines, it did not get enough attention in the domain of data mining until recently [3, 15, 50]. Generally, an interactive data mining system is an integration of a human user and a computer. They can communicate and exchange information and knowledge. A foundation of human-computer interaction may be provided by cognitive informatics [36, 37, 38].

Through interaction and communication, computers and users can divide the labor in order to achieve a good balance of automation and human control. Computers are used to retrieve and keep track of large volumes of data, and to carry out complex mathematical or logical operations. Users can avoid routine, tedious, and error-prone tasks, concentrate on critical decisions and planning, and cope with unexpected situations [11, 29]. Moreover, interactive data mining can encourage learning, improve insights and understandings of the domain, stimulate the exploration of creative possibilities, and help users to solve particular problems. Users' feedback can be used to improve the system. The interaction benefits both sides.

In this paper, we discuss some of the fundamental issues of user-centered interactive data mining. It is important to note that users possess various skills, intelligence, cognitive styles, frustration tolerances and other mental abilities. They come to a problem with various preferences, requirements and background knowledge. Given a set of data, every user may try to make sense out of the data by seeing it from different angles, in different aspects, and under different views. Based on these differences, data mining methods and results are not unique. There does not exist a universally applicable theory or method to serve the needs of all users. This motivates and justifies the co-existence of many theories and methods for data mining systems, as well as the exploration of new theories and methods.

Proc. 5th IEEE Int. Conf. on Cognitive Informatics (ICCI'06), Y.Y. Yao, Z.Z. Shi, Y. Wang, and W. Kinsner (Eds.), 1-4244-0475-4/06/$20.00 ©2006 IEEE

In Section 2, we talk about multiple views of data mining and knowledge discovery. We exemplify various measures that associate and evaluate multiple views. Interactive data mining systems should support and encourage multiple views. It is often meaningless to argue which view is better, more suitable or more appropriate by isolating it from user requirements and applications.

Once a specific view is targeted, some standards and efficient approaches can be used or implemented. At this stage, user-centered interaction may be characterized as a user preference. A user may prefer one solution to another, one arrangement to another, one attribute order to another, or one result to another. In Section 3, we present a formal model of user preference. Based on the model, a user preference is represented as a weak order.

With a specific view targeted and a user preference provided, an interactive data mining system can carry out calculations and inferences. However, in many real applications, ideal situations do not exist. Instead, users need to choose a strategy for abnormal situation handling. In Section 4, we discuss several different strategies, each representing a particular type of user requirement. Some of the abnormal situation handlers have been embedded into specific algorithms. We examine how these handlers differ from each other.

In the existing data mining frameworks, the phase of interpretation and explanation of the discovered knowledge is often missing. In Section 5, we talk about how to explain a discovered pattern, in order to relate it more closely to a user.

As a whole, multiple views, user preferences, multiple strategies and explanation construction are all user-centered. Together they form our understanding of interactive data mining.

2. Multiple Views in Data Mining

Many techniques and models of machine learning and statistics can be applied to data mining. Typically, each model presents a particular and single view of data, or discovers a specific type of knowledge embedded in data. It explores different types of knowledge, different features of data, different user requirements, and different interpretations of data [5]. Multiple views imply that one is able to derive multiple hierarchies for the same system [44]. Each hierarchy is defined based on a particular interpretation, and can be broken down into multiple levels. With respect to a particular hierarchy, levels represent localized views, and they are tied together to form a global view. One may climb up and down the hierarchy to study a system at various resolution levels.

2.1. Information tables and the logic language

An information table represents all available information. Knowledge or rules can be discovered based on information tables. The rows of an information table represent the objects. The columns describe a set of attributes. An information table can be formally defined by a quadruple:

$$S = (U, At, \{V_a \mid a \in At\}, \{I_a \mid a \in At\})$$

where $U$ is a finite nonempty set of objects, $At$ is a finite nonempty set of attributes, $V_a$ is a nonempty set of values for $a \in At$, and $I_a$ is an information function $I_a: U \to V_a$ for $a \in At$.

With respect to the definition of an information table, a logic language can be defined to express various types of rules. We adopt the decision logic language studied by Pawlak [25]. Similar languages have been studied by many authors [8, 41]. In this language, an atomic formula is a pair $(a, v)$, where $a \in At$ and $v \in V_a$. If $\phi$ and $\psi$ are formulas, then so are $\neg\phi$, $\phi \wedge \psi$, $\phi \vee \psi$, $\phi \to \psi$, and $\phi \leftrightarrow \psi$, obtained by applying the logic connectives $\neg$, $\wedge$, $\vee$, $\to$, and $\leftrightarrow$. For an atomic formula $\phi = (a, v)$, it is assumed that an object $x$ either satisfies $\phi$ or does not satisfy $\phi$. In other words, if an object $x$ has the value $v$ on the attribute $a$, then we say that the object $x$ satisfies $\phi$. Otherwise, we say $x$ does not satisfy $\phi$. The set of objects that satisfy a formula $\phi$ is denoted as $m(\phi)$. For an atomic formula $(a, v)$, the set of objects is $m(a, v) = \{x \in U \mid I_a(x) = v\}$. The following properties hold, where $-m(\phi)$ denotes the complement of $m(\phi)$ in $U$:

(i) $m(\neg\phi) = -m(\phi)$
(ii) $m(\phi \wedge \psi) = m(\phi) \cap m(\psi)$
(iii) $m(\phi \vee \psi) = m(\phi) \cup m(\psi)$
(iv) $m(\phi \to \psi) = -m(\phi) \cup m(\psi)$
(v) $m(\phi \leftrightarrow \psi) = (m(\phi) \cap m(\psi)) \cup (-m(\phi) \cap -m(\psi))$

The formula $\phi$ can be viewed as the description of the set of objects in $m(\phi)$.

In the study of formal concepts, every concept consists of two parts, the intention and the extension [14, 39]. A formula $\phi$ represents the intention of a concept and the set of objects $m(\phi)$ denotes the extension of the concept. The pair $(\phi, m(\phi))$ is denoted as a concept, which can be described as the set of objects $m(\phi)$ that have the feature expressed by the formula $\phi$.

2.2. Classes of rules

Knowledge generated from a large data set is often expressed in terms of a set of discovered rules. One of the important tasks of data mining is to find strong relationships between concepts [41]. A rule can be defined and


represented as $\phi \Rightarrow \psi$, where $\phi$ and $\psi$ are intentions of two concepts. The symbol $\Rightarrow$ represents a connection or a relationship between the two concepts $\phi$ and $\psi$. The meanings and interpretations of $\Rightarrow$ vary with user requirements or applications. Rules can be classified according to the interpretations of $\Rightarrow$. In other words, different interpretations of $\Rightarrow$ in a rule represent different types of knowledge. It might be impossible to list all classes of rules. We only discuss several of them.

An association rule $\phi \Rightarrow \psi$ describes an association relationship between two concepts $\phi$ and $\psi$. That is, when $\phi$ occurs, $\psi$ occurs too.

A classification rule $\phi \Rightarrow \psi$ presents a decision relationship between the two concepts. That is, if $\phi$ occurs then $\psi$ occurs.

Sometimes, users prefer to use a subset of attributes $A \subseteq At$ to define a rule. The benefits of such attribute selected rules are that they are normally shorter, cheaper, and hence more understandable, actionable and profitable. The attribute set $A$ contains attributes that are singly necessary and jointly sufficient to keep the information provided by the original attribute set. In terms of rough set theory, $A$ is called a reduct [24, 25].

In other situations, the attribute set $A$ defines a unique object set $U' \subseteq U$, such that all the objects possessing $A$ are in $U'$, and $A$ is the attribute set including all the features shared by $U'$. Such a bounding relationship between $A$ and $U'$ can be explained by formal concepts in the domain of formal concept analysis [39, 40].

2.3. Objective and subjective measures

Based on the extensions, various quantitative measures can be used for rule interestingness evaluation [45, 49]. Measures can be classified into two categories: objective measures and subjective measures [30]. Objective measures depend on the structure of rules and the underlying data used in the discovery process. Subjective measures depend on the user beliefs [22, 30].

Measures defined by statistical and structural information are viewed as objective measures. For example, Gago and Bento proposed a measure for selecting discovered rules with the highest average distance between them [13]. The distance measure is developed based on the structural and statistical information of a rule, such as the number of attributes in a rule and the values of attributes. A rule is deemed interesting if it has the highest average distance to the others. One does not consider the application and domain when measuring the discovered rules by using the distance measure. Information theoretic measures are also objective measures because they use the underlying data in a data set to evaluate the information content or entropy of a rule [20, 32, 42].

Different types of rules can be identified based on different objective measures. For example, peculiarity rules have low support and high confidence, exception rules have low support and high confidence but are complementary to other rules with high support and high confidence, and outlier patterns are the ones that are far away from the statistical mean [49, 51, 52].

Although statistical and structural information provides an effective indicator of the potential effectiveness of a rule, its usefulness is limited. One needs to consider the subjective aspects of rules. Subjective measures consider the user who examines the rules. For example, Silberschatz and Tuzhilin proposed a subjective measure of rule interestingness based on the notion of unexpectedness and in terms of a user belief system [30, 31]. The basic idea of their measure is that the discovered rules which carry more unexpected information with respect to a user belief system are deemed more interesting. Thus, subjective measures are both application and user dependent. In other words, a user needs to incorporate other domain specific knowledge such as user interest, utility, value, profit, actionability, etc. [27, 34].

As one example, profit or utility-based mining is a special kind of constraint-based mining, taking into account both statistical significance and profit significance [35]. Doyle discussed the importance and usefulness of the notions of economic rationality and suggested that economic rationality can play a large role in measuring a rule [10]. Similarly, Barber and Hamilton proposed the notion of share measures, which consider the contribution, in terms of profit, of an item in an item set [2].

2.4. Association measures of concepts

Many quantitative measures have been proposed to evaluate different relationships between attribute sets. Each measure reflects a different and specific feature of data. We consider several types of measures below.

One-way association

Confidence is a commonly used measure for evaluating the association of a rule. The basic idea of confidence is described as the probability that concept $\psi$ occurs given that concept $\phi$ occurs [1]:

$$P(\psi \mid \phi) = \frac{|m(\phi) \cap m(\psi)|}{|m(\phi)|} = \frac{P(\phi \wedge \psi)}{P(\phi)}$$

where $P(\phi)$ is called the support of $\phi$ and is defined by $P(\phi) = |m(\phi)| / |U|$.

Confidence is one-directional from $\phi$ to $\psi$ and can be viewed as a one-way association measure [49]. In other


words, the concept $\phi$ depends on the concept $\psi$, but $\psi$ may not depend on $\phi$.

Two concepts $\phi$ and $\psi$ are viewed as being non-associative or independent if the occurrence of $\phi$ does not alter the probability of $\psi$ occurring. In other words, if the occurrence of $\phi$ can affect the probability of $\psi$, then we say that the concept $\psi$ is dependent on or associated with the concept $\phi$. Typically, a rule with a high support and a high confidence is considered to express a strong association relationship between two concepts.

Two-way association

RI is also a measure for evaluating the association of a discovered rule [26]. It is defined by:

$$RI = P(\phi \wedge \psi) - P(\phi)P(\psi)$$

The two concepts $\phi$ and $\psi$ are recognized as being non-associative or independent when $RI = 0$ (that is, $P(\phi)P(\psi) = P(\phi \wedge \psi)$). In fact, this measure determines the degree of association of a rule by comparing the joint probability of the two concepts, $P(\phi \wedge \psi)$, with the probability expected under the non-association assumption, $P(\phi)P(\psi)$. $RI > 0$ represents a positive association from $\phi$ to $\psi$. $RI < 0$ represents a negative association, which is from $\psi$ to $\phi$.

IND is similar to the measure of rule-interest [4]. This measure is defined by:

$$IND = \frac{P(\phi \wedge \psi)}{P(\phi)P(\psi)}$$

The two concepts $\phi$ and $\psi$ are recognized as being non-associative or independent when $IND = 1$ (that is, $P(\phi)P(\psi) = P(\phi \wedge \psi)$). This measure is the ratio of the joint probability of $\phi \wedge \psi$ and the probability obtained if $\phi$ and $\psi$ are assumed to be independent. In other words, the rule has a stronger association if the joint probability is further away from the probability under independence.

The IS measure is another measure similar to the measure of rule-interest [33]. It can be defined by:

$$IS = \frac{P(\phi \wedge \psi)}{\sqrt{P(\phi)P(\psi)}}$$

The basic notion of the IS measure is similar to the measure of independence. Furthermore, it is equivalent to the geometric mean of the confidences of the rule. However, its range is between 0 and 1 instead of IND's range, which is between 0 and $\infty$.

The RI, IND, and IS measures are symmetric and viewed as two-way association measures [49]. If two concepts $\phi$ and $\psi$ in a rule have a two-way association relationship, then the concept $\phi$ must depend on or be associated with the concept $\psi$, and the converse is also true.

2.5. Correlation measures of attributes

The association relation discussed above shows the relationship between two concept intentions, each defined by a specific combination of one attribute and one of its possible values. This is also called a local connection, and generates a low order rule. A global connection is characterized by showing the relationships between all combinations of values on one set of attributes and all combinations of values on another set of attributes. It is also called a high order rule, revealing the correlation of two attribute sets [43].

A statistical measure, called the correlation coefficient, can be used to compute the degree of correlation between two numerical attributes $X, Y \in At$ [33]:

$$r(X, Y) = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}}$$

where $SS_X = \sum (x - \bar{x})^2$, $x \in V_X$, and $\bar{x}$ is the mean value of $x$; $SS_Y = \sum (y - \bar{y})^2$, $y \in V_Y$, and $\bar{y}$ is the mean value of $y$; and $SS_{XY} = \sum (x - \bar{x})(y - \bar{y})$ captures the covariance between the attributes $X$ and $Y$. $SS_X$ and $SS_Y$ are the sums of squared deviations of the values from their mean values, from which the standard deviations are obtained. By extending this equation, we have:

$$r(X, Y) = \frac{E(X \wedge Y) - E(X)E(Y)}{\sigma_X \sigma_Y}$$

where $E(X \wedge Y) = \frac{1}{n}\sum xy$, $E(X) = \frac{1}{n}\sum x$, and $E(Y) = \frac{1}{n}\sum y$ are the expected values on the attributes $X \wedge Y$, $X$, and $Y$, respectively. $\sigma_X$ and $\sigma_Y$ denote the standard deviations.

For a discovered high order rule $X \Rightarrow Y$, we suppose that there exist two value sets $V_X$ and $V_Y$ in an information table for the two attributes. If $V_Y$ increases or decreases as $V_X$ increases, then the two attributes are considered as being closely correlated.

The correlation coefficient is a number between $-1$ and 1, so its magnitude lies between 0 and 1. The closer the magnitude is to 1, the stronger the correlation between the attributes. If the two attributes are not correlated, $r(X, Y)$ is zero because $SS_{XY} = 0$ and $E(X \wedge Y) = E(X)E(Y)$. In other words, if $X$ and $Y$ are not closely related to each other, they do not "co-vary": the covariance is small and the correlation is small. If $X$ and $Y$ are closely related, the magnitude of the covariance is almost the same as $\sigma_X \sigma_Y$ and the magnitude of the correlation is almost 1.

Correlated attributes may not necessarily be dependent or associated. Also, attributes that are associated may not necessarily be correlated. Mari and Kotz analyzed several common and different features of association and correlation measures from the statistical point of view [23]. The correlation coefficient only evaluates the linear relationship between attributes, but there are situations in which no linear correlation exists and yet a strong nonlinear association exists between attributes.
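As a rough illustration of the correlation coefficient of Section 2.5, the following Python sketch computes $r(X, Y)$ both from the sums of squared deviations and from the expected values, and then evaluates it on a deterministic but nonlinear relationship. The function names and the sample data are assumptions made for this example only; population (divide by $n$) standard deviations are assumed in the second form.

```python
from math import sqrt

def pearson_ss(xs, ys):
    """r(X, Y) from SS_XY, SS_X and SS_Y (sums of squared deviations)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return ss_xy / sqrt(ss_x * ss_y)

def pearson_expectation(xs, ys):
    """r(X, Y) from E(XY) - E(X)E(Y), divided by the standard deviations."""
    n = len(xs)
    e_x, e_y = sum(xs) / n, sum(ys) / n
    e_xy = sum(x * y for x, y in zip(xs, ys)) / n
    # Population standard deviations (an assumption of this sketch).
    sd_x = sqrt(sum(x * x for x in xs) / n - e_x ** 2)
    sd_y = sqrt(sum(y * y for y in ys) / n - e_y ** 2)
    return (e_xy - e_x * e_y) / (sd_x * sd_y)

# Hypothetical attribute values.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]      # Y increases linearly with X
falling = [10, 8, 6, 4, 2]     # Y decreases linearly with X

# A nonlinear but fully deterministic relationship: y = x squared.
sym = [-2, -1, 0, 1, 2]
squares = [x * x for x in sym]

r_rising = pearson_ss(xs, rising)
r_falling = pearson_ss(xs, falling)
r_squares = pearson_ss(sym, squares)
```

Under these assumptions the two forms agree on the same data, the falling series gives a coefficient with the opposite sign to the rising one, and the squared series yields a coefficient of zero even though $Y$ is completely determined by $X$, which matches the remark that the coefficient only captures linear relationships.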

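Returning to the association measures of Section 2.4, a minimal Python sketch can make the definitions concrete. It builds meaning sets $m(a, v)$ over a small information table and derives confidence, RI, IND and IS for one rule. The table contents, attribute names, and function names are hypothetical, and the sketch assumes that both meaning sets are nonempty so that no division by zero occurs.

```python
from math import sqrt

# Hypothetical information table: one dict per object, one key per attribute.
TABLE = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no",  "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
]

def m(table, attribute, value):
    """Meaning set m(a, v): indices of objects whose value on a equals v."""
    return {i for i, row in enumerate(table) if row[attribute] == value}

def prob(objects, table):
    """P(phi) = |m(phi)| / |U|."""
    return len(objects) / len(table)

def association_measures(m_phi, m_psi, table):
    """Confidence, RI, IND and IS for the rule phi => psi.

    Assumes m_phi and m_psi are both nonempty.
    """
    p_phi = prob(m_phi, table)
    p_psi = prob(m_psi, table)
    p_joint = prob(m_phi & m_psi, table)  # m(phi AND psi) is the intersection
    return {
        "confidence": p_joint / p_phi,
        "RI": p_joint - p_phi * p_psi,
        "IND": p_joint / (p_phi * p_psi),
        "IS": p_joint / sqrt(p_phi * p_psi),
    }

# Rule: (outlook, sunny) => (play, yes)
sunny = m(TABLE, "outlook", "sunny")
plays = m(TABLE, "play", "yes")
result = association_measures(sunny, plays, TABLE)
```

Swapping the two meaning sets changes the confidence in general but leaves RI, IND and IS unchanged, which is the sense in which the latter three are two-way measures. In this particular table both concepts have the same support, so the two confidences happen to coincide.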

Brin et al. proposed the use of Chi-square ($\chi^2$) probability testing to evaluate the association between two attributes in a discovered rule [4]. If the Chi-square value is 0, then the attributes are independent; otherwise, they are dependent on each other. However, the $\chi^2$ test does not give the strength of the association between attributes in a discovered rule [33]. Instead, it can only decide whether attributes in a rule are non-associative or associative. Therefore, it cannot be used to rank the discovered rules. Liu et al. suggested pruning the insignificant rules by using the standard $\chi^2$ test combined with a support-confidence test [21].

2.6. Single rule measures and multiple rule measures

The measures of rule interestingness can also be classified into measures for a single rule and measures for a set of rules. Furthermore, a measure for a set of rules can be obtained from measures for single rules. For example, conditional probability can be used as a measure for a single classification rule, and conditional entropy, which is defined by conditional probability, can be used as a measure for a set of classification rules [42].

Measures for multiple rules concentrate on features of a set of rules. They are normally expressed as some kind of average. Many measures, known as summaries, have been examined for multiple rules [17].

One may consider integrated and general systems to satisfy the various user requirements. Interactive approaches can be viewed as one of the potential solutions.

3. User Preferences

A user preference establishes a relation between a user and a target item which contains several features. An item can be a target concept to be learned, a system, a model, an algorithm, or an approach that is ready to be chosen. A feature is normally associated with a feature description, measures of this feature, and a set of possible feature values, or feature value ranges. Usually the preference of an item is indirectly related to the preference of its features [18]. In this section, we discuss different types of user preferences.

3.1. User preference modelling

User judgement can be expressed in various forms. Quantitative judgement involves the assignment of different weights to different items. Qualitative judgement is expressed as an ordering of items. In many situations, user judgement is determined by semantic considerations. For example, it may be interpreted in terms of more intuitive notions, such as the cost of testing, the ease of understanding, or the actionability. It is virtually impossible to list all practical interpretations of user judgement. In addition, the meaning of user judgement becomes clear only in a particular context of application.

Quantitative user preferences

A simple and straightforward way to represent user judgement on items is to assign them numerical weights [19]. Formally, it can be described by a mapping:

$$w: X \to \Re \qquad (1)$$

where $X$ is a finite non-empty set of items, and $\Re$ is the set of real numbers. For an item $a \in X$, $w(a)$ is the weight of $a$. The numerical weight $w(a)$ may be interpreted as the degree of importance of $a$, the number of occurrences of $a$ in a set, or the cost of testing $a$ in a rule. This naturally induces an ordering of items.

A difficulty with the quantitative method is the acquisition of precise and accurate weights for all items. On the other hand, a qualitative method only relies on pairwise comparisons of items. For any two items $a, b \in X$, we assume that a user is able to state whether one is more important than, or more preferred to, the other. This qualitative user judgement can be formally defined by a binary preference relation $\succ$ on $X$. For any two $a, b \in X$:

$$a \succ b \iff \text{the user prefers } a \text{ to } b \qquad (2)$$

In the absence of preference, i.e., if both $\neg(a \succ b)$ and $\neg(b \succ a)$ hold, we say that $a$ and $b$ are indifferent. An indifference relation $\sim$ on $X$ is defined as:

$$a \sim b \iff \neg(a \succ b) \wedge \neg(b \succ a) \qquad (3)$$

Based on the strict preference and indifference, one can define a preference-indifference relation $\succeq$ on $X$:

$$a \succeq b \iff a \succ b \vee a \sim b \qquad (4)$$

If $a \succeq b$ holds, we say that $b$ is not preferred to $a$, or $a$ is at least as good as $b$. The strict preference can be re-expressed as $a \succ b \iff a \succeq b \wedge \neg(b \succeq a)$.

A user preference relation satisfies two axioms, asymmetry and negative transitivity, so it is a weak order on $X$. For any $a, b, c \in X$:

$$a \succ b \implies \neg(b \succ a);$$
$$(\neg(a \succ b) \wedge \neg(b \succ c)) \implies \neg(a \succ c)$$

The asymmetry axiom states that a user cannot prefer $a$ to $b$, and at the same time prefer $b$ to $a$. The negative transitivity axiom states that if a user does not prefer $a$ to $b$, nor $b$ to $c$, then the user should not prefer $a$ to $c$.
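The two axioms can be checked mechanically on a finite item set. The Python sketch below is only an illustration under stated assumptions: the items and weights are hypothetical, and a larger weight is assumed to mean "more preferred". It derives a strict preference from numerical weights, tests asymmetry and negative transitivity by enumeration, and contrasts the result with a cyclic relation that is not a weak order.

```python
from itertools import product

# Hypothetical items (attributes) with user-assigned weights.
WEIGHTS = {"age": 3.0, "income": 5.0, "zip": 1.0, "gender": 3.0}

def prefers(a, b):
    """Strict preference induced by the weights (assumes larger is better)."""
    return WEIGHTS[a] > WEIGHTS[b]

def indifferent(a, b, rel=prefers):
    """a ~ b iff neither item is preferred to the other."""
    return not rel(a, b) and not rel(b, a)

def is_asymmetric(items, rel):
    """No pair with both a > b and b > a."""
    return all(not (rel(a, b) and rel(b, a))
               for a, b in product(items, repeat=2))

def is_negatively_transitive(items, rel):
    """not(a > b) and not(b > c) together imply not(a > c)."""
    return all(rel(a, b) or rel(b, c) or not rel(a, c)
               for a, b, c in product(items, repeat=3))

def is_weak_order(items, rel):
    return is_asymmetric(items, rel) and is_negatively_transitive(items, rel)

# A cyclic preference, used here as a counter-example.
CYCLE = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}

def cyclic(a, b):
    return (a, b) in CYCLE

weights_ok = is_weak_order(list(WEIGHTS), prefers)
cycle_ok = is_weak_order(["rock", "paper", "scissors"], cyclic)
```

In this sketch the weight-induced relation passes both checks, and "age" and "gender" come out as indifferent because they share a weight. The cyclic relation is asymmetric but fails negative transitivity, so it cannot be arranged into preference levels.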


A weak order imposes a special structure on the set $X$ of items. The indifference relation $\sim$ divides the set of items into disjoint subsets. Furthermore, for any two distinct equivalence classes $[a]$ and $[b]$ of $X/\sim$, either $[a] \succ' [b]$ or $[b] \succ' [a]$ holds. In other words, it is possible to arrange the items into several levels so that items in a higher level are preferred to items in a lower level, and items in the same level are indifferent.

When each equivalence class contains exactly one item, the preference relation $\succ$ on $X$ is in fact a linear order itself. In general, if we do not care how to order items in an equivalence class, we can extend a weak order into a linear order such that $a$ is ranked ahead of $b$ if and only if $a \succ b$. For a weak order, its linear extension may not be unique [12].

Connections of quantitative and qualitative preferences

The quantitative judgement can be easily translated into qualitative judgement. Given the weights of items, we can uniquely determine a preference relation. Suppose there are two items $a$ and $b$, and $w(a)$ and $w(b)$ represent the importance of $a$ and $b$, respectively. A preference relation is defined by:

$$a \succ b \iff w(a) > w(b) \qquad (5)$$

When $w(a)$ and $w(b)$ are the costs of items $a$ and $b$, the following preference relation should be used instead:

$$a \succ b \iff w(a) < w(b) \qquad (6)$$

In general, two items may have the same weight. The induced preference relation is indeed a weak order, i.e., asymmetric and negatively transitive.

The translation to a preference relation only preserves the ordering of items implied by the weights. The additional information given by the absolute weight values is lost. In the reverse process, a user preference relation can be represented in terms of the weights of items. A rational user's judgement must allow numerical measurement.

The following theorem states that a weak order is both necessary and sufficient for a numerical measurement [12, 28]:

Theorem 1 Suppose $\succ$ is a preference relation on a finite non-empty set $X$ of items. There exists a real-valued function $u: X \to \Re$ satisfying the condition:

$$a \succ b \iff u(a) > u(b), \quad a, b \in X \qquad (7)$$

if and only if $\succ$ is a weak order. Moreover, $u$ is uniquely defined up to a strictly monotonic increasing transformation.

The function $u$ is referred to as an order-preserving utility function. It provides a quantitative representation of a user preference. That is, the numbers $u(a), u(b), \ldots$ as ordered by $>$ reflect the order of $a, b, \ldots$ under the preference relation $\succ$.

The utility function also faithfully represents the indifference relation, that is,

$$a \sim b \iff u(a) = u(b), \quad a, b \in X \qquad (8)$$

According to Theorem 1, for a given preference relation, there exist many utility functions. The utility functions are in fact based on the ordinal scale. That is, it is only meaningful to examine the order induced by a utility function. Although numerical values are used, it is not necessarily meaningful to apply arithmetic operations on them.

4. Multiple Strategies for Abnormal Situation Handling

In solving real world problems, we often face choices between simple and complicated descriptions, precise and imprecise characterizations, understandability and incomprehensibility of methods, and exact and approximate solutions. In general, there is a tradeoff between two such opposite and competing criteria. Human problem solving depends crucially on a proper balance and compromise of these incompatible criteria. Different users can develop different knowledge representation frameworks and related automated learning and mining mechanisms to describe and identify abnormal situations or behaviors. Consequently, this issue must be addressed in user-centered interactive data mining.

4.1. Retaining strategies

A retaining strategy, as its name suggests, means to keep the quality of the rules, especially their accuracy, as high as possible. The most commonly used accuracy measure is the confidence measure defined in Section 2.4. Clearly, the higher the confidence value, the more accurate the rule is. In most real situations, a rule of the form $\phi \Rightarrow \psi$ is not always deterministic for the given universe, but rather approximate and uncertain. In other words, the confidence value of the rule is less than or equal to 100%.

Yao et al. advocated the use of a specific knowledge representation and data mining framework based on rules and exceptions [47]. In this framework, normal and abnormal situations or behaviors occur as pairs of dual entities: rules succinctly summarize normal situations, and exceptions characterize abnormal situations. These two entry types each provide the context for defining the other. Rule+exception strategies strike a practical balance between simplicity and accuracy.

Two types of exceptions can be identified: incorrect interpretations produced by the existing rules, and interpretations that cannot be produced by the existing rules [6]. For simplicity, they are referred to as incorrectly covered exceptions and uncovered exceptions, respectively.

For the incorrectly covered exceptions, two potential solutions exist. A commonly used method adds an additional condition $\phi'$ to form a more specific condition $\phi \wedge \phi'$. The new rule $\phi \wedge \phi' \Rightarrow \psi$ should produce fewer or no exceptions. Alternatively, a rule+exception strategy treats $\phi \Rightarrow \psi$ as a general rule with a probability and searches for exception rules or exception instances to the general rule.

For uncovered exceptions, we could attempt to add an alternative condition to form a more general rule $\phi \vee \phi' \Rightarrow \psi$. The extra $\phi'$ could cover more instances of $\psi$. For clarity, we can think of them as two rules, $\phi \Rightarrow \psi$ and $\phi' \Rightarrow \psi$. One can view the second rule as an exception rule to handle the uncovered exceptions. In general, we can sequentially construct a set of rules to cover instances of $\psi$, with each new rule as an exception rule to the previous rules.

4.2. Compromising strategies

A compromising strategy promotes the construction of more general rules containing more incorrectly covered exceptions. A compromising strategy means to compromise the accuracy to a certain level, in order to keep another important feature at a relatively high level. That is, high accuracy is given up as the primary goal in order to preserve or improve another property. Intuitively, a compromising strategy needs to introduce a probability value, say $\beta$, as an accuracy threshold.

Improve the generality

In most cases, a compromising strategy generates shorter and simpler rules defined by a proper subset of the entire feature set. By choosing $A \subset At$, a set of formulas $\Phi_A$ can be defined. Borrowing the concept from rough set theory, we can define a $\beta$-positive region with respect to a target concept $\psi$. The $\beta$-positive region is the union of all objects satisfying the rules defined by $A$ with confidence greater than or equal to $\beta$, which is denoted as:

$$POS^{\beta}_{A}(\psi) = \bigcup \{m(\phi) : P(\psi \mid \phi) \ge \beta,\ \phi \in \Phi_A\}$$

where $P(\psi \mid \phi) = \frac{|m(\psi) \cap m(\phi)|}{|m(\phi)|}$. To preserve the generality, a heuristic criterion can be defined as: given a predefined $\beta$ value, $|POS^{\beta}_{A}(\psi)| \ge |POS^{\beta}_{At}(\psi)|$. A rule set that satisfies the generality preservation strategy, with individual rules in the form of $\phi \Rightarrow \psi$, can classify more objects in the universe than the set of rules produced by the entire set $At$, while keeping the confidence not less than $\beta$.

Decrease the cost

A general rule may include more incorrectly covered exceptions. Suppose a set of objects in the universe can be defined by a descriptive formula $\phi$, and a learnt rule states $\phi \Rightarrow \psi_1$. That means that, in general, all the objects satisfying $\phi$ should imply $\psi_1$. However, the rule could be too general, so that an object $x \in m(\phi)$ implies a decision value different from $\psi_1$, say, it satisfies a decision value $\psi_2$. This becomes an exception, or an error, of classification.

For a specific classification exception, denoted as $(\psi_1, \psi_2)$, the exception count is the number of objects in the universe that possess this exception, which can be defined as

$$errCount(\psi_1, \psi_2) = |\{x \mid des([x]) \Rightarrow \psi_1,\ des(x) \Rightarrow \psi_2\}|$$

where $[x]$ indicates the entire equivalence class that contains $x$, $des(\cdot)$ means the description of the given object, or the object set, by a formula, and $\psi_1 \ne \psi_2$.

Yao and Wong applied the Bayesian decision procedure for classification [46]. The basic idea is that different errors may indicate different costs. A rule set satisfying the cost preservation strategy will not increase the error cost.

5. Exploring Explanations of Discovered Pattern

The role of explanation is to clarify, teach, and convince [9]. There are many kinds of explanations. An explanation could be a definition of a term, the cause and effect of an event, or the significance of a phenomenon. Different explanations are the answers to many different kinds of questions. Explanation is both subjective and objective [7]. It is subjective because the meaning of explanation, or the evaluation of a good explanation, is different for different people at different times. On the other hand, explanation is objective because it must win general approval as a valid explanation, or has to be withdrawn in the face of new evidence and criticism. Interpretations and explanations enhance our understanding of the phenomenon and guide us to make rational decisions.

Yao et al. suggested adding an explicit explanation module to the existing data mining processes [48]. Explanations of data mining address several important questions, such as: What needs to be explained? How should the discovered knowledge be explained? Moreover, is an explanation correct and complete?

5.1. Discovered patterns to be explained

The knowledge discovered from data should be explained and interpreted. Knowledge can be discovered

463

by unsupervised learning methods. Unsupervised learning studies how systems can learn to represent, summarize, and organize the data in a way that reflects the internal structure (namely, a pattern) of the overall collection. This process does not explain the patterns, but describes them. The primary unsupervised techniques include clustering mining, belief networks learning, and association mining. The criteria for choosing which pattern to explain are directly related to the pattern evaluation step of data mining.

5.2. Profiles used to construct explanations

Background knowledge provides features that can possibly explain a discovered pattern. An explanation may include many branches of inquiry: physics, chemistry, meteorology, human culture, logic, psychology, and the methodology of science. In data mining, explanation can be made at a shallow, syntactic level based on statistical information, or at a deep, semantic level based on domain knowledge.

The required information and knowledge for explanation may not necessarily be inside the original dataset. One needs to collect additional information for explanation construction.

The key question is the selection of the features that are generally explanatory to the target concept from many features that happen to be related to the current discovered pattern. Craik [7] argued that the power of explanations involves the power of insight and anticipation. One collects certain features based on the underlying hypothesis that they may provide explanations of the discovered pattern. That something is unexplainable may simply be an expression of the inability to discover an explanation of a desired sort. The process of selecting the relevant and explanatory features may be subjective, and trial-and-error. In general, the better our background knowledge is, the more accurate the inferred explanations are likely to be.

5.3. Explanation construction

Explanations for data mining results reason inductively, namely, drawing an inference from a set of acquired training instances, and justifying or predicting the instances one might observe in the future.

Supervised learning methods can be applied for the explanation construction. The goal of supervised learning is to find a model that will correctly associate the input patterns with the classes. In real world applications, supervised learning models are extremely useful analytic techniques. The widely used supervised learning methods include decision tree learning, rule-based learning, and decision graph learning. The learned results are represented as either a tree, or a set of if-then rules. The constructed explanations give some evidence about under what conditions (within the background knowledge) the discovered pattern is most likely to happen, or how the background knowledge is related to the pattern.

5.4. Explanation evaluation

The role of explanation in data mining is positioned among proper description, relation and causality. Comprehensibility is the key factor in explanations. The accuracy of the constructed explanations relies on the amount of training examples. Explanations perform poorly with insufficient data or poor presuppositions. Different background knowledge may lead to different explanations. There is no reason to believe that only one unique explanation exists. One can use statistical measures and domain knowledge to evaluate different explanations.

6. Conclusion

In this paper, we focus on interactive data mining, which is characterized by user requirement and user judgement. On the abstract level, we discuss multiple views and user preference in the data mining domain. On the application level, we discuss the real problems and concerns while coping with abnormal environments and most-in-need explanations.

We argue that more effective data mining systems should support better human-machine interactivity. The concern of effectiveness and the concern of efficiency should be synchronized with user cognitive phases and requirements. Bearing user requirement in mind, research on interactive data mining is fairly broad.

References

[1] Agrawal, R., Imielinski, T. and Swami, A., Mining association rules between sets of items in large databases, Proceedings of ACM SIGMOD, 207-216, 1993.

[2] Barber, B. and Hamilton, H., Extracting share frequent itemsets with infrequent subsets, Data Mining and Knowledge Discovery, 7, 153-185, 2003.

[3] Brachman, R. and Anand, T., The process of knowledge discovery in databases: a human-centered approach, Advances in Knowledge Discovery and Data Mining, AAAI Press & MIT Press, Menlo Park, CA, 37-57, 1996.

[4] Brin, S., Motwani, R. and Silverstein, C., Beyond market baskets: generalizing association rules to correlations, Proceedings of ACM SIGMOD, 265-276, 1997.

[5] Chen, Y.H. and Yao, Y.Y., Multiview intelligent data analysis based on granular computing, Proceedings of IEEE International Conference on Granular Computing (GrC06), 281-286, 2006.

[6] Compton, P. and Jansen, B., Knowledge in context: a strategy for expert system maintenance, Proceedings of the 2nd Australian Joint Conference of Artificial Intelligence, 292-306, 1988.

[7] Craik, K., The Nature of Explanation, Cambridge University Press, London, New York, 1943.

[8] Demri, S. and Orlowska, E., Logical analysis of indiscernibility, in: E. Orlowska (Ed.), Incomplete Information: Rough Set Analysis, Physica-Verlag, Heidelberg, 347-380, 1998.

[9] Dhaliwal, J.S. and Benbasat, I., The use and effects of knowledge-based system explanations: theoretical foundations and a framework for empirical evaluation, Information Systems Research, 7, 342-362, 1996.

[10] Doyle, J., Rationality and its role in reasoning, Computational Intelligence, 8, 376-409, 1992.

[11] Elm, W.C., Cook, M.J., Greitzer, F.L., Hoffman, R.R., Moon, B. and Hutchins, S.G., Designing support for intelligence analysis, Proceedings of the Human Factors and Ergonomics Society, 20-24, 2004.

[12] Fishburn, P.C., Utility Theory for Decision-Making, John Wiley & Sons, New York, 1970.

[13] Gago, P. and Bento, C., A metric for selection of the most promising rules, Proceedings of PKDD, 19-27, 1998.

[14] Ganter, B. and Wille, R., Formal Concept Analysis: Mathematical Foundations, Springer-Verlag, New York, 1999.

[15] Han, J., Hu, X. and Cercone, N., A visualization model of interactive knowledge discovery systems and its implementations, Information Visualization, 2, 105-125, 2003.

[16] Hancock, P.A. and Scallen, S.F., The future of function allocation, Ergonomics in Design, 4, 24-29, 1996.

[17] Hilderman, R.J. and Hamilton, H.J., Knowledge Discovery and Measures of Interest, Kluwer Academic Publishers, Boston, 2001.

[18] Jung, S.Y., Hong, J.H. and Kim, T.S., A formal model for user preference, Proceedings of ICDM'02, 235-242, 2002.

[19] Krantz, D.H., Luce, R.D., Suppes, P. and Tversky, A., Foundations of Measurement, Academic Press, New York, 1971.

[20] Lee, T.T., An information-theoretic analysis of relational databases - part I: data dependencies and information metric, IEEE Transactions on Software Engineering, 13, 1049-1061, 1987.

[21] Liu, B., Hsu, W. and Ma, Y., Pruning and summarizing the discovered associations, Proceedings of KDD, 125-134, 1999.

[22] Liu, B., Hsu, W. and Chen, S., Using general impressions to analyze discovered classification rules, Proceedings of KDD, 31-36, 1997.

[23] Mari, D.D. and Kotz, S., Correlation and Dependence, Imperial College Press, London, 2001.

[24] Pawlak, Z., Rough sets, International Journal of Computer Information and Science, 11(5), 341-356, 1982.

[25] Pawlak, Z., Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer Academic Publishers, Dordrecht, 1991.

[26] Piatetsky-Shapiro, G., Discovery, analysis, and presentation of strong rules, in: G. Piatetsky-Shapiro and W.J. Frawley (Eds.), Knowledge Discovery in Databases, AAAI/MIT Press, 229-238, 1991.

[27] Ras, Z. and Wieczorkowska, A., Action rules: how to increase profit of a company, Proceedings of PKDD, 587-592, 2000.

[28] Roberts, F., Measurement Theory, Addison Wesley, Massachusetts, 1979.

[29] Shneiderman, B., Designing the User Interface: Strategies for Effective Human-Computer Interaction, third edition, Addison-Wesley, 1998.

[30] Silberschatz, A. and Tuzhilin, A., On subjective measures of interestingness in knowledge discovery, Proceedings of KDD, 275-281, 1995.

[31] Silberschatz, A. and Tuzhilin, A., What makes patterns interesting in knowledge discovery systems, IEEE Transactions on Knowledge and Data Engineering, 8, 970-974, 1996.

[32] Smyth, P. and Goodman, R., An information theoretic approach to rule induction from databases, IEEE Transactions on Knowledge and Data Engineering, 4, 301-316, 1992.

[33] Tan, P.N., Kumar, V. and Srivastava, J., Selecting the right interestingness measure for association patterns, Proceedings of KDD, 2002.

[34] Wang, K. and He, Y., User-defined association mining, Proceedings of PAKDD, 387-399, 2001.

[35] Wang, K., Zhou, S. and Han, J., Profit mining: from patterns to actions, Proceedings of EDBT, 70-87, 2002.

[36] Wang, Y.X., On cognitive informatics, Proceedings of ICCI'02, 34-42, 2002.

[37] Wang, Y.X. and Liu, D., On information and knowledge representation in the brain, Proceedings of ICCI'03, 26-29, 2003.

[38] Wang, Y.X., On autonomous computing and cognitive processes, Proceedings of ICCI'04, 3-4, 2004.

[39] Wille, R., Restructuring lattice theory: an approach based on hierarchies of concepts, in: I. Rival (Ed.), Ordered Sets, Reidel, Dordrecht-Boston, 445-470, 1982.

[40] Wille, R., Concept lattices and conceptual knowledge systems, Computers & Mathematics with Applications, 23, 493-515, 1992.

[41] Yao, Y.Y., On modeling data mining with granular computing, Proceedings of COMPSAC, 638-643, 2001.

[42] Yao, Y.Y., Information-theoretic measures for knowledge discovery and data mining, in: Karmeshu (Ed.), Entropy Measures, Maximum Entropy and Emerging Applications, Springer, Berlin, 115-136, 2003.

[43] Yao, Y.Y., Mining high order decision rules, in: M. Inuiguchi, S. Hirano and S. Tsumoto (Eds.), Rough Set Theory and Granular Computing, Springer, Berlin, 125-135, 2003.

[44] Yao, Y.Y., Perspectives of granular computing, Proceedings of 2005 IEEE International Conference on Granular Computing, 1, 85-90, 2005.

[45] Yao, Y.Y., Chen, Y.H. and Yang, X.D., A measurement-theoretic foundation for rule interestingness evaluation, Proceedings of Workshop on Foundations and New Directions in Data Mining in the 3rd IEEE International Conference on Data Mining (ICDM 2003), 221-227, 2003.

[46] Yao, Y.Y. and Wong, S.K.M., A decision theoretic framework for approximating concepts, International Journal of Man-machine Studies, 37, 793-809, 1992.

[47] Yao, Y.Y., Wang, F.Y., Wang, J. and Zeng, D., Rule + exception strategies for security information analysis, IEEE Intelligent Systems, 20, 52-57, 2005.

[48] Yao, Y.Y., Zhao, Y. and Maguire, R.B., Explanation-oriented association mining using rough set theory, Proceedings of Rough Sets, Fuzzy Sets and Granular Computing, 165-172, 2003.

[49] Yao, Y.Y. and Zhong, N., An analysis of quantitative measures associated with rules, Proceedings of PAKDD, 479-488, 1999.

[50] Zhao, Y. and Yao, Y.Y., Interactive user-driven classification using a granule network, Proceedings of ICCI'05, 250-259, 2005.

[51] Zhong, N., Yao, Y.Y. and Ohshima, M., Peculiarity oriented multi-database mining, IEEE Transactions on Knowledge and Data Engineering, 15, 952-960, 2003.

[52] Zhong, N., Yao, Y.Y., Ohshima, M. and Ohsuga, S., Interestingness, peculiarity, and multi-database mining, Proceedings of IEEE International Conference on Data Mining (ICDM'01), 566-573, 2001.
