IEEE International Conference on e-Business Engineering (ICEBE'05), Beijing, China, 2005.10.12-2005.10.18

Design and Implementation of Commerce Data Mining System Based on Rough Set Theory*

* Supported by the National Natural Science Foundation of China, Key Project No. 70371054

Xiang Yang, Tongji University, P.R. China, 200092
Wu Weiying, Huzhou Teachers College, P.R. China, 313000
Mao Hairong, Song Qingwei, Tongji University, P.R. China, 200092

Abstract: On the basis of analyzing the characteristics of commercial data and rough set theory, this paper proposes a data mining system using rough sets and presents its detailed design and implementation. The novel commercial data mining system comprises three parts: data preparation, data preprocessing, and mining and evaluation. In the design and implementation of these three levels, the paper presents several novel rough-set-based algorithms, which accelerate the data mining process and improve the quality of its results. Finally, the paper tests the research results and algorithms on an instance.

1. Introduction

Commercial data mining offers automatic trend prediction and the detection of previously unknown patterns. It is easy to conduct, and its analysis results are reliable and transparent. Businesses are therefore paying more and more attention to commercial data mining systems and their functions [1-3].

However, present data mining systems still have many weaknesses [4-6], primarily as follows: 1. No good methods have yet been devised to promptly delete or correct noisy data and to deal with empty data. 2. The algorithmic process of data mining is time-consuming, because attention has been paid only to the complexity of the algorithm itself and not to the hardware resources of the operating environment. 3. Research on the accuracy of freely selected rules is incomplete. 4. Neither knowledge nor rules are well protected and promptly updated.

The advantages of rough sets are as follows: they can remove redundant and useless information by discovering the relations among the data, simplifying the representation space of the input information; the results gained through rough sets are presented in the form of rules and are thus easy to understand; rough sets can reduce the information granularity so as to improve the statistical meaning of the rules; and they can handle both consistent and inconsistent data [7].

Commercial data include customer data, product directories, pricing data, order states, etc. Targeting the characteristics of commercial data, this article therefore forms and completes a commercial data mining system based on rough set theory combined with data mining techniques.

2. System framework of commercial data mining

Figure 1 illustrates the system framework devised by combining rough set theory with data mining techniques.

In Figure 1, the commercial mining system based on rough sets is composed of three layers. The first layer is data preparation, which defines the sources of the commercial data, chooses suitable data sources, identifies and organizes the data, and builds the decision tables. The second layer is data preprocessing, which discretizes all continuous attributes in the decision tables, then filters the data and analyzes its completeness through rough sets. The third layer is mining evaluation: it simplifies the attributes and attribute values, acquires new rules and updated knowledge through incremental learning, and finally evaluates and gives feedback on the obtained results.
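The three-layer flow just described can be sketched as a pipeline skeleton; all function and field names here are illustrative placeholders, not taken from the paper:

```python
# Sketch of the three-layer framework: preparation -> preprocessing -> mining.
# Each placeholder body would be replaced by the concrete algorithms of Section 3.
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]

def prepare(raw_sources: List[List[Record]]) -> List[Record]:
    """Layer 1: merge the chosen data sources into one decision table."""
    table: List[Record] = []
    for source in raw_sources:
        table.extend(source)
    return table

def preprocess(table: List[Record]) -> List[Record]:
    """Layer 2: discretize continuous attributes, filter, complete missing data."""
    # Algorithms 1-3 (discretization, filtering, completion) would plug in here.
    return [dict(rec) for rec in table]

def mine_and_evaluate(table: List[Record], min_confidence: float) -> List[Tuple]:
    """Layer 3: extract rules and keep only those meeting the confidence bound."""
    rules: List[Tuple] = []  # rule extraction (algorithms 4-5) would plug in here
    return [r for r in rules if r[-1] >= min_confidence]

table = prepare([[{"age": 23, "buys": "no"}], [{"age": 20, "buys": "yes"}]])
clean = preprocess(table)
rules = mine_and_evaluate(clean, min_confidence=0.8)
```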

3. Designing and Realizing the System

3.1. Designing and realizing the preparation level

Proceedings of the 2005 IEEE International Conference on e-Business Engineering (ICEBE’05) 0-7695-2430-3/05 $20.00 © 2005 IEEE


3.1.1. Building up the commercial data warehouse

Nowadays, with commercial competition getting more and more severe, large amounts of data accumulate, distributed across the various departments, platforms and databases of an enterprise. Commercial data are affected by many uncertain factors and incomplete information. This multiple distribution of data sources makes it necessary to build a commercial data warehouse to improve mining efficiency and accuracy. The following are definitions of some data required to construct the commercial data warehouse.

1) Data about purchasing of commodities: including providers' names, providers' codes, codes of the purchasing departments, tax rate, payment conditions, etc.;
2) Data about storage of commodities: including warehouse codes, codes of the stored commodities, stored quantities, unit price, etc.;
3) Data about sales of commodities: including order date, order number, conditions of payment, sales departments, sales clerks, names of commodities, quantity, unit, unit price, tax rate, delivery date, delivery bill number, the commodity warehouses, delivery address, quantity of returned goods, etc.;
4) Data about accounting of commodities: including fixed assets, profits, debts, circulating funds, etc.;
5) Data about customers: including customer names, customer codes, bank account numbers, etc.

3.1.2. Data deletion

Deleting noisy data is a rather complicated process that affects many other aspects. The whole deletion process can be divided into the following steps:
1) Check for spelling errors;
2) Delete repeated records, make up for incomplete records and settle inconsistent records;
3) Use the testing function to check the data;
4) Repeat the above steps a number of times according to the test results.
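The four cleaning steps above can be sketched as follows; this is a minimal sketch assuming dict records, a caller-supplied validity test, and a known table of spelling corrections (all names are illustrative):

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]

def clean(records: List[Record], is_valid: Callable[[Record], bool],
          spelling_fixes: Dict[str, str], max_passes: int = 3) -> List[Record]:
    for _ in range(max_passes):                      # step 4: repeat as needed
        # step 1: correct known spelling errors
        for rec in records:
            for key, val in rec.items():
                if val in spelling_fixes:
                    rec[key] = spelling_fixes[val]
        # step 2: delete repeated records (first occurrence wins)
        seen, unique = set(), []
        for rec in records:
            sig = tuple(sorted((k, v) for k, v in rec.items()))
            if sig not in seen:
                seen.add(sig)
                unique.append(rec)
        records = unique
        # step 3: use the testing function to check the data
        if all(is_valid(r) for r in records):
            break
    return records

rows = [{"sex": "mann", "credit": "bad"}, {"sex": "man", "credit": "bad"}]
cleaned = clean(rows, is_valid=lambda r: r["sex"] in ("man", "woman"),
                spelling_fixes={"mann": "man"})
```

After the spelling fix the two rows become identical, so deduplication leaves a single record.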

3.2. Designing and implementation of data preprocessing

Data preprocessing in this article is divided into three phases: data discretization, data filtering and data completion.

3.2.1. Data discretization phase

Because rough set theory handles only discrete attributes and cannot deal with continuous ones, research on discretizing continuous attributes is necessary.

[Figure 1. Framework of commercial data mining based on the rough set. The figure shows the data preparation layer (data preparation and definition, table building, target database), the data preprocessing layer (data cleaning, data filtering, data completion, continuous attribute discretization), and the mining evaluation layer (attribute and attribute-value simplification, rule extraction and simplification, knowledge filtering, and result evaluation with satisfactory/unsatisfactory feedback), connected to the inner and outer data warehouses, the commerce warehouse, the algorithm warehouse, and the user interface, which draws correct, credible, uniform data and supplies decision support to the user.]

Nowadays there are many methods for handling continuous attributes [9-11]. Among them, the algorithm proposed in [9] is relatively effective and representative. It is broadly applicable, being driven by feedback on the compatibility of the decision table, and it needs no domain knowledge while fully preserving the information of the original decision table. However, this algorithm does not work well when the attribute values are widely spread and the cut points of the attribute values are concentrated at one end of the interval.

Targeting the characteristics of commercial data, this article improves this continuous-value discretization method so that the algorithm can process data whose continuous attributes are widely spread, using the compatibility of the decision table and a contraction factor to improve the precision of the result. The algorithm goes as follows.

Suppose T = (U, C ∪ D, V, f) is a decision table, with U = {x_1, …, x_n} and C = {c_1, …, c_m}. If the incompatibility of attribute c_i is α_i, then

α_i = card(C_i) / card(U),  (1)

where C_i = {the objects in U that are incompatible when the only condition attribute considered is c_i}.

Since the incompatibilities among the attributes of the decision table can be regarded as independent, α_T can be denoted as α_T = ∏_{i=1}^{m} α_i. For convenience of calculation this can be approximated by α_T = α^m, so the value of α can be estimated from the formula above as α ≈ (α_T)^{1/m}.

In theory the decision table after discretization should be compatible, namely α_T = 0; in practice it is enough for α_T to take a sufficiently small value. The compatibility of the different attributes in the decision table differs, but the difference can be tolerated: |α − α_i| ≤ β (β: tolerance).
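Under this reading of formula (1) — α_i is the fraction of objects whose value of c_i alone fails to determine the decision — the measure can be computed as follows (an illustrative sketch; names are ours):

```python
from collections import defaultdict
from typing import List

def incompatibility(column: List[str], decisions: List[str]) -> float:
    """alpha_i = card(C_i) / card(U): fraction of objects whose value of this
    single condition attribute maps to more than one decision value."""
    by_value = defaultdict(set)
    for v, d in zip(column, decisions):
        by_value[v].add(d)
    incompatible = sum(1 for v in column if len(by_value[v]) > 1)
    return incompatible / len(column)

def table_incompatibility(alphas: List[float]) -> float:
    """alpha_T = product of per-attribute alphas (independence assumption)."""
    out = 1.0
    for a in alphas:
        out *= a
    return out

col = ["low", "low", "high", "high"]
dec = ["no", "yes", "yes", "yes"]
a = incompatibility(col, dec)  # "low" maps to both decisions
```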

Algorithm 1: continuous data discretization algorithm
Input: decision table T = (U, C ∪ D, V, f), U = {x_1, …, x_n}, C = {c_1, …, c_m}, where every c_i (i = 1, 2, …, m) is a continuous attribute and D = {d} is the decision attribute.
Output: discretized decision table.
Steps:
1. Give α_T, β and δ their initial values; estimate the incompatibility value α.
2. For i = 1, 2, …, m repeat the following steps:
1) for attribute c_i and the initial threshold, divide the data set into a partition of U;
2) cluster each block locally and contract the local clusters toward their centers with the contraction factor;
3) use formula (1) to calculate the incompatibility α_i of this attribute;
4) if |α − α_i| ≤ β then i = i + 1, else go to 5);
5) if |α − α_i| > β then alter the threshold and go to 1).
3. Encode the discretized attribute values as 0, 1, 2, 3, ….
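Algorithm 1 can be sketched as follows. Since the paper does not fully specify the clustering-with-contraction step, this sketch substitutes simple equal-width partitions that are refined until the incompatibility tolerance is met; the binning strategy and all names are assumptions:

```python
from collections import defaultdict
from typing import List

def incompatibility(codes: List[int], decisions: List[str]) -> float:
    by_value = defaultdict(set)
    for v, d in zip(codes, decisions):
        by_value[v].add(d)
    return sum(1 for v in codes if len(by_value[v]) > 1) / len(codes)

def discretize(column: List[float], decisions: List[str],
               alpha: float, beta: float, max_bins: int = 10) -> List[int]:
    """Refine the partition until |alpha - alpha_i| <= beta (steps 2.4/2.5);
    the bin indices then serve as the 0,1,2,... encoding of step 3."""
    lo, hi = min(column), max(column)
    for bins in range(2, max_bins + 1):          # step 2.5: alter and retry
        width = (hi - lo) / bins or 1.0
        codes = [min(int((x - lo) / width), bins - 1) for x in column]
        if abs(alpha - incompatibility(codes, decisions)) <= beta:
            break
    return codes

ages = [23.0, 20.0, 33.0, 35.0, 61.0]
buys = ["no", "no", "yes", "yes", "no"]
codes = discretize(ages, buys, alpha=0.0, beta=0.1)
```

With this toy data, two and three bins leave age groups that mix both decisions, so the loop settles on four bins.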

3.2.2. Data filtering phase [12-13]

Data filtering filters the primitive information system to reduce the information size. In early research it was common to use the concept of approximation quality to measure data and predict the relationship between targets. The knowledge size reveals the resolution power of knowledge [14]: the smaller the size, the higher the resolution, and vice versa. Hence, if the information size of the information system is relatively large and relatively few objects fall into each equivalence class, then the rules created from the information system have better classifying capabilities. Nevertheless, this cannot guarantee better classification prediction on new collections of objects to be classified.

Foreign researchers have proposed rule statistics as a new efficient measure and devised a simple data filtering method, and [15] demonstrated a data filtering method based on an information test. Yet these data filtering methods have weaknesses: they pay no attention to the information size of the decision system; their rules lack satisfying statistical meaning; and they cannot preserve the reliable information of the information system. The data filtering method based on rough set theory brought out in this article solves these problems. It goes as follows.

Algorithm 2: data filtering algorithm based on rough sets
Input: T = (U, C ∪ D, V, f), a ∈ C ∪ D and C ∩ D = ∅; C: condition attributes, D: decision attribute.
Output: T_R = (U, C ∪ D, V, f_R), a ∈ C ∪ D and C ∩ D = ∅; C: condition attributes, D: decision attribute.
Steps:
1. For every a ∈ C ∪ D, calculate the equivalence class set E({a}).
2. For every a ∈ C, sort its attribute values; after sorting, mark them a_0, a_1, …, a_{k−1}.
3. Suppose a_0 = 0, a_1 = 1, …, a_{k−1} = k − 1, and let j = 0, v = a_j.
If the objects taking values a_j and a_{j+1} are decided by the same Y_i ∈ E(D), i.e. (a_j) ∪ (a_{j+1}) ⊆ Y_i, then set a_{j+1} = a_j, a_{j+2} = a_{j+2} − 1, …, a_{k−1} = a_{k−1} − 1, and unite the identical rows; else set v = a_{j+1}.
4. Set j = j + 1; if j = k then output a_0, a_1, …, a_{k−1}, else go to 3.

Through the combination with the equivalence classes induced by D, step 3 of algorithm 2 improves the statistical meaning of the rules and reduces the number of equivalence classes without affecting the reliable information of the information system.
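The merging idea of algorithm 2 can be sketched as follows: adjacent attribute values are collapsed whenever all objects carrying them fall into the same decision class. This is an illustrative implementation of the idea, not the paper's exact bookkeeping:

```python
from typing import Dict, List

def merge_values(column: List[int], decisions: List[str]) -> List[int]:
    """Collapse adjacent attribute values whose objects all share one decision
    class, reducing the number of equivalence classes (the idea of step 3)."""
    values = sorted(set(column))
    dec_sets = {v: {d for c, d in zip(column, decisions) if c == v} for v in values}
    remap: Dict[int, int] = {}
    code, prev_set = 0, None
    for v in values:
        s = dec_sets[v]
        if prev_set is not None and len(s) == 1 and s == prev_set:
            remap[v] = code              # decision-consistent: merge with previous
        else:
            if remap:
                code += 1                # start a new merged value
            remap[v] = code
        prev_set = s if len(s) == 1 else None
    return [remap[c] for c in column]

col = [0, 1, 2, 2, 3]
dec = ["no", "no", "yes", "yes", "yes"]
merged = merge_values(col, dec)
```

Values 0 and 1 both lead only to "no" and values 2 and 3 only to "yes", so four attribute values collapse to two.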

3.2.3. Data Completion

Data collections can be classified into two groups, complete and incomplete; we call a collection with lost data an incomplete data collection. Most data mining starts from complete data collections. At present there are many ways to deal with incomplete data: for instance, incomplete data can be transformed into complete data through data completion (approximate measurement or transfer of the lost values), or useful rules can be acquired from the incomplete data by direct processing.

Based on the similarity between the upper and lower approximations of rough set theory [16-17], this article puts forth a new algorithm for incomplete data that estimates lost attribute values rather precisely. The algorithm goes as follows.

Algorithm 3: incomplete data learning algorithm based on rough sets
Input: incomplete decision table T, with n objects, each object having m attributes, and each object belonging to one of c classes (all objects are divided into c classes).
Output: complete decision table C.
Steps:
1. Divide the objects into non-intersecting subsets X_l according to their class labels.
2. If object obj(i) has a certain value v_j(i) for attribute A_j, put obj(i)^c into the incomplete equivalence class A_j = v_j(i); if obj(i) has a lost value for attribute A_j, put obj(i)^u into every incomplete equivalence class of A_j.
3. Set q = 1; q counts the number of attributes used for the lower approximation of the incomplete data being processed.
4. For each attribute subset B with q attributes, calculate the incomplete lower approximation of each X_l:
B_(X_l) = { obj(i) : 1 ≤ i ≤ n, obj(i) ∈ X_l, B_k(obj(i)^c) ⊆ X_l, 1 ≤ k ≤ |B(obj(i))| },
where B(obj(i)) is the set of incomplete equivalence classes containing obj(i), derived from the attribute subset B, and B_k(obj(i)^c) is the k-th incomplete equivalence class in B(obj(i)).
5. For every uncertain instance in the lower approximation, do the following:
(1) if obj(i) occurs in only one incomplete equivalence class, endow obj(i)'s uncertain value with that class's attribute value v_B^k;
(2) if obj(i) occurs in more than one such equivalence class, delay the estimation of its uncertain value.
6. Set q = q + 1; repeat steps 4-5 until q > m.
7. If lost values remain unestimated, continue with the following steps; otherwise go to step 13.
8. Set p = 1; p counts the number of attributes used for the upper approximation of the incomplete data being processed.
9. For each attribute subset B with p attributes, calculate the incomplete upper approximation of each X_l:
B^−(X_l) = { obj(i) : 1 ≤ i ≤ n, B_k(obj(i)) ∩ X_l ≠ ∅, B_k(obj(i)^c) ⊄ X_l, 1 ≤ k ≤ |B(obj(i))| }.
10. For every uncertain instance, do the following:
(1) if obj(i) occurs in only one incomplete equivalence class, endow obj(i)'s uncertain value with the attribute value v_B^k of the k-th incomplete approximation in attribute subset B;
(2) if obj(i) occurs in more than one approximate equivalence class, delay the estimation of its uncertain value.
11. Set p = p + 1; repeat steps 8-10 until p > m.
12. If obj(i) still occurs in the upper approximation of several equivalence classes, find the attribute value v_B^k most common among the objects defined in the incomplete equivalence classes, endow obj(i)'s uncertain value with v_B^k, change obj(i)^u to obj(i)^c in the incomplete equivalence class of v_B^k, and remove obj(i)^u from the incomplete equivalence classes of the other attribute values.
13. All lost values have now been calculated; output the complete data collection.

This algorithm defines the concepts of upper and lower approximation and takes full advantage of the features of rough sets: it makes the decision table with lost values and the completed decision table share the same compatibility, and it makes the data in the decision table more precise and practical.
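The core of algorithm 3 — fill a lost value only when the object's decision class pins it down uniquely, and otherwise fall back to the most frequent value within that class — can be sketched as follows. This is a simplification: the staged lower/upper-approximation passes over attribute subsets are collapsed into per-class statistics, and all names are ours:

```python
from collections import Counter
from typing import List, Optional

def complete(rows: List[List[Optional[str]]], labels: List[str]) -> List[List[str]]:
    n_attrs = len(rows[0])
    filled = [list(r) for r in rows]
    for j in range(n_attrs):
        # values of attribute j observed inside each decision class
        per_class = {}
        for r, lab in zip(rows, labels):
            if r[j] is not None:
                per_class.setdefault(lab, Counter())[r[j]] += 1
        for r, lab in zip(filled, labels):
            if r[j] is None:
                counts = per_class.get(lab, Counter())
                if len(counts) == 1:
                    # unique candidate: the class determines the value (cf. step 5.1)
                    r[j] = next(iter(counts))
                elif counts:
                    # ambiguous: take the most frequent value in the class (cf. step 12)
                    r[j] = counts.most_common(1)[0][0]
    return filled

rows = [["man", "bad"], [None, "bad"], ["woman", "good"]]
labels = ["no", "no", "yes"]
done = complete(rows, labels)
```

The missing sex of the second object is resolved to "man", the only value seen in its class.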

3.3. Designing and implementation of the mining evaluation layer

This layer is the very core of commercial data mining, including rule selection and result evaluation. Rule selection must satisfy a certain degree of confidence, allow no redundancy, and support dynamic updating. The main function of result evaluation is to output the high-credibility rules to users.

After data preprocessing, the decision table undergoes attribute simplification and attribute-value simplification [18-20], that is, simplification of rows and columns, so as to overcome the weakness of redundancy.

3.3.1. Incremental learning based on rough set theory

In 1994, Shan N. et al. proposed a new kind of incremental algorithm that applies to the matrix of coordinated decisions. Yet this method has the following weaknesses: when there are many decision attributes and many decision matrices must be calculated, it costs much time and space; it produces certain rules only; and it is unable to deal with inconsistent data.

To overcome these weaknesses, this article designs a filtering algorithm using the concept of the partial discernibility matrix.

Definition: Suppose U = {x_1, x_2, …, x_n}, R is the reference attribute set, and M^p is the partial discernibility matrix:

M^p_{ij} = { a : a ∈ C, a ∉ R, ρ(x_i, a) ≠ ρ(x_j, a) and ρ(x_i, b) ≠ ρ(x_j, b), b ∈ D } (i, j = 1, 2, …, n); Null otherwise.

Using the partial discernibility matrix and rough set theory, this paper puts forward an incremental learning algorithm based on rough sets. In detail the algorithm is as follows.

Algorithm 4: incremental learning algorithm based on rough sets, for building the initial rule warehouse
Input: decision table T = (U, C ∪ D, V, f).
Output: certain rules, or uncertain rules measured by a degree of confidence.
Steps:
1. Divide the table into equivalence classes on C: E_i ∈ U/ind(C), i = 1, 2, …, |U/ind(C)|.
2. Divide the table into decision classes on D: X_j ∈ U/ind(D), j = 1, 2, …, |U/ind(D)|.
3. Calculate the partial discernibility matrix M^p_{ij} according to the formula above.
4. Calculate the relative discernibility function f(E_i, C) belonging to each equivalence class.
5. Deduce the decision rules according to f(E_i, C):
If E_i ⊆ X_j then Des(E_i, C) → Des(X_j, D); else Des(E_i, C) →_α Des(X_j, D), with α = |E_i ∩ X_j| / |E_i| (α is the degree of confidence).
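Steps 1, 2 and 5 of algorithm 4 can be sketched directly; the partial discernibility matrix and relative discernibility function of steps 3-4, which simplify the rule conditions, are omitted here for brevity, and all names are illustrative:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def extract_rules(conds: List[Tuple[int, ...]], decs: List[str]):
    """Return (condition, decision, alpha) triples with
    alpha = |E_i ∩ X_j| / |E_i| as in step 5."""
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for c, d in zip(conds, decs):        # step 1: equivalence classes E_i on C
        groups[c].append(d)
    rules = []
    for cond, ds in groups.items():      # steps 2 and 5: decision classes, rules
        for dec in sorted(set(ds)):
            rules.append((cond, dec, ds.count(dec) / len(ds)))
    return rules

conds = [(0, 0), (0, 0), (1, 1)]
decs = ["no", "yes", "yes"]
rules = extract_rules(conds, decs)
```

The first condition class mixes both decisions, so it yields two uncertain rules with α = 0.5; the second yields a certain rule.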

Thus the initial rule warehouse has been formed. When new data are put into the decision table, we can update the initial rule warehouse through several steps, according to the relationship between the new data and the objects in the original decision system.

Algorithm 5: incremental learning algorithm based on rough sets, for dynamic modification of the rule warehouse
Input: new data object R.
Output: updated rule warehouse.
Steps:
1. Scan the whole original decision table and judge whether R is compatible with the original data in the decision system.
2. If not compatible, go to step 3.
3. Suppose R ∈ E_i; distinguish the cases for E_i:


1) When E_i ⊄ X_j and E_i ∩ X_j ≠ ∅, alter the degree of confidence of Des(E_i, C) →_α Des(X_j, D) to

α = (|E_i ∩ X_j| + 1) / (|E_i| + 1).  (2)

For every other k ≠ j with E_i ∩ X_k ≠ ∅, alter the degree of confidence of the corresponding rule Des(E_i, C) →_α Des(X_k, D) to

α = |E_i ∩ X_k| / (|E_i| + 1).  (3)

2) When E_i ⊆ X_k, the rule warehouse remains unchanged.

3) If the condition attribute values of R accord with those of a certain object in the decision system but are incompatible with its decision attribute value, then produce the uncertain rules and recalculate the changed degree of confidence of all affected uncertain rules according to formula (2) or (3).
Else, if neither the condition attribute values nor the decision attribute value of R accords with any object in the decision system, or only the decision attribute is identical, then produce a new equivalence class: add a new row before the first row of the original partial discernibility matrix and a new column before the first column, making a new partial discernibility matrix. Then use the step-5 method from building the initial rule warehouse to get the new rules and update the rule warehouse.

Applying this incremental learning algorithm to the decision table yields rules or knowledge that satisfy a given degree of confidence and can be updated dynamically.
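Under this reading, the confidence updates of formulas (2) and (3) amount to re-evaluating the rule counts with the new object included; a sketch with illustrative names:

```python
from typing import Dict, Tuple

def update_confidence(rule_counts: Dict[Tuple[str, str], int],
                      class_sizes: Dict[str, int],
                      e_i: str, x_j: str) -> Dict[Tuple[str, str], float]:
    """Add one object to class e_i with decision x_j and recompute alpha for
    every rule of e_i: formula (2) for x_j, formula (3) for the other decisions."""
    class_sizes[e_i] += 1                             # |E_i| + 1
    rule_counts[(e_i, x_j)] = rule_counts.get((e_i, x_j), 0) + 1  # |E_i ∩ X_j| + 1
    return {(e, d): c / class_sizes[e]
            for (e, d), c in rule_counts.items() if e == e_i}

counts = {("E1", "yes"): 2, ("E1", "no"): 1}
sizes = {"E1": 3}
alphas = update_confidence(counts, sizes, "E1", "yes")
```

The new object supports the "yes" rule, whose confidence rises to 3/4, while the "no" rule's confidence falls to 1/4.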

3.3.2. Result evaluation [21-22]

Result evaluation is an important step in commercial data mining. It is conducted by the users and outputs the satisfying knowledge and decision-support information. If the users find it hard to make a judgment, the system itself can perform the evaluation in their place.

User evaluation means users with knowledge and experience of a certain field making judgments on the newly found knowledge. If the users consider the knowledge interesting and satisfying and the results relatively precise, the knowledge is sent to the knowledge warehouse and reported. If the users are not satisfied with the results, the data must be mined again until satisfying results are found.

Machine evaluation has the following steps. First, input the users' interest degree into the system. Then compare the confidence degree of the rules and knowledge found in data mining with the users' interest degree: if a rule's confidence degree is higher than the interest degree, the users are deemed interested in the rule and it is preserved; if not, it is deleted. Finally, use data obtained through machine sampling to build a decision table in the commercial data warehouse and apply the preserved rules to it: rules that are compatible are put into the knowledge warehouse, and those that are not are deleted from it.
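The machine evaluation just described reduces to a two-stage filter: keep rules whose confidence reaches the user's interest degree, then keep only those compatible with a sampled decision table. A sketch with illustrative names:

```python
from typing import Callable, Dict, List, Tuple

Rule = Tuple[Callable[[Dict], bool], str, float]  # (condition, decision, confidence)

def machine_evaluate(rules: List[Rule], interest: float,
                     sample: List[Dict], dec_attr: str) -> List[Rule]:
    kept = [r for r in rules if r[2] >= interest]     # stage 1: interest filter
    final = []
    for cond, dec, conf in kept:                      # stage 2: compatibility check
        hits = [row for row in sample if cond(row)]
        if all(row[dec_attr] == dec for row in hits):
            final.append((cond, dec, conf))
    return final

rules = [(lambda r: r["credit"] == "bad", "no", 0.9),
         (lambda r: r["credit"] == "good", "yes", 0.4)]
sample = [{"credit": "bad", "buys": "no"}]
kept = machine_evaluate(rules, interest=0.6, sample=sample, dec_attr="buys")
```

The low-confidence rule is dropped at stage 1; the remaining rule is compatible with the sample and survives.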

3.4. An instance

So far we have established the framework of commercial data mining based on rough sets and presented some novel algorithms. To prove the efficiency of the above algorithms, we now take an instance, using data from a retailing firm. According to the framework above, the precondition of data mining is possession of a commercial database; due to the length of this article, we omit the detailed database design. We take data from the firm's commercial database in order to analyze customer relations, using data mining to determine which customers purchase high-end cosmetics.

After data filtering and decision table creation, we obtain the decision table shown in Table 1.

Table 1. Original decision-making table

U    age  sex      income   profession (blue/white collar)  credit  purchasing high-end cosmetics
e1   23   man      500      blue    bad     no
e2   20   woman    700      white   better  yes
e3   33   woman    1000     white   better  yes
e4   35   man      3000     blue    good    yes
e5   38   woman    500      blue    good    no
e6   42   man      300      blue    bad     no
e7   44   man      230      white   bad     no
e8   47   woman    150      blue    good    no
e9   20   man      1500     white   bad     no
e10  25   unknown  300      blue    bad     yes
e11  50   man      1650     white   bad     yes
e12  28   woman    unknown  white   good    yes
e13  33   woman    1350     blue    better  yes
e14  61   man      1680     blue    better  no
e15  22   unknown  1000     white   good    yes
e16  24   man      1500     blue    bad     no
e17  33   woman    600      blue    good    no
e18  35   woman    unknown  white   bad     no
e19  44   unknown  500      white   good    yes
e20  49   man      230      blue    bad     no

In the above table, the condition attributes are sex, profession, credit, age and income, and the decision attribute is purchasing high-end cosmetics. The discrete attributes are sex, profession and credit; the continuous attributes are age and income. For ease of computation, we encode the values of sex (man, woman) as (0, 1), profession (blue, white) as (0, 1), credit (bad, good, better) as (0, 1, 2), and the decision (no, yes) as (0, 1). Likewise, we use A, B, C, D, E to denote age, sex, income (monthly pay), profession and credit, mark the decision attribute as F, the set of condition attributes as S and the set of decision attributes as P, so that S = {A, B, C, D, E}, P = {F}.

Then we adopt algorithm 1 to handle the continuous attributes, obtaining Table 2.

Table 2. Object decision-making table

U    A  B        C        D  E  F
e1   0  0        0        0  0  0
e2   0  1        1        1  2  1
e3   1  1        1        1  2  1
e4   1  0        2        0  1  1
e5   2  1        0        0  1  0
e6   2  0        0        0  0  0
e7   2  0        0        1  0  0
e8   2  1        0        0  1  0
e9   0  0        2        1  0  0
e10  0  unknown  0        0  0  1
e11  2  0        2        1  0  1
e12  1  1        unknown  1  1  1
e13  1  1        2        0  2  0
e14  3  0        2        0  2  0
e15  0  unknown  1        1  0  1
e16  0  0        2        0  1  0
e17  1  1        1        0  1  0
e18  1  1        unknown  1  0  0
e19  2  unknown  0        1  1  1
e20  2  0        0        0  0  0

We find some lost data, whose values are marked unknown, in Table 2; this decision table is an example of an incomplete data set. To address this, we apply algorithm 3 to the decision table and obtain a complete decision table: the values of attribute B for objects 10, 15 and 19 are 0, 0, 0, and the value of attribute C for object 18 is 0. Objects 5 and 8 are then identical, so we delete object 5, leaving a decision table with 19 objects.

So far we have finished all tasks in data preprocessing; the following step is mining evaluation. In this layer we take some key operations to ensure that the rules or knowledge obtained are highly precise and dynamic. We use algorithm 4 to simplify the attributes and their values, obtaining Table 3.

Table 3. Simplified decision-making table

U    A  B  C  D  F
e1   0  0  0  0  0
e2   0  1  1  1  1
e3   1  0  2  0  1
e4   2  1  0  0  0
e5   2  0  0  0  0
e6   0  0  2  1  0
e7   0  1  0  0  1
e8   2  0  2  1  1
e9   1  1  2  0  0
e10  3  0  2  0  0
e11  0  0  2  0  0
e12  1  1  1  0  0
e13  1  1  0  1  0
e14  2  0  0  1  1

Then we use algorithm 5 for rule extraction and incremental learning. We suppose the decision table has 13 objects and the 14th object is added in the learning step, so we obtain 13 equivalence classes, denoted as follows:

E_1 = {e_1}, E_2 = {e_2}, E_3 = {e_3}, E_4 = {e_4}, E_5 = {e_5}, E_6 = {e_6}, E_7 = {e_7}, E_8 = {e_8}, E_9 = {e_9}, E_10 = {e_10}, E_11 = {e_11}, E_12 = {e_12}, E_13 = {e_13}.


After we carry out every step of algorithm 5, we get useful rules. We adopt the knowledge representation of production rules, in the following form:

if (A is 0) and (B is 0) then F is 0
if (A is 1) and (B is 1) then F is 0
if (A is 1) and (C is 2) then F is 1

Replacing A, B, C, D with their meanings in the above rules gives complete decision rules; for example, if a customer's sex is man and age is less than 25, they do not purchase high-end cosmetics, and so on.

We then add the 14th object to the decision table. Because the condition attributes of the new object differ from those of all other objects, a new equivalence class is produced; we only need to add a column and a row to obtain a new rule. Thus we can update the rule warehouse easily. Through the above instance we test the efficiency and feasibility of the algorithms, and the results can be adopted in commercial activity: for this instance, the commercial firm can provide targeted, customized service according to the results.

4. Conclusion

This article has applied rough set theory to the process of commercial data mining and has proposed several rough-set-based algorithms to solve the current problems and weaknesses. The article makes the following contributions:
1. It presents a rough-set-based data completion algorithm that uses the upper and lower approximations and can efficiently deal with lost commercial data.
2. It puts forward an incremental learning algorithm based on rough set theory that uses confidence to make itself flexible and to reduce computing complexity.
3. According to the characteristics of commercial data, it presents a framework of commercial data mining based on rough sets.

References

[1] Wei Yanwu. Studies on the Application of Data Mining Techniques to the Management of Customers' Relationship [D]. Wuhan: Wuhan Science and Engineering University, 2002.
[2] Wang Huiming. Studies on Commercial Information System [D]. Tianjin: Tianjin Industry University, 2001.
[3] Li Jie. Application of Data Warehouse System in Commercial Franchise Enterprises [D]. Beijing: Beijing Industry University, 2001.
[4] Zhao Yuyong, Wu Yongming. Studies on the Application of Data Warehouse Techniques in Decision-making Supporting System [J]. Application of Computer System, 1999(3): 29-32.
[5] Liu Tongming, et al. Data Mining Techniques and the Application [M]. Beijing: National Defense Industry University, 2001: 40-85.
[6] J. Christopher, K. Philip Chan. Systems for Knowledge Discovery in Databases [J]. IEEE Trans. on Knowledge and Data Engineering, 1993, 5(6): 903-913.
[7] Wang Bingfeng, Liu Lianzhong. Connected Analyzing and Processing and Its Application in Management of Information System [J]. Application of Computer Science, 2001, 18(1): 71-74, 78.
[8] Yan Weimin, Wu Weimin. Data Structure [M]. Beijing: Tsinghua University Press, 1996: 62-80.
[9] Miao Duoqian. Dispersion Methods of Continual Attributes of Rough Set [J]. Automation Journal, 2001, 27(3): 296-302.
[10] Yu Jinlong, Li Xiaohong, Sun Lixin. Overall Dispersion of Continual Attributes [J]. Journal of Harbin Industry University, 2000, 32(3): 48-53.
[11] Zhao Jun, et al. New Algorithm of Data Dispersion Based on Rough Set Theory [J]. Journal of Chongqing University, 2002, 25(3): 18-21.
[12] Han Zhenxiang, et al. A Summary of Rough Set Theory and Its Application [J]. Control Theory and Application, 1999, 16(2): 153-157.
[13] Wang Jue, Miao Duoqian. "Data Contraction" Based on Rough Set Theory [J]. Computer Journal, 1998, 21(5): 393-400.
[14] Miao Duoqian, Fan Shidong. Knowledge Size Calculation and Its Application [J]. System Engineering Theory and Practice, 2002, 22(1): 48-56.
[15] Wang Mingwen. A Data Filtering Method Based on Information Measurement [J]. Journal of Nanchang Hydraulic College, 2002, 21(2): 1-6.
[16] Zhao Weidong, et al. Data Mining Under Incomplete Information of Section Attributes [J]. Application of System Engineering Theory, 2001, 10(2): 136-147.
[17] Liu Yezheng, Yang Shanlin. Studies on Estimation of Null Based on Rough Set Theory [J]. Computer Engineering, 2001, 27(10): 41-42.
[18] Miao Duoqian, Hu Guirong. An Enlightening Algorithm of Knowledge Simplification [J]. Computer Study and Development, 1999, 36(6): 681-684.
[19] Mohua Banerjee, Sushmita Mitra, Sankar K. Pal. Rough Fuzzy MLP: Knowledge Encoding and Classification [J]. IEEE Transactions on Neural Networks, 1998, 9(6): 1203-1216.
[20] J. R. Quinlan. Induction of Decision Trees [J]. Machine Learning, 1986(1): 81-106.
[21] Wang Jun. Studies on the Discovery of Database Knowledge [D]. Beijing: Institute of Computing Technology, Chinese Academy of Sciences, 1997.
[22] Chen Wenwei. Decision Supporting System and Its Development [M]. Beijing: Tsinghua University Press, 1994: 90-97.
