IEEE International Conference on e-Business Engineering (ICEBE'05), Beijing, China
Design and Implementation of Commerce Data Mining System Based on
Rough Set Theory*
* Supported by the National Natural Science Foundation of China, Key Project No. 70371054.
Xiang Yang,
Tongji University,
P.R.China, 200092
Wu Weiying,
Huzhou Teachers College,
P.R.China, 313000
Mao Hairong, Song Qingwei
Tongji University,
P.R.China, 200092
Abstract: On the basis of an analysis of the characteristics of commercial data and rough set theory, this paper proposes a data mining system based on rough sets and presents its detailed design and implementation. The novel commercial data mining system consists of three parts: data preparation, data preprocessing, and mining and evaluation. In the design and implementation of these three levels, the paper presents several novel algorithms based on rough sets. These algorithms accelerate the data mining process and improve the quality of data mining. Finally, the paper tests the research results and algorithms through an instance.
1. Introduction
Commercial data mining can automatically predict trends and detect previously unknown patterns. It is easy to conduct, and its analysis results are reliable and transparent. Businesses are therefore paying more and more attention to commercial data mining systems and their functions [1-3].
However, present data mining systems still have many weaknesses [4-6], primarily the following: 1. No good methods have yet been devised to promptly delete or correct noisy data and to deal with empty data. 2. The data mining process is time-consuming, because only the complexity of the algorithm itself is considered while the resources of the hardware environment are ignored. 3. Research on the accuracy of freely selected rules is incomplete. 4. Neither knowledge nor rules are well protected and promptly updated.
The advantages of the rough set are as follows: it can remove redundant and useless information by discovering the relations among the data, simplifying the representation space of the input information; the results gained through the rough set are presented in the form of rules and are thus easy to understand; the rough set can reduce the information granularity so as to improve the statistical meaning of the rules; and it can handle both consistent and inconsistent data [7].
Commercial data include customer data, product directories, pricing data, order states, etc. Therefore, targeting the characteristics of commercial data, this article designs and implements a commercial data mining system based on rough set theory combined with data mining techniques.
2. System framework of commercial data mining
Figure 1 illustrates the system framework devised through the combination of rough set theory and data mining techniques.
In Figure 1, the commercial mining system based on the rough set is composed of three layers. The first layer is data preparation, which defines the sources of the commercial data, chooses suitable data sources, identifies and organizes the data, and builds the decision-making tables. The second layer is data preprocessing, which discretizes all the continuous attributes in the decision-making tables and then filters the data and analyzes its completeness through the rough set. The third layer is mining evaluation, which simplifies the attributes and attribute values, acquires new rules and updated knowledge through incremental learning, and finally evaluates and gives feedback on the obtained results.
3. Designing and Implementing the System
3.1. Designing and implementing the preparation level
Proceedings of the 2005 IEEE International Conference on e-Business Engineering (ICEBE’05) 0-7695-2430-3/05 $20.00 © 2005 IEEE
3.1.1. Building up the commercial data warehouse
Nowadays, with commercial competition becoming more and more severe, a large amount of data is accumulated and distributed across the various departments, platforms and databases of enterprises. Commercial data are affected by many uncertain factors and incomplete information. This multiple distribution of data sources makes it necessary to build a commercial data warehouse to improve mining efficiency and accuracy. The following are definitions of some data required for the construction of a commercial data warehouse.
1) Data about purchase of commodities: including
providers’ names, providers’ codes, codes of the
purchase departments, tax rate, payment conditions etc;
2) Data about storing of commodities: including codes
of warehouses, codes of the stored commodities, the
quantity of the stored commodities, unit price etc;
3) Data about sales of commodities: including order
date, order number, conditions of payment, sales
departments, sales clerks, names of commodities,
quantity, unit, unit price, tax rate, delivery date, number
of the delivery bill, the commodities warehouses,
delivery address, quantity of returned goods etc;
4) Data about accountings of commodities: including
fixed assets, profits, debts, circulating funds etc;
5) Data about customers: including names of customers,
codes of customers, number of bank account etc.
3.1.2. Data deletion
Deleting noisy data is a rather complicated process that affects many other aspects. The whole deletion process can be divided into the following steps:
1) Check for spelling errors;
2) Delete repeated records, complete incomplete records and resolve inconsistent records;
3) Use the testing function to check the data;
4) Repeat the above steps a number of times according to the test results.
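The four cleaning steps above can be sketched in code. This is an illustrative sketch only: the record fields, the tiny spelling whitelist, and the default fill value are assumptions for the example, not part of the paper's system.

```python
# Hypothetical sketch of the four-step cleaning loop described above.
def clean_records(records, max_passes=3):
    """Delete noisy data: fix spellings, drop duplicates, fill incomplete
    records, and re-check until the test function reports no problems."""
    known_spellings = {"blue", "white"}  # assumed whitelist for step 1

    def is_valid(rec):  # step 3: the testing function
        return rec["profession"] in known_spellings and rec["income"] is not None

    for _ in range(max_passes):  # step 4: repeat according to test results
        # step 1: correct obvious spelling errors
        for rec in records:
            if rec["profession"] == "wihte":
                rec["profession"] = "white"
        # step 2a: delete repeated records
        seen, unique = set(), []
        for rec in records:
            key = tuple(sorted(rec.items()))
            if key not in seen:
                seen.add(key)
                unique.append(rec)
        records = unique
        # step 2b: make up incomplete records with a default value
        for rec in records:
            if rec["income"] is None:
                rec["income"] = 0
        if all(is_valid(r) for r in records):
            break
    return records

rows = [
    {"profession": "wihte", "income": 700},
    {"profession": "blue", "income": None},
    {"profession": "blue", "income": None},  # duplicate record
]
print(clean_records(rows))
```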
3.2. Designing and implementation of data preprocessing
Data preprocessing in this article is divided into three phases: data discretization, data filtering and data completion.
3.2.1. Data discretization phase
Figure 1. Framework of commercial data mining based on the rough set (data preparation layer: data definition, table building, target database; data preprocessing layer: data cleaning, data filtering, data completion, continuous attribute discretization; mining evaluation layer: attribute value simplification, rule extraction and simplification, knowledge filtering, algorithm warehouse, result evaluation with feedback until the result is satisfactory)

Because rough set theory deals only with discrete attributes and cannot handle continuous ones, research on the discretization of continuous attributes is necessary.
Nowadays, there are many methods for dealing with continuous attributes [9-11]. Among them, the algorithm proposed in literature [9] is relatively effective and representative. It is broadly applicable, based on feedback about the compatibility of the decision-making table; it needs no domain knowledge while fully retaining the information of the original decision-making table. However, this algorithm does not work well when the attribute values are widely spread and the cutting points of the attribute values are concentrated at one extreme of the interval.
Targeting the characteristics of commercial data, this article improves the discretization methods for continuous values brought forth by earlier researchers, so that the algorithm can process data whose continuous attributes are widely spread, using the compatibility of the decision-making table and a contraction factor to improve the precision of the result. The algorithm is as follows.
Suppose T = (U, C, D, V, f) is a decision-making table, with U = {x_1, ..., x_n} and C = {c_1, ..., c_m}.
The incompatibility of attribute c_i is

    α_i = card(C_i) / card(U)    (formula 1)

where C_i = {objects in U that are incompatible when only condition attribute c_i is considered}.
Since the incompatibilities among the attributes in the decision-making table can be regarded as independent, the incompatibility α_T of the table can be denoted as

    α_T = α_1 × α_2 × ... × α_m.

For convenience of calculation, assuming the α_i are approximately equal to a common value α, this can be approximated as α_T = α^m, so the value of α can be estimated from the formula above:

    α ≈ α_T^(1/m).

In theory, the decision-making table after discretization should be compatible, namely α_T = 0. In reality, however, it suffices for α_T to be a sufficiently small value. The compatibility of the different attributes in the decision-making table differs, but the difference can be tolerated:

    |α − α_i| ≤ β    (β: tolerance).
Algorithm 1: continuous data discretization algorithm
Input: decision-making table T = (U, C, D, V, f), U = {x_1, ..., x_n}, C = {c_1, ..., c_m}, where every c_i (i = 1, 2, ..., m) is a continuous attribute and D = {d} is the decision-making attribute.
Output: discretized decision-making table.
Steps:
1. Give α_T, β and δ their initial values; estimate the incompatibility value α.
2. For i = 1, 2, ..., m repeat the following steps:
1) for attribute c_i and the initial threshold, divide the data set into a partition of U;
2) cluster each part of the partition locally and contract the local clusters toward their centers with the contraction factor;
3) use formula 1 to calculate the incompatibility α_i of this attribute;
4) if |α − α_i| ≤ β then i = i + 1, else GOTO 5);
5) if |α − α_i| > β then alter the threshold and GOTO 1).
3. Encode the attribute values after discretization as 0, 1, 2, 3, ....
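The core idea of Algorithm 1 can be sketched as follows. This is a minimal illustration only: it uses equal-width bins as the "initial threshold" partition, a fixed shrink factor in place of the paper's contraction factor, and omits the compatibility check against α; the bin count and shrink value are assumptions.

```python
# Illustrative sketch: partition a continuous attribute, contract values
# toward their cluster centers, then encode the bins as 0, 1, 2, ...
def discretize(values, n_bins=3, shrink=0.5):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    # initial partition of U by the threshold (equal-width bins)
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    # center of each local cluster
    centers = {}
    for b in set(bins):
        members = [v for v, bb in zip(values, bins) if bb == b]
        centers[b] = sum(members) / len(members)
    # contract each value toward its cluster center with the shrink factor
    contracted = [v + shrink * (centers[b] - v) for v, b in zip(values, bins)]
    # encode the discretized attribute values as 0, 1, 2, ...
    return [min(int((v - lo) / width), n_bins - 1) for v in contracted]

ages = [23, 20, 33, 35, 38, 42, 44, 47, 61]
print(discretize(ages))
```

Contracting values toward cluster centers before re-binning pulls borderline values away from cut points, which is how the algorithm copes with widely spread attribute values.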
3.2.2. Data filtering phase [12-13]
Data filtering filters the primitive information system to reduce the information size. In early research it was common to use the concept of approximate quality to measure the data and predict the relationship between targets. The knowledge size reveals the resolution power of the knowledge [14]: the smaller the size, the higher the resolution, and vice versa. Hence, if the information size in the information system is relatively large and relatively few objects are included in each equivalence class, then the rules created from such an information system have better classifying capability. Nevertheless, this cannot guarantee better prediction capability on a new collection of objects to be classified.
Foreign researchers have proposed that rule statistics can be regarded as a new efficient measure, and have devised a simple method of data filtering. Literature [15] has demonstrated a data filtering method based on an information test. Yet the above data filtering methods have their weaknesses: neither considers the information size of the decision-making system; the rules do not have satisfying statistical meaning; and neither is able to preserve the reliable information of the information system. The data filtering method based on rough set theory brought out in this article can solve these problems.
The data filtering method on the basis of rough set
theory goes as follows:
Algorithm 2: data filtering algorithm based on rough set
Input: T = (U, C, D, V, f), a ∈ C ∪ D, C ∩ D = ∅; C: condition attributes, D: decision-making attribute.
Output: T_R = (U, C, D, V, f_R), a ∈ C ∪ D, C ∩ D = ∅; C: condition attributes, D: decision-making attribute.
Steps:
1. For every a ∈ C ∪ D, calculate the equivalence class set E({a}).
2. For every a ∈ C, put its attribute values in order; mark them after ordering as a_0, a_1, ..., a_{k-1}.
3. With j = 0 initially, set v = a_{j+1} and a_{j+1} = a_j.
If (a)_{a_j} and (a)_{a_{j+1}} are decided by the same Y_i ∈ E(D), i.e. (a)_{a_j}, (a)_{a_{j+1}} ⊆ Y_i,
then shift a_{j+2}, ..., a_{k-1} down one position, set k = k − 1, and unite the identical rows;
else restore a_{j+1} = v.
4. Set j = j + 1. If j = k, output a_0, a_1, ..., a_{k-1}; else GOTO 3.
By combining attribute values whose equivalence classes are decided by the same class of D, step 3 of Algorithm 2 improves the statistical meaning of the rules and reduces the number of equivalence classes without affecting the reliable information of the information system.
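The merging idea of Algorithm 2 can be sketched as follows: two consecutive sorted values of a condition attribute collapse into one code when each value determines the same single decision class. The flat list layout and the toy data are assumptions for the example.

```python
# Sketch of Algorithm 2's value merging: adjacent attribute values are
# united when all objects carrying them fall into one decision class.
def merge_values(column, decisions):
    """Map attribute values to merged codes."""
    # decision set reached by each attribute value
    outcome = {}
    for v, d in zip(column, decisions):
        outcome.setdefault(v, set()).add(d)
    ordered = sorted(outcome)
    code, mapping = 0, {}
    for i, v in enumerate(ordered):
        if (i > 0 and len(outcome[v]) == 1
                and outcome[v] == outcome[ordered[i - 1]]):
            mapping[v] = mapping[ordered[i - 1]]  # unite with previous value
        else:
            if i > 0:
                code += 1
            mapping[v] = code
    return [mapping[v] for v in column]

income_code = [0, 1, 1, 2, 0, 0, 0, 0, 2]
buys = ["no", "yes", "yes", "yes", "no", "no", "no", "no", "yes"]
print(merge_values(income_code, buys))  # codes 1 and 2 merge: both decide "yes"
```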
3.2.3. Data completion
Data collections can be classified into two groups: complete and incomplete. We call a collection with lost data an incomplete data collection. Most data mining starts from a complete data collection. At present there are many ways to deal with incomplete data. For instance, incomplete data can be transformed into complete data through data completion (approximate measuring or inference of the lost values), or useful rules can be acquired from the incomplete data through direct processing.
According to the similarity characteristics of the upper and lower approximations in rough set theory [16-17], this article puts forth a new algorithm for incomplete data that estimates the lost attribute values rather precisely. The algorithm is as follows.
Algorithm 3: incomplete data learning algorithm based on rough set
Input: incomplete decision-making table T with n objects; each object has m attributes, and each object belongs to one of c classes (all objects are divided into c classes).
Output: complete decision-making table C.
Steps:
1. Divide the objects into non-intersecting subsets according to their class labels, marked X_l.
2. If attribute A_j of object obj(i) has a certain value v_j(i), put obj(i)_c into the incomplete equivalence class A_j = v_j(i); if attribute A_j of obj(i) has a lost value, put obj(i)_u into every incomplete equivalence class of A_j.
3. Set q = 1; q counts the number of attributes used in the lower approximation of the incomplete data being processed.
4. Calculate the incomplete lower approximation of each X_l with respect to each attribute subset B containing q attributes:

    B_(X_l) = { obj(i) : 1 ≤ i ≤ n, obj(i) ∈ X_l, and B_k(obj(i)) ⊆ X_l for all 1 ≤ k ≤ c(i) }

where B(obj(i)) is the set of incomplete equivalence classes containing obj(i) derived from the attribute subset B, and B_k(obj(i)) is the k-th incomplete equivalence class in B(obj(i)).
5. For every uncertain instance in the lower approximation, do the following:
(1) if obj(i) exists in only one incomplete equivalence class, endow obj(i)'s uncertain value with the attribute value v_B^k of that class;
(2) if obj(i) exists in more than one incomplete equivalence class, delay the estimation of its uncertain value.
6. Set q = q + 1 and repeat steps 4-5 until q > m.
7. If there are lost values still unestimated, continue with the following steps; otherwise go to step 13.
8. Set p = 1; p counts the number of attributes used in the upper approximation of the incomplete data being processed.
9. Calculate the incomplete upper approximation of each X_l with respect to each attribute subset B containing p attributes:

    B^(X_l) = { obj(i) : 1 ≤ i ≤ n, B_k(obj(i)) ∩ X_l ≠ ∅ and B_k(obj(i)) ⊄ X_l, 1 ≤ k ≤ c(i) }

10. For every uncertain instance, do the following:
(1) if obj(i) exists in only one incomplete equivalence class, endow obj(i)'s uncertain value with the attribute value v_B^k of the k-th incomplete equivalence class in the attribute subset B;
(2) if obj(i) exists in more than one incomplete equivalence class, delay the estimation of its uncertain value.
11. Set p = p + 1 and repeat steps 9-10 until p > m.
12. If obj(i) still exists in the upper approximation of many equivalence classes, find the attribute value v_B^k defined by the most objects in the incomplete equivalence classes, endow obj(i)'s uncertain value with v_B^k, alter obj(i)_u to obj(i)_c in the incomplete equivalence class of v_B^k, and remove obj(i)_u from the other incomplete equivalence classes.
13. Now all the lost values have been calculated; output the complete data collection.
This algorithm uses the concepts of the upper and lower approximation, takes full advantage of the features of the rough set, makes the decision-making table with lost values and the completed decision-making table share the same compatibility, and further makes the data in the decision-making table more precise and practical.
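The completion idea behind Algorithm 3 can be sketched as follows: a lost value is estimated from objects of the same decision class whose known attributes are compatible with the incomplete object. This mimics only the first (lower-approximation-style) pass; the upper-approximation fallback of steps 8-12 is omitted, and the data layout (None marking a lost value) is an assumption.

```python
# Hedged sketch of missing-value completion from same-class compatible objects.
from collections import Counter

def complete(rows, decision):
    """rows: list of attribute lists (None = lost value); decision: labels."""
    for i, row in enumerate(rows):
        for j, val in enumerate(row):
            if val is None:
                # candidates: same decision class, known at j, and compatible
                # with this object on every other known attribute
                cands = [r[j] for r, d in zip(rows, decision)
                         if d == decision[i] and r[j] is not None
                         and all(a is None or b is None or a == b
                                 for k, (a, b) in enumerate(zip(row, r))
                                 if k != j)]
                if cands:  # take the most frequent compatible value
                    row[j] = Counter(cands).most_common(1)[0][0]
    return rows

data = [[0, None, 0], [0, 0, 0], [0, 0, 1], [1, 1, 0]]
labels = [0, 0, 1, 1]
print(complete(data, labels))  # the lost value of the first object becomes 0
```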
3.3. Designing and implementation of the mining evaluation layer
This layer is the very core of commercial data mining, including rule selection and result evaluation. The selected rules are required to satisfy a certain degree of confidence, allow no redundancy, and be updatable dynamically. The main function of result evaluation is to output rules with high credibility to the users.
After data preprocessing, the decision-making table undergoes attribute simplification and attribute value simplification [18-20], that is, simplification of rows and columns, so as to overcome the weakness of redundancy.
3.3.1. Incremental learning of the rough set theory
In 1994, N. Shan et al. proposed a new kind of incremental algorithm that applies to the matrix of coordinated decisions. Yet this method has the following weaknesses: when there are many decision-making attributes, many decision-making matrices must be calculated, which costs much time and space; it produces certain rules only; and it is unable to deal with inconsistent data.
In an effort to overcome these weaknesses, this article designs the learning algorithm using the concept of the partial discernibility matrix.
Definition: Suppose U = {x_1, x_2, x_3, x_4, x_5, x_6}, R is the reference attribute set, and PM is the partial discernibility matrix:

    M_ij^p = { a : a ∈ C, a ∉ R, ρ(x_i, a) ≠ ρ(x_j, a) and ρ(x_i, b) ≠ ρ(x_j, b), b ∈ D }  (i, j = 1, 2, ..., n);
    Null otherwise.
Using the partial discernibility matrix and rough set theory, this paper puts forward an incremental learning algorithm based on the rough set. The algorithm in detail is as follows:
Algorithm 4: incremental learning algorithm based on rough set, for building the initial rule warehouse
Input: decision-making table T = (U, C, D, V, f)
Output: uncertain rules (measured with a degree of confidence) or certain rules
Steps:
1. Divide the table into equivalence classes on C: E_i ∈ U/ind(C), i = 1, 2, ..., |U/ind(C)|.
2. Divide the table into decision-making classes on D: X_j ∈ U/ind(D), j = 1, 2, ..., |U/ind(D)|.
3. Calculate the partial discernibility matrix M_ij^p according to the formula above.
4. Calculate the relative discernibility function f(E_i, C) belonging to each equivalence class.
5. Deduce the decision-making rules according to f(E_i, C):
if E_i ⊆ X_j then Des(E_i, C) → Des(X_j, D),
else Des(E_i, C) →α Des(X_j, D), with α = |E_i ∩ X_j| / |E_i| (α is the degree of confidence).
Thus, the initial rule warehouse has been formed.
When new data are put into the decision-making table,
we can update the initial rule warehouse through
several steps according to the relationship between the
new data and the objects in the original
decision-making system.
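Step 5 of Algorithm 4 — each condition equivalence class E_i yields a certain rule when it is contained in a single decision class, and otherwise uncertain rules with confidence α = |E_i ∩ X_j| / |E_i| — can be sketched as follows. The toy table and the tuple-based rule representation are assumptions.

```python
# Sketch of rule extraction with confidence degrees from a decision table.
from collections import defaultdict

def extract_rules(conditions, decisions):
    classes = defaultdict(list)           # E_i: equivalence classes on C
    for cond, dec in zip(conditions, decisions):
        classes[tuple(cond)].append(dec)
    rules = []
    for cond, decs in classes.items():
        for d in set(decs):
            alpha = decs.count(d) / len(decs)  # degree of confidence
            rules.append((cond, d, alpha))     # alpha == 1.0: certain rule
    return rules

conds = [(0, 0), (0, 0), (0, 1), (1, 1)]
decs = ["no", "yes", "yes", "no"]
for rule in sorted(extract_rules(conds, decs)):
    print(rule)
```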
Algorithm 5: incremental learning algorithm based on rough set, for dynamic modification of the rule warehouse
Input: new data object R
Output: rule warehouse after updating
Steps:
1. Scan the whole original decision-making table and judge whether R is compatible with the original data in the decision-making system.
2. If not compatible, then GOTO 3.
3. Suppose R ∈ E_i; distinguish E_i:
1) when E_i ⊄ X_j and E_i ∩ X_j ≠ ∅, alter the degree of confidence of Des(E_i, C) →α Des(X_j, D) to

    α = (|E_i ∩ X_j| + 1) / (|E_i| + 1);    (formula 2)

for every other k ≠ j with E_i ∩ X_k ≠ ∅, alter the degree of confidence of the corresponding rule Des(E_i, C) →α Des(X_k, D) to

    α = |E_i ∩ X_k| / (|E_i| + 1);    (formula 3)

2) when E_i ⊆ X_k, the rule warehouse remains unchanged;
3) if the condition attribute values of R accord with those of a certain object in the decision-making system but are incompatible with its decision-making attribute value, then produce uncertain rules and recalculate the degree of confidence of every affected uncertain rule according to formula 2 or 3;
else, if neither the condition attribute values nor the decision-making attribute value of R accords with any object in the decision-making system, or only the decision-making attribute is identical, then produce a new equivalence class, add a new row before the first row and a new column before the first column of the original partial discernibility matrix, making a new partial discernibility matrix; then use the method of step 5 of Algorithm 4, as when building the original rule warehouse, to obtain new rules and update the rule warehouse.
By applying the incremental learning algorithm to the decision-making table, we obtain rules or knowledge that satisfy a certain degree of confidence and can be updated dynamically.
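The confidence update at the heart of Algorithm 5 can be sketched as follows: when a new compatible object joins class E_i with decision X_j, the rule toward X_j is strengthened while rules toward the other decision classes are weakened, since only the denominator |E_i| grows for them. Storing raw counts per decision class is an implementation assumption.

```python
# Sketch of the incremental confidence update when object R joins E_i.
def update_confidences(counts, new_decision):
    """counts: dict decision -> |E_i ∩ X|; returns updated confidences."""
    counts = dict(counts)
    counts[new_decision] = counts.get(new_decision, 0) + 1
    total = sum(counts.values())         # |E_i| + 1
    return {d: c / total for d, c in counts.items()}

# E_i currently holds 3 objects: 2 decide "yes", 1 decides "no";
# a new object with decision "yes" arrives.
print(update_confidences({"yes": 2, "no": 1}, "yes"))
```

Keeping counts rather than ratios makes each update O(1) per affected rule, which is the point of the incremental scheme: no rescan of the whole decision-making table is needed.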
3.3.2. Result evaluation [21-22]
Result evaluation is an important step in commercial data mining. It is conducted by the users and outputs satisfying knowledge and decision-support information. If the users find it hard to make a judgment, the system itself can perform the evaluation in their place.
Result evaluation means users with knowledge and experience of a certain field making judgments on the newly found knowledge. If the users consider the knowledge interesting and satisfying and the results relatively precise, the knowledge is sent to the knowledge warehouse and reported. If the users are not satisfied with the results, the data must be mined again until satisfying results are found.
Machine evaluation has the following steps. First, input the users' interest degree into the system. Then compare the confidence degree of the rules and knowledge found in data mining with the users' interest degree: if the confidence degree of a rule is higher than the interest degree, the users are considered interested in the rule and it is preserved; if not, it is deleted. Finally, use data obtained through machine sampling to establish a decision-making table from the commercial data warehouse and apply the preserved rules to this decision-making table; if they are compatible, put the rules into the knowledge warehouse, and if not, delete them from the knowledge warehouse.
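The machine evaluation loop can be sketched as follows: rules whose confidence exceeds the users' interest degree are kept, and kept rules are then checked for compatibility against a sampled decision table before entering the knowledge warehouse. The rule and record shapes are illustrative assumptions.

```python
# Sketch of machine evaluation: interest-degree filter + compatibility check.
def evaluate(rules, interest, sample):
    kept = [r for r in rules if r["confidence"] >= interest]
    warehouse = []
    for rule in kept:
        # a rule is compatible if every sampled record matching its condition
        # also carries its decision value
        matching = [rec for rec in sample
                    if all(rec[a] == v for a, v in rule["cond"].items())]
        if all(rec["F"] == rule["decision"] for rec in matching):
            warehouse.append(rule)
    return warehouse

rules = [
    {"cond": {"A": 0, "B": 0}, "decision": 0, "confidence": 0.9},
    {"cond": {"A": 1, "C": 2}, "decision": 1, "confidence": 0.4},
]
sample = [{"A": 0, "B": 0, "C": 1, "F": 0}]
print(evaluate(rules, interest=0.6, sample=sample))  # only the first rule survives
```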
3.4. An instance
So far we have established a framework of commercial data mining based on the rough set and presented some novel algorithms. To demonstrate the efficiency of the above algorithms, we take an instance in the following, with data from a retailing firm. According to the framework above, the precondition of data mining is possession of a commercial database; for reasons of length, we omit the detailed database design. We take data from the firm's commercial database in order to analyze customer relations, and use data mining to determine which customers purchase high-end cosmetics.
After data filtering and decision table creation, we obtain the following decision-making table (Table 1).

Table 1. Original decision-making table
U   | age | sex     | income  | profession (blue/white collar) | credit | purchases high-end cosmetics
e1  | 23  | man     | 500     | blue  | bad    | no
e2  | 20  | woman   | 700     | white | better | yes
e3  | 33  | woman   | 1000    | white | better | yes
e4  | 35  | man     | 3000    | blue  | good   | yes
e5  | 38  | woman   | 500     | blue  | good   | no
e6  | 42  | man     | 300     | blue  | bad    | no
e7  | 44  | man     | 230     | white | bad    | no
e8  | 47  | woman   | 150     | blue  | good   | no
e9  | 20  | man     | 1500    | white | bad    | no
e10 | 25  | unknown | 300     | blue  | bad    | yes
e11 | 50  | man     | 1650    | white | bad    | yes
e12 | 28  | woman   | unknown | white | good   | yes
e13 | 33  | woman   | 1350    | blue  | better | yes
e14 | 61  | man     | 1680    | blue  | better | no
e15 | 22  | unknown | 1000    | white | good   | yes
e16 | 24  | man     | 1500    | blue  | bad    | no
e17 | 33  | woman   | 600     | blue  | good   | no
e18 | 35  | woman   | unknown | white | bad    | no
e19 | 44  | unknown | 500     | white | good   | yes
e20 | 49  | man     | 230     | blue  | bad    | no
In the above table, the condition attributes are sex, profession, credit, age and income, and the decision attribute is purchasing high-end cosmetics. The discrete attributes are sex, profession and credit; the continuous attributes are age and income. For ease of computation, we encode the attribute values of sex (man, woman) as (0, 1), profession (blue, white) as (0, 1), credit (bad, good, better) as (0, 1, 2), and the decision (no, yes) as (0, 1). Likewise, we use A, B, C, D, E to denote age, sex, income (monthly pay), profession and credit, and mark the decision attribute as F. The set of condition attributes is marked S and the set of decision attributes is marked P, so S = {A, B, C, D, E} and P = {F}.
Then we adopt Algorithm 1 to discretize the continuous attributes, giving Table 2.
Table 2. Object decision-making table
U   | A | B       | C       | D | E | F
e1  | 0 | 0       | 0       | 0 | 0 | 0
e2  | 0 | 1       | 1       | 1 | 2 | 1
e3  | 1 | 1       | 1       | 1 | 2 | 1
e4  | 1 | 0       | 2       | 0 | 1 | 1
e5  | 2 | 1       | 0       | 0 | 1 | 0
e6  | 2 | 0       | 0       | 0 | 0 | 0
e7  | 2 | 0       | 0       | 1 | 0 | 0
e8  | 2 | 1       | 0       | 0 | 1 | 0
e9  | 0 | 0       | 2       | 1 | 0 | 0
e10 | 0 | unknown | 0       | 0 | 0 | 1
e11 | 2 | 0       | 2       | 1 | 0 | 1
e12 | 1 | 1       | unknown | 1 | 1 | 1
e13 | 1 | 1       | 2       | 0 | 2 | 0
e14 | 3 | 0       | 2       | 0 | 2 | 0
e15 | 0 | unknown | 1       | 1 | 0 | 1
e16 | 0 | 0       | 2       | 0 | 1 | 0
e17 | 1 | 1       | 1       | 0 | 1 | 0
e18 | 1 | 1       | unknown | 1 | 0 | 0
e19 | 2 | unknown | 0       | 1 | 1 | 1
e20 | 2 | 0       | 0       | 0 | 0 | 0
We find some lost data, whose values are unknown, in Table 2; this decision-making table is an example of an incomplete data set. To address this, we apply Algorithm 3 to the decision table and obtain a complete decision table: the values of attribute B of objects 10, 15 and 19 become 0, 0, 0, and the value of attribute C of object 18 becomes 0. Objects 5 and 8 are then identical, so we delete object 5, leaving a decision table with 19 objects.
So far we have finished all the tasks of data preprocessing; the following step is mining evaluation. In this layer we take some key operations to ensure that the rules or knowledge are highly precise and dynamic. We use Algorithm 4 to simplify the attributes and their values, obtaining Table 3.
Table 3. Simplified decision-making table
U   | A | B | C | D | F
e1  | 0 | 0 | 0 | 0 | 0
e2  | 0 | 1 | 1 | 1 | 1
e3  | 1 | 0 | 2 | 0 | 1
e4  | 2 | 1 | 0 | 0 | 0
e5  | 2 | 0 | 0 | 0 | 0
e6  | 0 | 0 | 2 | 1 | 0
e7  | 0 | 1 | 0 | 0 | 1
e8  | 2 | 0 | 2 | 1 | 1
e9  | 1 | 1 | 2 | 0 | 0
e10 | 3 | 0 | 2 | 0 | 0
e11 | 0 | 0 | 2 | 0 | 0
e12 | 1 | 1 | 1 | 0 | 0
e13 | 1 | 1 | 0 | 1 | 0
e14 | 2 | 0 | 0 | 1 | 1
Then we use Algorithm 5 for rule extraction and incremental learning. We suppose the decision table has 13 objects and that the 14th object is added in the learning step, so we obtain 13 equivalence classes:
E_1 = {e1}, E_2 = {e2}, E_3 = {e3}, E_4 = {e4}, E_5 = {e5}, E_6 = {e6}, E_7 = {e7}, E_8 = {e8}, E_9 = {e9}, E_10 = {e10}, E_11 = {e11}, E_12 = {e12}, E_13 = {e13}.
After carrying out every step of Algorithm 5, we get useful rules. We adopt production rules as the knowledge representation; the rules take the following form:
if (A is 0) and (B is 0) then F is 0
if (A is 1) and (B is 1) then F is 0
if (A is 1) and (C is 2) then F is 1
Replacing A, B, C, D with their meanings in the above rules gives complete decision rules. For example: if a customer is a man in the youngest age group, he does not purchase high-end cosmetics, and so on.
We then add the 14th object to the decision table. Because the condition attribute values of the new object differ from those of all other objects, a new equivalence class is produced; we only need to add a column and a row to obtain a new rule. Thus we can update the rule warehouse easily.
Through the above instance we have tested the efficiency and feasibility of the algorithms, and the results can be adopted in commercial activity: for this instance, the commercial firm can provide corresponding customized services according to the result.
4. Conclusion
This article has applied rough set theory to the process of commercial data mining and has proposed several algorithms based on rough set theory to solve the current problems and weaknesses. The article makes the following contributions:
1. It presents an algorithm for data completion based on the rough set. This algorithm uses the upper and lower approximations and can efficiently deal with lost commercial data.
2. It puts forward an incremental learning algorithm based on rough set theory. This algorithm uses confidence degrees to make itself flexible and to reduce computational complexity.
3. According to the characteristics of commercial data, it presents a framework for commercial data mining based on the rough set.
References
[1] Wei Yanwu. Studies on the Application of Data Mining Techniques to the Management of Customers' Relationship [D]. Wuhan: Wuhan Science and Engineering University, 2002.
[2] Wang Huiming. Studies on Commercial Information System [D]. Tianjin: Tianjin Industry University, 2001.
[3] Li Jie. Application of Data Warehouse System in Commercial Franchise Enterprises [D]. Beijing: Beijing Industry University, 2001.
[4] Zhao Yuyong, Wu Yongming. Studies on the Application of Data Warehouse Techniques in Decision-making Supporting System [J]. Application of Computer System, 1999, (3): 29-32.
[5] Liu Tongming, et al. Data Mining Techniques and the Application [M]. Beijing: National Defense Industry University, 2001: 40-85.
[6] J. Christopher, K. Philip Chan. Systems for Knowledge Discovery in Databases [J]. IEEE Trans. on Knowledge and Data Engineering, 1993, 5(6): 903-913.
[7] Wang Bingfeng, Liu Lianzhong. Connected Analyzing and Processing and Its Application in Management of Information System [J]. Application of Computer Science, 2001, 18(1): 71-74, 78.
[8] Yan Weimin, Wu Weimin. Data Structure. Beijing: Qinghua University Press, 1996: 62-80.
[9] Miao Duoqian. Dispersion Methods of Continual Attributes of Rough Set [J]. Automation Journal, 2001, 27(3): 296-302.
[10] Yu Jinlong, Li Xiaohong, Sun Lixin. Overall Dispersion of Continual Attributes [J]. Journal of Harbin Industry University, 2000, 32(3): 48-53.
[11] Zhao Jun, et al. New Algorithm of Data Dispersion Based on Rough Set Theory [J]. Journal of Chongqing University, 2002, 25(3): 18-21.
[12] Han Zhenxiang, et al. A Summary of Rough Set Theory and Its Application [J]. Control Theory and Application, 1999, 16(2): 153-157.
[13] Wang Jue, Miao Duoqian. "Data Contraction" Based on Rough Set Theory [J]. Computer Journal, 1998, 21(5): 393-400.
[14] Miao Duoqian, Fan Shidong. Knowledge Size Calculation and Its Application [J]. System Engineering Theory and Practice, 2002, 22(1): 48-56.
[15] Wang Mingwen. A Data Filtering Method Based on Information Measurement [J]. Journal of Nanchang Hydraulic College, 2002, 21(2): 1-6.
[16] Zhao Weidong, et al. Data Mining Under Incomplete Information of Section Attributes [J]. Application of System Engineering Theory, 2001, 10(2): 136-147.
[17] Liu Yezheng, Yang Shanlin. Studies on Estimation of Null Based on Rough Set Theory [J]. Computer Engineering, 2001, 27(10): 41-42.
[18] Miao Duoqian, Hu Guirong. An Enlightening Algorithm of Knowledge Simplification [J]. Computer Study and Development, 1999, 36(6): 681-684.
[19] Mohua Banerjee, Sushmita Mitra, Sankar K. Pal. Rough Fuzzy MLP: Knowledge Encoding and Classification [J]. IEEE Transactions on Neural Networks, 1998, 9(6): 1203-1206.
[20] J. R. Quinlan. Induction of Decision Trees [J]. Machine Learning, 1986, (1): 81-106.
[21] Wang Jun. Studies on the Discovery of Database Knowledge [D]. Beijing: Institute of Computing Technology, Chinese Academy of Sciences, 1997.
[22] Chen Wenwei. Decision Supporting System and Its Development [M]. Beijing: Qinghua University Press, 1994: 90-97.