2009 WASE International Conference on Information Engineering (ICIE), Taiyuan, Shanxi, China, 10-11 July 2009

978-0-7695-3679-8/09 $25.00 © 2009 IEEE, DOI 10.1109/ICIE.2009.91

Application of Data Mining in Electronic Commerce

Hewen Tang, Honglin Yan, Zengfang Yang, Yu Ma, Chunping Li

School of Information Technology and Engineering, Yuxi Normal University

Yuxi, China

[email protected], [email protected], [email protected], [email protected],[email protected]

Abstract: Data mining has matured as a field of basic research in computer science and is increasingly applied in many fields. Electronic commerce is a good example. This paper surveys some of the ways in which data mining has been applied in electronic commerce. It focuses on data mining in the context of e-commerce rather than on the data mining algorithms themselves.

Keywords- data mining, electronic commerce, KDD

I. INTRODUCTION

A. Electronic commerce

Electronic commerce, or e-commerce, is changing the face of business. It allows better customer management, new strategies for marketing, an expanded range of products, and more efficient operations. A key enabler of this change is the widespread use of increasingly sophisticated data mining tools.

Electronic commerce refers to a wide range of online business activities for products and services [1]. It also pertains to “any form of business transaction in which the parties interact electronically rather than by physical exchanges or direct physical contact” [2]. Another definition of electronic commerce is “any activity that utilizes some form of electronic communication in the inventory, exchange, advertisement, distribution or payment of goods and service” [3].

Electronic commerce is usually associated with buying and selling over the Internet, or conducting any transaction involving the transfer of ownership or rights to use goods or services through a computer-mediated network.[4] Though popular, this definition is not comprehensive enough to capture recent developments in this new and revolutionary business phenomenon. A more complete definition is: E-commerce is the use of electronic communications and digital information processing technology in business transactions to create, transform, and redefine relationships for value creation between or among organizations, and between organizations and individuals.[5]

B. Data mining

Data mining emerged from the phenomenon that people are drowning in data but starved for knowledge. It refers to the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner [6]. Data mining has many aliases, such as knowledge discovery in databases (KDD), knowledge extraction, pattern discovery, data/pattern analysis, data archaeology, and data dredging [7].

Data mining has been recognized by many researchers as a key research topic in data handling, and by many industrial companies as an important area with an opportunity for major revenues. Researchers in many different fields have shown great interest in data mining. Several emerging applications in information-providing services, such as data warehousing and online services over the Internet, also call for various data mining techniques to better understand user behavior, to improve the service provided, and to increase business opportunities.

II. A VIEW OF DATA MINING METHODS

Given a truly large amount of data, the challenge in data mining is to discover hidden relationships among various attributes of the data and between several snapshots of the data taken over a period of time. These hidden patterns have enormous potential for prediction and personalization in electronic commerce. To uncover this potential, data mining draws on three families of methods: statistics, artificial intelligence, and databases, which are usually regarded as its three key supporting disciplines. This section gives a brief overview of some features of each of them.

A. Statistics

Extracting causal information from data is often one of the principal goals of data mining and, more generally, of statistical inference. Statisticians have performed aggregate analyses of data for decades; thus data mining (DM) has in effect existed since large-scale statistical modeling became possible [8].

Statistics provides many basic techniques for data mining, such as clustering and regression. Data mining can be regarded as a form of predictive analytics that uses a variety of techniques to explore massive amounts of data and identify relationships among hundreds of data elements, relationships that could not be uncovered through simple queries or reports. Data mining methodologies overlap with those in analytical disciplines such as statistics (simulation, principal components, Bayesian methods), forecasting (regression, time-series analysis) and operations research (clustering, neural networks, genetic algorithms).

B. Artificial intelligence (AI)

Reference [9] defines artificial intelligence as the study and design of intelligent agents, where an intelligent agent is a system that perceives its environment and takes actions which maximize its chances of success.

Artificial intelligence has provided a number of useful methods for data mining through machine learning. Machine learning is the subfield of artificial intelligence concerned with the design and development of algorithms that allow computers to improve their performance over time (to learn) based on data, such as sensor data or the contents of databases. Some of the most popular learning systems are neural networks and support vector machines. A major focus of machine learning research is to automatically produce (induce) models, such as rules and patterns, from data. Hence, machine learning is closely related to fields such as data mining.

C. Database

Data mining approaches rely heavily on the availability of high-quality data sets, and the database community has invented an array of relevant methods and mechanisms that need to be used prior to any data mining exercise. ETL (extract, transform and load) applications are worthy of mention in this context. In an enterprise system such as an ERP (enterprise resource planning) system, the number of transactions per minute is likely to run into the hundreds, if not thousands. Data mining certainly cannot be run on the transaction databases in their native state. The data must be extracted at periodic intervals, transformed into a form usable for analysis, and loaded onto the servers and applications that work on the transformed data. Today, software systems exist in the form of data warehousing solutions, often bundled with the ERP system, that perform this complex and important task.
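The extract, transform and load steps described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the sample records, the `sales_snapshot` table, and the function names are hypothetical and do not come from any ERP or warehousing product.

```python
import sqlite3
from collections import defaultdict

# Hypothetical transactional records, standing in for rows pulled from an ERP system.
TRANSACTIONS = [
    {"day": "2009-07-10", "region": "north", "product": "server", "amount": 1200.0},
    {"day": "2009-07-10", "region": "north", "product": "router", "amount": 300.0},
    {"day": "2009-07-11", "region": "south", "product": "server", "amount": 1150.0},
    {"day": "2009-07-11", "region": "north", "product": "server", "amount": 1250.0},
]


def extract(source):
    """Extract: copy the rows out of the operational source."""
    return list(source)


def transform(rows):
    """Transform: aggregate order count and revenue by (region, product)."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        key = (row["region"], row["product"])
        totals[key][0] += 1
        totals[key][1] += row["amount"]
    return [
        (region, product, count, revenue)
        for (region, product), (count, revenue) in sorted(totals.items())
    ]


def load(conn, snapshot):
    """Load: write the aggregated snapshot into the analysis database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_snapshot "
        "(region TEXT, product TEXT, orders INTEGER, revenue REAL)"
    )
    conn.executemany("INSERT INTO sales_snapshot VALUES (?, ?, ?, ?)", snapshot)
    conn.commit()


def run_etl(source, conn):
    """Run the three steps in order against an open SQLite connection."""
    load(conn, transform(extract(source)))
```

Calling `run_etl(TRANSACTIONS, sqlite3.connect(":memory:"))` fills the snapshot table with one row per region and product. A mining algorithm would then read that table instead of the live transaction store.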

Note that data warehouses are essentially snapshots of transactional data aggregated along various dimensions (including time, geographies, demographics, products, etc.). To run data mining algorithms, it is common practice to use the data available in the data warehouse rather than to run real-time scripts that fetch transactional data. The simple reason is that, for practical purposes, snapshots of data taken on a weekly or monthly basis are sufficient for analysis. Real-time data is not relevant for tactical decision making, which is where data mining is used. Data warehousing is nevertheless fraught with technological challenges. Database researchers have predominantly investigated association rule-mining within the field of data mining. When terabytes of data are available, the goal of database engineers in data mining is to create structures and mechanisms that efficiently read data into memory and run algorithms like Apriori [10]. Such algorithms operate on so-called item sets. Consider a database where the transactions pertain to a retail store. Customers buy various products, and each transaction records the products bought by the customer. Such databases can grow enormously in size, especially for large retailers that have web storefronts, like amazon.com. Each item set is a record in the database, with attributes indicating whether a particular product was purchased or not. Given a certain minimum support and confidence, the algorithms compute the rules that apply to the given item sets.
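To make support and confidence concrete, the following is a minimal sketch of a level-wise frequent-itemset search in the spirit of Apriori [10], followed by rule generation. It is an in-memory illustration, not the optimized algorithm of [10]; the function names and the sample baskets are assumptions chosen for this example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search: size-k candidates are built from frequent size-(k-1) sets."""
    items = sorted({item for t in transactions for item in t})
    current = [
        frozenset([i]) for i in items
        if support(frozenset([i]), transactions) >= min_support
    ]
    frequent = {}
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset, transactions)
        k = len(current[0]) + 1
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Pruning step: every (k-1)-subset of a candidate must itself be frequent.
        current = [
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c, transactions) >= min_support
        ]
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) for qualifying rules."""
    found = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                confidence = sup / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, sup, confidence))
    return found


# Hypothetical market baskets: each set is one transaction.
BASKETS = [
    {"server", "router"},
    {"server", "router", "switch"},
    {"server", "switch"},
    {"router"},
    {"server", "router"},
]
```

With `min_support=0.4` and `min_confidence=0.7`, the pair {router, switch} is dropped because it appears in only one of the five baskets, and the surviving rules include "switch implies server", which holds in every basket that contains a switch.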

III. APPLICATION OF DATA MINING IN ELECTRONIC COMMERCE

Applications of DM that are specific to e-commerce fall into four main areas: customer analysis, intelligent recommendation, web personalization, and buyer behavior analysis.

A. Customer analysis

Acquiring new customers, delighting and retaining existing customers, and predicting buyer behavior improve the availability of products and services and hence profits. The end goal of any data mining practice in e-commerce is therefore to improve the processes that contribute to delivering value to the end customer. Consider an online store like http://www.dell.com, where customers can configure a PC of their choice, place an order for it, track its movement, and pay for the product and services. With the technology behind such a web site, Dell has the opportunity to make the retail experience exceptional. At the most basic level, the information available in web log files can illuminate what prospective customers are seeking from a site. Are they purposefully shopping or just browsing? Buying something they are familiar with or something they know little about? Are they shopping from home, from work, or from a hotel dial-up? The information available in log files is often used to determine what profiling can be dynamically processed in the background and indexed into the dynamic generation of HTML, and what performance can be expected from the servers and network to support customer service and make e-business interaction productive [11].

Many well-known corporations, such as Dell and Alibaba (www.Alibaba.com), give their customers access to details about all of the systems and configurations they have purchased, so that customers can incorporate the information into their capacity planning and infrastructure integration. Back-end technology systems for the website include sophisticated data mining tools that take care of knowledge representation of customer profiles and predictive modeling of scenarios of customer interactions. For example, once a customer has purchased a certain number of servers, they are likely to need additional routers, switches, load balancers, backup devices, etc. Rule-mining based systems could be used to propose such additions to the customers.

B. Intelligent recommendation

Intelligent recommendation systems have been developed to keep customers automatically informed of important events of interest to them. PENS is a good example: it can notify customers of events and predict events and event classes that are likely to be triggered by customers [12]. The event notification system in PENS has the following components: event manager, event channel manager, registries, and proxy manager. The event-prediction system is based on association rule-mining and clustering algorithms. PENS is used to actively help an e-commerce service provider forecast the demand for product categories better. Data mining has also been applied to detecting how customers may respond to promotional offers made by a credit card e-commerce company [13]. Techniques including fuzzy computing and interval computing are used to generate if-then-else rules.

Reference [14] presented a method to build customer profiles in e-commerce settings, based on product hierarchy for more effective personalization. They divide each customer profile into three parts: basic profile learned from customer demographic data; preference profile learned from behavioral data, and rule profile mainly referring to association rules. Based on customer profiles, the authors generate two kinds of recommendations, which are interest recommendation and association recommendation. They also propose a special data structure called profile tree for effective searching and matching.

C. Web personalization

Reference [15] presented a comprehensive overview of the personalization process based on web usage mining. In this context, the author discusses a host of web usage mining activities required for this process, including the preprocessing and integration of data from multiple sources, and common pattern discovery techniques that are applied to the integrated usage data. The goal of that work is to show how pattern discovery techniques such as clustering, association rule-mining, and sequential pattern discovery, performed on web usage data, can be leveraged effectively as an integrated part of a web personalization system. The author observes that the log data collected automatically by the Web and application servers represent the fine-grained navigational behavior of visitors.
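One common preprocessing step in web usage mining is grouping raw log records into visitor sessions before any pattern discovery is run. The sketch below is a minimal illustration of that step, not the procedure of [15]. The record layout `(visitor_id, timestamp, url)` and the 30-minute inactivity timeout are assumptions made for this example.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds of inactivity; an assumed heuristic value


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group (visitor_id, timestamp, url) records into per-visitor sessions.

    A new session starts whenever the gap between two consecutive requests
    from the same visitor exceeds timeout seconds.
    """
    by_visitor = defaultdict(list)
    for visitor, timestamp, url in requests:
        by_visitor[visitor].append((timestamp, url))

    sessions = []
    for visitor in sorted(by_visitor):
        hits = sorted(by_visitor[visitor])  # chronological order per visitor
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                sessions.append((visitor, current))
                current = []
            current.append(url)
        sessions.append((visitor, current))
    return sessions
```

Each resulting session is an ordered list of URLs, which is the form that clustering, association rule-mining, or sequential pattern discovery would then consume.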

D. Buyer behavior analysis

For a successful e-commerce site, reducing user-perceived latency is the second most important quality after good site-navigation quality. The most successful approach to reducing user-perceived latency has been to extract path traversal patterns from past users' access history in order to predict future user traversal behavior and fetch the required resources in advance. However, this approach is suited only to non-e-commerce sites, where there is no purchase behavior. A new approach is to predict user behavior in e-commerce sites [16]. The core of this approach involves extracting knowledge from integrated data on the purchase and path traversal patterns of past users (obtainable from web server logs) to predict the purchase and traversal behavior of future users.
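As a rough illustration of predicting the next request from past traversal paths, the sketch below counts page-to-page transitions and returns the most frequent successor. It is a simple first-order model swapped in for illustration and is not the method of [16]. Treating purchase steps such as `/checkout` as ordinary pages in the path is an assumption, and all names are hypothetical.

```python
from collections import Counter, defaultdict


def train(sessions):
    """Count how often each page is immediately followed by each other page."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, page):
    """Return the most frequent successor of page, or None if none was observed."""
    followers = transitions.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical past sessions; purchase steps appear as ordinary pages in the path.
PAST_SESSIONS = [
    ["/home", "/laptops", "/cart", "/checkout"],
    ["/home", "/laptops", "/cart"],
    ["/home", "/deals", "/laptops", "/cart", "/checkout"],
]
```

A server could use `predict_next(train(PAST_SESSIONS), "/cart")` to decide which resource to prepare while the visitor is still viewing the cart page.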

Web sites are often used to establish a company’s image, to promote and sell goods, and to provide customer support. The success of a web site directly affects and reflects the success of the company in the electronic market. Reference [17] proposed a methodology for improving the success of web sites, based on the exploitation of navigation-pattern discovery. In particular, it presented a theory in which success is modeled on the basis of the navigation behavior of the site’s users. It also used the web usage miner (WUM), a navigation-pattern discovery tool, to study how the success of a site is reflected in its users’ behavior. With WUM it is easy to measure the success of a web site’s components and to obtain concrete indications of how the site should be improved.

In the context of web mining, clustering can be used to group similar click-streams in order to determine learning behaviors in the case of electronic learning, or general site access behaviors in e-commerce. Most of the algorithms presented in the literature for clustering web sessions treat sessions as sets of visited pages within a time period and do not consider the order in which the pages of the click-stream were visited. This has a significant consequence when comparing similarities between web sessions. Reference [18] proposed an algorithm based on sequence alignment to measure similarities between web sessions, where sessions are chronologically ordered sequences of page accesses.
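The difference between set-based and order-aware comparison can be seen in a small sketch of global sequence alignment (Needleman-Wunsch style dynamic programming) applied to page sequences. The scoring values (match 2, mismatch -1, gap -1) and the normalization are assumptions for illustration and are not taken from [18].

```python
def alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Global alignment score between two ordered page sequences."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def session_similarity(a, b, match=2):
    """Alignment score divided by the best score the longer session could reach."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return alignment_score(a, b, match=match) / (match * longest)
```

For example, the sessions `["home", "laptops", "cart"]` and `["cart", "laptops", "home"]` contain the same set of pages, so a set-based measure would call them identical. The alignment score reflects that the visiting order differs and rates them as less similar than two sessions sharing the prefix `["home", "laptops"]`.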

IV. CONCLUSION

Electronic commerce is the inevitable result of the development of information technology and the future model of commerce. It possesses rich information resources, which offer strong data support for the application of data mining. At the same time, data mining provides powerful and efficient techniques for electronic commerce. In short, data mining is an essential technique for electronic commerce. Its application in electronic commerce will become increasingly widespread and has a bright future.

REFERENCES

[1] Anita Rosen, The E-commerce Question and Answer Book (USA: American Management Association, 2000), 5.

[2] MK, Euro Info Correspondence Centre (Belgrade, Serbia), “E-commerce-Factor of Economic Growth;” available from the following website: http://www.eicc.co.yu/newspro/viewnews.cgi?newsstart3end5; Internet; accessed 15 January 2009.


[3] C.A.Charles, C.P.Foss, S.Dewan(Eds.): Glogalization Electronic Commerce, Report on the International Forum on Electronic Commerce, Beijing, China, 20-21 March 1996, Center for Strategic & International Studies

[4] Thomas L. Mesenbourg, Measuring Electronic Business: Definitions, Underlying Concepts, and Measurement Plans.

[5] Definition adapted and expanded from Emmanuel Lallana, Rudy Quimbo, Zorayda Ruth Andam, ePrimer: An Introduction to eCommerce (Philippines: DAI-AGILE, 2000), 2.

[6] D. Hand, H. Mannila and P. Smyth. Principles of Data Mining. The MIT Press, 2001.

[7] J. Han, M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, Inc, 2001.

[8] Carbone P L 2000 Expanding the meaning of and applications for data mining. In IEEE Int. Conf. on Systems, Man, and Cybernetics (New York: IEEE) pp 1872–1873

[9] Russell, Stuart J.; Norvig, Peter (2003), Artificial Intelligence: A Modern Approach (2nd ed.), Upper Saddle River, NJ: Prentice Hall, ISBN 0-13-790395-2, http://aima.cs.berkeley.edu/

[10] Agrawal R, Srikant R 1994 Fast algorithms for mining association rules. In 20th Int. Conf. on Very Large Databases (New York: Morgan Kaufmann) p 487–499

[11] Auguste D M 2001 Customer service in e-business. IEEE Internet Comput. 5(5): 90–91

[12] Jeng J J, Drissi Y 2000 Pens: a predictive event notification system for e-commerce environment. In The 24th Annu. Int. Computer Software and Applications Conference, COMPSAC 2000, pp 93–98

[13] Zhang Y Q, Shteynberg M, Prasad S K, Sunderraman R 2003 Granular fuzzy web intelligence techniques for profitable data mining. In 12th IEEE Int. Conf. on Fuzzy Systems, FUZZ ’03 (New York: IEEE Comput. Soc.) pp 1462–1464

[14] Niu L, Yan XW, Zhang C Q, Zhang S C 2002 Product hierarchy-based customer profiles for electronic commerce recommendation. In Int. Conf. on Machine Learning and Cybernetics pp 1075–1080

[15] Mobasher B 2004 Web usage mining and personalization. In Practical handbook of internet computing (ed.) M P Singh (CRC Press)

[16] Vallamkondu S, Gruenwald L 2003 Integrating purchase patterns and traversal patterns to predict http requests in e-commerce sites. In IEEE Int. Conf. on e-commerce, pp 256–263

[17] Spiliopoulou M, Pohle C 2000 Data mining to measure and improve the success of web sites. J. Data Mining and Knowledge Discovery.

[18] Wang W, Zaiane O R 2002 Clustering web sessions by sequence alignment. In 13th Int. Workshop on Database and Expert Systems Applications pp 394–398