Chaoyang University of Technology
Clustering web transactions using rough approximation
Source: Fuzzy Sets and Systems 148 (2004) 131–138
Authors: Supriya Kumar De, P. Radha Krishna
Adviser: R. C. Chen
Presenter: Yu-Hsiang Fu (傅昱翔)
Date: 2006/12/14
Outline
• Abstract
• Introduction
• Rough Set
• Rough Set Approximation
• Experimental Results
• Conclusions
• References
Abstract
• Web usage mining is the application of data mining techniques to web usage data.
• Its goal is to discover user access patterns from web access logs.
• Rough sets can be used to mine web log records effectively and discover web page access patterns.
Introduction (1/2)
• The WWW contains a huge number of hyperlinks as well as a large amount of access and usage information.
• Web mining
  – Web content mining
  – Web structure mining
  – Web usage mining
Introduction (2/2)
• User behavior
  – A click stream is the sequence of clicks or pages requested as a visitor explores a Web site.
• Web transaction
  – A user session is the click stream of page views for a single user across the entire Web.
• Usage patterns differ from user to user, because different users navigate the same site in different ways.
Rough Set (1/5)
• Rough set theory was introduced by Zdzislaw Pawlak in the early 1980s.
• Rough set theory deals with the classification analysis of data tables.
• Rough set methods support an efficient search for relevant tolerance relations and the extraction of interesting patterns from data.
Rough Set (2/5)
• Universe and Relation
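As a minimal sketch, assuming the standard setting of Pawlak's rough set theory (a universe U of objects and an indiscernibility relation R under which two objects are related when they share the same values on the chosen attributes), the relation partitions U into equivalence classes. The information table and attribute names below are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(universe, attrs):
    """Partition `universe` by the indiscernibility relation on `attrs`.

    Two objects fall into the same class when they agree on every
    attribute in `attrs`.
    """
    classes = defaultdict(set)
    for obj, values in universe.items():
        key = tuple(values[a] for a in attrs)
        classes[key].add(obj)
    return list(classes.values())


# Hypothetical information table: object -> attribute values.
table = {
    "x1": {"color": "red", "size": "small"},
    "x2": {"color": "red", "size": "small"},
    "x3": {"color": "blue", "size": "small"},
    "x4": {"color": "blue", "size": "large"},
}

for cls in equivalence_classes(table, ["color"]):
    print(sorted(cls))
```

Using more attributes gives a finer partition: on both `color` and `size`, x3 and x4 end up in separate classes.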
Rough Set (3/5)
• Lower and Upper Approximation
  – Lower approximation: the objects that surely belong to the target set
  – Upper approximation: the objects that possibly belong to the target set
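A minimal sketch of the two approximations, assuming the universe has already been partitioned into equivalence classes. The lower approximation collects the classes lying entirely inside the target set (surely), and the upper approximation collects every class that overlaps it (possibly). The partition and the target set X are hypothetical.

```python
def approximations(classes, target):
    """Return (lower, upper) approximations of `target`.

    `classes` is a partition of the universe into equivalence classes.
    lower: union of classes contained in target (objects that surely belong)
    upper: union of classes meeting target (objects that possibly belong)
    """
    target = set(target)
    lower, upper = set(), set()
    for c in classes:
        if c <= target:
            lower |= c
        if c & target:
            upper |= c
    return lower, upper


# Hypothetical partition of U = {x1, ..., x5} and target set X.
partition = [{"x1", "x2"}, {"x3", "x4"}, {"x5"}]
X = {"x1", "x2", "x3"}
lower, upper = approximations(partition, X)
print(sorted(lower), sorted(upper))  # ['x1', 'x2'] ['x1', 'x2', 'x3', 'x4']
```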
Rough Set (4/5)
• Boundary and negative regions
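Under the same assumptions, the boundary region is the upper approximation minus the lower approximation, and the negative region is everything in the universe outside the upper approximation (the lower approximation itself is also called the positive region). The universe below is hypothetical and the function name `regions` is illustrative only.

```python
def regions(universe, classes, target):
    """Return (positive, boundary, negative) regions of `target`.

    positive = lower approximation (classes fully inside target)
    boundary = upper approximation minus lower approximation
    negative = universe minus upper approximation
    """
    target = set(target)
    lower = set().union(*(c for c in classes if c <= target))
    upper = set().union(*(c for c in classes if c & target))
    return lower, upper - lower, set(universe) - upper


# Hypothetical universe, partition and target set.
U = {"x1", "x2", "x3", "x4", "x5"}
partition = [{"x1", "x2"}, {"x3", "x4"}, {"x5"}]
X = {"x1", "x2", "x3"}
positive, boundary, negative = regions(U, partition, X)
print(sorted(positive), sorted(boundary), sorted(negative))
```

A non-empty boundary region is exactly what makes X "rough": it cannot be described precisely by the available equivalence classes.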
Rough Set (5/5)
Rough Set Approximation (1/7)
• A user transaction is a sequence of items (clicks).
• Let there be m users, with user transactions T = {t1, t2, ..., tm}.
• Let U be the set of n distinct clicks (hyperlinks/URLs) made by the users.
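A minimal sketch of this setup, assuming each transaction is reduced to the set of URLs it contains (click order and repeats are dropped) and that two transactions are compared with the Jaccard coefficient |ti ∩ tj| / |ti ∪ tj|. Both choices are assumptions for illustration and may differ from the measure used in the paper. Thresholding the similarity gives a relation that is reflexive and symmetric but not necessarily transitive, i.e. a tolerance relation rather than an equivalence relation. The URLs are hypothetical.

```python
def jaccard(a, b):
    """Assumed similarity: |a & b| / |a | b| (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_classes(transactions, threshold):
    """R(t_i): indices j with jaccard(t_i, t_j) >= threshold.

    The relation is reflexive and symmetric but need not be transitive,
    so it is a tolerance relation, not an equivalence relation.
    """
    return {
        i: {j for j, tj in enumerate(transactions) if jaccard(ti, tj) >= threshold}
        for i, ti in enumerate(transactions)
    }


# Hypothetical click sequences, one per user (m = 4).
clicks = [["/a", "/b"], ["/a", "/b", "/c", "/b"], ["/c", "/d"], ["/d"]]
T = [set(seq) for seq in clicks]   # drop click order and repeats
U = set().union(*T)                # the n distinct clicked URLs (n = 4)
R = similarity_classes(T, threshold=0.5)
print(len(T), len(U), R)
```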
Rough Set Approximation (2/7)
Rough Set Approximation (3/7)
Rough Set Approximation (4/7)
Rough Set Approximation (5/7)
Rough Set Approximation (6/7)
Rough Set Approximation (7/7)
Experimental Results (1/2)
• Log files from www.idrbt.ac.in
  – The web site consists of 62 web pages and 283 links.
  – The log files record every click that users make.
  – The session timeout is 30 min.
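A minimal sketch of sessionization, assuming the 30-min session time is used as an inactivity timeout (a new session starts whenever the gap between two consecutive clicks by the same user exceeds 30 minutes) and that users are identified by a hypothetical client key such as an IP address. The log records are made up for illustration.

```python
from datetime import datetime, timedelta
from itertools import groupby

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed meaning of "session time"


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Split (user, timestamp, url) records into per-user sessions.

    A new session starts when the gap since the user's previous click
    exceeds `timeout`. Returns a list of (user, [url, ...]) pairs.
    """
    sessions = []
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    for user, group in groupby(ordered, key=lambda r: r[0]):
        current, last_time = [], None
        for _, ts, url in group:
            if last_time is not None and ts - last_time > timeout:
                sessions.append((user, current))
                current = []
            current.append(url)
            last_time = ts
        sessions.append((user, current))  # each group has at least one click
    return sessions


t0 = datetime(2006, 12, 14, 9, 0)
log = [
    ("10.0.0.1", t0, "/index.html"),
    ("10.0.0.1", t0 + timedelta(minutes=5), "/research.html"),
    ("10.0.0.1", t0 + timedelta(minutes=50), "/index.html"),  # 45-min gap
    ("10.0.0.2", t0 + timedelta(minutes=1), "/courses.html"),
]
for user, pages in sessionize(log):
    print(user, pages)
```

Each resulting session is one candidate web transaction for the clustering step.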
Experimental Results (2/2)
• Steps:
  – First, the data is preprocessed and transformed.
  – Second, the similarity upper approximation is computed for each transaction.
  – Finally, the transactions are clustered using rough approximation (threshold = 0.5).
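A minimal sketch of the clustering step, under stated assumptions: Jaccard similarity between transactions, a similarity threshold of 0.5 (taken to be the threshold mentioned in the steps), and a simple similarity upper approximation that repeatedly adds the similarity classes of all current members until the set stops changing. Transactions that end with the same set form one cluster. With this unconstrained expansion the clusters coincide with the connected components of the threshold graph, so treat it as a simplified illustration rather than the paper's exact algorithm. The function names and transactions are hypothetical.

```python
def jaccard(a, b):
    """Assumed similarity measure: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_upper_approximation(transactions, threshold=0.5):
    """For each transaction, grow its similarity class until it is stable."""
    n = len(transactions)
    # R[i]: indices whose similarity to transaction i is >= threshold
    # (contains i itself for any threshold <= 1).
    R = [
        frozenset(j for j in range(n)
                  if jaccard(transactions[i], transactions[j]) >= threshold)
        for i in range(n)
    ]
    result = []
    for i in range(n):
        current = R[i]
        while True:
            # Add the similarity class of every member of the current set.
            expanded = frozenset().union(*(R[j] for j in current))
            if expanded == current:
                break
            current = expanded
        result.append(current)
    return result


def cluster(transactions, threshold=0.5):
    """Group transactions whose final upper approximations are identical."""
    uppers = similarity_upper_approximation(transactions, threshold)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(uppers))


# Hypothetical transactions (sets of visited URLs).
T = [{"/a", "/b"}, {"/a", "/b", "/c"}, {"/c", "/d"}, {"/d"}, {"/b", "/c"}]
for c in cluster(T, threshold=0.5):
    print(sorted(c))  # indices of the transactions in each cluster
```

Here transaction 4 is not directly similar to transaction 0 (similarity 1/3), but both join the same cluster through transaction 1, which shows how the iterated upper approximation extends a single similarity class.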
Conclusions
• The paper presents a novel algorithm that uses rough approximation to cluster the web transactions of users.
• The approach is useful for finding interesting user access patterns in web logs.
• The results can help in building adaptive web sites that respond to users' behavior.
References
• Z. Pawlak, J. Grzymala-Busse, R. Slowinski, and W. Ziarko, "Rough Sets," Communications of the ACM, vol. 38, no. 11, pp. 88–95, November 1995.
• Z. Pawlak, "Rough Sets (Abstract)," pp. 262–264.
• Z. Pawlak and A. Skowron, "Rudiments of Rough Sets," Information Sciences, vol. 177, pp. 3–27, 2007.
• N. Kammenhuber, J. Luxenburger, A. Feldmann, and G. Weikum, "Web Search Clickstreams," IMC'06, October 25–27, 2006.
• A. K. Jain, M. N. Murty, and P. J. Flynn, "Data Clustering: A Review," ACM Computing Surveys, vol. 31, no. 3, pp. 274–275, 281–285, September 1999.