A SYNOPSIS
ON
WEB USAGE MINING
2014-2015
BY
GIRIJA PATIL
PRIYANKA PATKAR
AANUM SHAIKH
ADITI THAKKAR
Web usage mining in Business Intelligence
A SYNOPSIS
ON
Web Usage Mining
BY
Girija Patil
Priyanka Patkar
Aanum Shaikh
Aditi Thakkar
Under the guidance of
Internal Guide
Prof. Sumitra Sadhukhan
Juhu-Versova Link Road Versova, Andheri(W), Mumbai-53
University of Mumbai
2014– 2015
Juhu-Versova Link Road Versova, Andheri(W), Mumbai-53
This is to certify that
1. Girija Patil - B-724
2. Priyanka Patkar - B-726
3. Aanum Shaikh - B-743
4. Aditi Thakkar – B-759
Have satisfactorily completed this synopsis entitled
Web Usage Mining
Towards the partial fulfillment of the
BACHELOR OF ENGINEERING
IN
(COMPUTER ENGINEERING)
as laid by University of Mumbai.
Guide H.O.D.
Prof.S.Sadhukhan Prof. S. B. Wankhade
Principal
Dr.Udhav Bhosle
Internal Examiner External Examiner
ACKNOWLEDGEMENT
We wish to express our sincere gratitude to Dr. U. V. Bhosle, Principal
and Prof. S. B. Wankhade, H.O.D of Computer Department of RGIT for
providing us an opportunity to do our Seminar work on “Web Usage Mining”.
This Seminar bears the imprint of many people. We sincerely thank our
Seminar guide Mrs. Sumitra Sadhukhan for her guidance and encouragement in
successful completion of our Seminar work.
We would also like to thank our staff members for their help in carrying
out this Seminar work.
Finally, we would like to thank our colleagues and friends who helped
us in completing the Seminar successfully.
1. Girija Patil
2. Priyanka Patkar
3. Aanum Shaikh
4. Aditi Thakkar
Abstract
Web Usage Mining is the application of data mining techniques to discover
interesting usage patterns from Web data, in order to understand and better serve
the needs of Web-based applications. Usage data captures the identity or origin of
Web users along with their browsing behavior at a Web site. Web server data
corresponds to the user logs that are collected at the Web server. Typical data
collected at a Web server include IP addresses, page references, and access
times of the users; these logs are the main input to the present work. Our main
aim is to concentrate on web usage mining, and in particular on discovering the
web usage patterns of websites from the server log files.
Web mining can give companies managerial insight into visitor profiles,
which helps top management take strategic actions accordingly. The proposed
work is an efficient algorithm for generating frequent access patterns from the
access paths of the users. The algorithm is optimized to take less time than
the existing algorithms: its main aim is to reduce execution time and memory
utilization compared with the existing algorithm, viz. the Apriori algorithm.
The frequent access patterns show the sequences of web pages that are
frequently navigated by users. The proposed algorithm, a combination of the
Apriori and FP-Growth algorithms, searches for large item-sets during its initial
database pass and uses the result as the seed for discovering other large
item-sets during subsequent passes. Thus, frequently accessed products can be
discovered efficiently using the combined algorithm, which plays a vital role in
Business Intelligence (BI).
Table of Contents
1 Introduction
    1.1 Web Usage Mining Process
        1.1.1 Applications
2 Review of Literature
    2.1 Apriori Algorithm
    2.2 FP-Growth Algorithm
3 Existing System
    3.1 Input System
        3.1.1 Web Log Files
        3.1.2 Output System
4 Proposed System
5 Design Details
    5.1 Software Development Life Cycle (SDLC)
    5.2 Steps in SDLC
    5.3 Waterfall Model
    5.4 DFD
6 Implementation Plan
7 Analysis
    7.1 Details of Hardware and Software
    7.2 Backend
8 Conclusion
References
List of Figures
Figure No.   Figure Name
1.1.1        Web Usage Mining process
1.1.1.1      Applications of Web Usage Mining
2.1.1        Apriori algorithm flowchart
3.1.1.1      Web log files
3.1.2.1      Data extracted from web log files
5.1.1        SDLC
5.1.2        Gantt Chart for SDLC
5.3.1        Waterfall Model
5.4.1        User DFD
5.4.2        User Use Case
5.4.3        User Flowchart
CHAPTER 1
INTRODUCTION
The Web is a huge, rapidly growing, diverse, dynamic and mostly unstructured
data repository. It supplies an enormous amount of information, and it also
makes that information harder to deal with from the different points of view of
users, web service providers and business analysts. Web Usage Mining is the
application of data mining techniques to discover interesting usage patterns from
Web data, in order to understand and better serve the needs of Web-based
applications. Usage data captures the identity or origin of Web users along with
their browsing behavior at a Web site. Web usage mining can be classified
further according to the kind of usage data considered: web server data,
application server data and application level data.
Web server data corresponds to the user logs that are collected at the Web
server. Web usage mining refers to the automatic discovery and analysis of
patterns in click stream and associated data collected or generated as a result of
user interactions with web resources on one or more web sites. It consists of three
phases: data pre-processing, pattern discovery and pattern analysis.
These are explained in section 1.1.
1.1 Web Usage Mining Process
fig 1.1.1 Web Usage Mining process
PRE-PROCESSING:
Pre-processing includes the fusion and synchronization of data from multiple log
files, data cleaning, page view identification, user identification, session
identification (or sessionization), episode identification, and the integration of
click stream data with other data sources such as content or semantic information.
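Two of the steps listed above, data cleaning and sessionization, can be sketched in Python as follows. This is a minimal illustration only: the record layout, the list of ignored file extensions and the 30-minute inactivity timeout are assumptions made for the example, not values taken from this project.

```python
from datetime import datetime, timedelta

# Hypothetical, already-parsed log records: (ip, timestamp, url).
RECORDS = [
    ("10.0.0.1", datetime(2015, 1, 5, 10, 0), "/index.html"),
    ("10.0.0.1", datetime(2015, 1, 5, 10, 1), "/logo.png"),
    ("10.0.0.1", datetime(2015, 1, 5, 10, 5), "/products.html"),
    ("10.0.0.2", datetime(2015, 1, 5, 10, 6), "/index.html"),
    ("10.0.0.1", datetime(2015, 1, 5, 11, 30), "/contact.html"),
]

IGNORED_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js")  # assumed noise
SESSION_TIMEOUT = timedelta(minutes=30)                      # assumed threshold


def clean(records):
    """Data cleaning: drop requests for embedded resources such as images."""
    return [r for r in records if not r[2].lower().endswith(IGNORED_SUFFIXES)]


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group page views by IP address and split on long inactivity gaps."""
    sessions = []
    last_seen = {}   # ip -> timestamp of that user's previous request
    current = {}     # ip -> the session (list of URLs) being built
    for ip, ts, url in sorted(records, key=lambda r: r[1]):
        if ip not in current or ts - last_seen[ip] > timeout:
            current[ip] = []
            sessions.append((ip, current[ip]))
        current[ip].append(url)
        last_seen[ip] = ts
    return sessions


sessions = sessionize(clean(RECORDS))
for ip, pages in sessions:
    print(ip, pages)
```

Here the IP address stands in for user identification, which is a simplification: several users behind one proxy would be merged into a single user.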
PATTERN DISCOVERY:
In the pattern discovery phase, frequent pattern discovery algorithms are applied
to the pre-processed data. Web site designers should have a clear understanding of
the users' profiles and the site objectives, as well as a sound knowledge of the way
users will browse the web pages.
PATTERN ANALYSIS:
In the pattern analysis phase, interesting knowledge is extracted from the frequent
patterns, and the results are used for website modification. Web usage
pattern analysis is the process of identifying browsing patterns by analyzing
users' navigational behaviour. The web server log files, which store information
about the visitors of the website, are used as input for the web usage pattern
analysis process. These log files are first pre-processed and converted into the
required formats so that web usage mining techniques can be applied to them.
1.1.1 APPLICATIONS
The figure shows Web Usage Mining applications, which can be implemented
using various techniques such as sequence mining, clustering and classification.
Our focus is to implement Web Usage Mining with the help of association rules,
using algorithms such as FP-Growth, Apriori and the improved FP-tree.
Fig 1.1.1.1: Applications of Web Usage Mining
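Association rules over usage data are commonly judged by their support and confidence. The Python sketch below computes both for a single rule; the sessions and page names are invented for illustration and do not come from the project's data.

```python
# Hypothetical sessions: each is the set of pages visited in one visit.
SESSIONS = [
    {"/home", "/products", "/cart"},
    {"/home", "/products"},
    {"/home", "/about"},
    {"/products", "/cart"},
]


def support(itemset, sessions):
    """Fraction of sessions that contain every page in `itemset`."""
    hits = sum(1 for s in sessions if itemset <= s)
    return hits / len(sessions)


def confidence(antecedent, consequent, sessions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent, sessions) / support(antecedent, sessions)


# Rule under test: visitors of /products also visit /cart.
rule_support = support({"/products", "/cart"}, SESSIONS)
rule_confidence = confidence({"/products"}, {"/cart"}, SESSIONS)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

In this toy data the rule holds in two of four sessions, and in two of the three sessions that include /products.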
CHAPTER 2
REVIEW OF LITERATURE
Web Mining is the application of data mining techniques to automatically
discover and extract information from the web. Web usage mining has various
application areas such as web pre-fetching, site reorganization and web
personalization. The most important task in web usage mining is discovering useful
patterns from web log data using pattern discovery techniques such as the Apriori
and FP-Growth algorithms. The Apriori algorithm is a well-known technique for
weblog mining. Many algorithms already exist for generating frequent access
patterns from access paths, e.g. the Apriori algorithm and the FP-Tree algorithm,
but they can require many database scans to generate user access patterns,
which costs time and memory. One improvement adds the user ID as a property at
every step of producing the candidate set and at every step of scanning the
database, in order to decide whether an item in the candidate set should be used
to produce the next candidate set. This reduces the size of the candidate set and
hence the number of database scans.
2.1 Apriori Algorithm
Apriori searches for large item-sets during its initial database pass and uses the
result as the seed for discovering other large item-sets during subsequent passes.
Item-sets having a support level above the minimum are called large or frequent
item-sets, and those below it are called small item-sets. The algorithm is based on
the large item-set property, which states that any subset of a large (frequent)
item-set must itself be large (frequent). Since the algorithm uses prior
knowledge of frequent item-sets, it has been given the name Apriori. It is an
iterative, level-wise search algorithm, in which k-item-sets are used to explore
(k+1)-item-sets. The system operates in the following four modules:
Preprocessing module.
Apriori or FP Growth Algorithm Module.
Association Rule Generation.
Results.
The pre-processing module converts the log file, which is normally in ASCII
format, into a database-like format that can be processed by the Apriori
algorithm. Apriori implements level-wise search using the frequent item-set
property and can be optimized further. Apriori is the simplest algorithm used for
mining frequent patterns from a transactional database.
Advantages:
Uses the large item-set property.
Easily parallelized.
Easy to Implement.
Disadvantages:
It is costly to handle a huge number of candidate sets.
It is tedious to repeatedly scan the database and check a large set of candidates by
pattern matching, which is especially true for mining long patterns.
The Apriori algorithm is given below:
Lk: Set of frequent item sets of size k (with min support)
Ck: Set of candidate item sets of size k (potentially frequent item sets)
L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
return ∪k Lk
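The level-wise search above can be sketched in Python as follows. This is a minimal, unoptimized illustration rather than the project's implementation; the transactions, the item names and the minimum support of 3 are assumptions chosen for the example.

```python
from itertools import combinations

# Hypothetical sessions (transactions): pages visited together in one visit.
TRANSACTIONS = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]


def apriori(transactions, min_support):
    """Return {frozenset: count} for item-sets seen in >= min_support transactions."""
    # L1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(level)

    k = 1
    while level:
        # C(k+1): join frequent k-item-sets, then prune any candidate that has
        # an infrequent k-subset (the large item-set property).
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in level for sub in combinations(union, k)
            ):
                candidates.add(union)
        # One database pass to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(level)
        k += 1
    return frequent


result = apriori(TRANSACTIONS, min_support=3)
for itemset, count in sorted(result.items(), key=lambda x: (len(x[0]), sorted(x[0]))):
    print(sorted(itemset), count)
```

Each pass of the `while` loop corresponds to one iteration over k in the pseudocode and costs one full scan of the transactions, which is the expense listed under the disadvantages.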
Following is the flowchart for apriori algorithm:
Fig: 2.1.1 Apriori algorithm flowchart.
2.2 FP-Growth Algorithm:
An FP-tree is a compact data structure that stores the important, quantitative
information about frequent patterns. Its main components are as follows.
It consists of one root labelled “root”, a set of item prefix sub-trees as the
children of the root, and a frequent-item header table. Each node in an item prefix
sub-tree consists of three fields: item-name, count, and node-link, where
item-name registers which item the node represents, count registers the number of
transactions represented by the portion of the path reaching the node, and
node-link links to the next node in the FP-tree carrying the same item-name, or is
null if there is none.
Each entry in the frequent-item header table consists of two fields,
item-name and head of node-link, which points to the first node in the FP-tree
carrying the item-name. Based on this structure, an FP-tree-based pattern-fragment
growth mining method is used. It starts from a frequent length-1 pattern (as an
initial suffix pattern), examines only its conditional pattern base (a “sub-database”
which consists of the set of frequent items co-occurring with the suffix pattern),
constructs its (conditional) FP-tree, and performs mining recursively on that
tree. Pattern growth is achieved by concatenating the suffix pattern with
the new patterns generated from a conditional FP-tree. Since the frequent items in
any transaction are always encoded in the corresponding path of the
frequent-pattern tree, pattern growth ensures the completeness of the result.
FP-growth is used for efficient mining of frequent patterns in large databases.
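The node fields and header table described above can be sketched in Python as follows. The class and function names, the transactions and the threshold are hypothetical and chosen only for illustration. One simplification is labeled in the code: each new node is placed at the head of its node-link chain, so the header points to the most recently created node rather than the first one.

```python
from collections import Counter

# Hypothetical transactions and threshold, for illustration only.
TRANSACTIONS = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C"],
    ["B", "C"],
    ["A", "B", "C"],
    ["D"],
]
MIN_SUPPORT = 2


class FPNode:
    def __init__(self, item, parent):
        self.item = item        # item-name
        self.count = 0          # transactions sharing the path to this node
        self.parent = parent
        self.children = {}      # item-name -> child FPNode
        self.node_link = None   # next node carrying the same item-name


def build_fp_tree(transactions, min_support):
    """Build an FP-tree; return (root, header table, item frequencies)."""
    freq = Counter(item for t in transactions for item in set(t))
    freq = {i: c for i, c in freq.items() if c >= min_support}
    root = FPNode(None, None)
    header = {}  # item-name -> head of that item's node-link chain
    for t in transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        items = sorted((i for i in set(t) if i in freq),
                       key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                # Simplification: prepend the new node to the node-link chain.
                child.node_link = header.get(item)
                header[item] = child
            child.count += 1
            node = child
    return root, header, freq


def linked_count(header, item):
    """Follow node-links to total the count of every node for `item`."""
    total, node = 0, header.get(item)
    while node is not None:
        total += node.count
        node = node.node_link
    return total


root, header, freq = build_fp_tree(TRANSACTIONS, MIN_SUPPORT)
for item in sorted(header):
    print(item, linked_count(header, item))
```

Summing the counts along an item's node-links recovers that item's support without rescanning the transactions, which is what makes the tree a compressed form of the database.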
Algorithm of FP-Growth:
Input: A database DB, represented by its constructed FP-tree, and a minimum
support threshold.
Output: The complete set of frequent patterns.
Method: call FP-growth(FP-tree, null).
Procedure FP-growth(Tree, α) {
1) if Tree contains a single prefix path then { // Mining single prefix-path FP-tree
2)     let P be the single prefix-path part of Tree;
3)     let Q be the multipath part with the top branching node replaced by a null root;
4)     for each combination (denoted as β) of the nodes in the path P do
5)         generate pattern β ∪ α with support = minimum support of nodes in β;
6)     let freq_pattern_set(P) be the set of patterns so generated; }
7) else let Q be Tree;
8) for each item ai in Q do { // Mining multipath FP-tree
9)     generate pattern β = ai ∪ α with support = ai.support;
10)    construct β’s conditional pattern-base and then β’s conditional FP-tree Treeβ;
11)    if Treeβ ≠ ∅ then
12)        call FP-growth(Treeβ, β);
13)    let freq_pattern_set(Q) be the set of patterns so generated; }
14) return(freq_pattern_set(P) ∪ freq_pattern_set(Q) ∪ (freq_pattern_set(P) × freq_pattern_set(Q)))
}
When the FP-tree contains a single prefix-path, the complete set of frequent
patterns can be generated in three parts: the single prefix-path P, the multipath Q,
and their combinations (lines 01 to 03 and 14). The resulting patterns for a single
prefix path are the enumerations of its sub paths that have the minimum support
(lines 04 to 06). Thereafter, the multipath Q is defined (line 03 or 07) and the
resulting patterns from it are processed (lines 08 to 13). Finally, in line 14 the
combined results are returned as the frequent patterns found.
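The recursive multipath branch of the procedure (lines 08 to 13) can be sketched in Python as follows. This is a deliberately simplified variant, not the procedure exactly as stated: it works directly on conditional pattern bases held as lists instead of a linked FP-tree, and it omits the single prefix-path shortcut of lines 01 to 06. The function name, data and threshold are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical transactions and threshold, for illustration only.
TRANSACTIONS = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C"],
    ["B", "C"],
    ["A", "B", "C"],
]
MIN_SUPPORT = 3


def pattern_growth(pattern_base, min_support, suffix=frozenset()):
    """Yield (pattern, support) pairs.

    pattern_base is a list of (items, count) pairs: the conditional
    pattern base of `suffix`, or the whole database when suffix is empty.
    """
    # Count the support of each item inside this conditional database.
    support = defaultdict(int)
    for items, count in pattern_base:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}
    # A fixed order (frequency, then name) ensures no pattern is produced twice.
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}

    for item in reversed(order):          # least frequent item first
        new_pattern = suffix | {item}     # the grown pattern: item plus suffix
        yield new_pattern, frequent[item]
        # Conditional pattern base of the grown pattern: for each entry that
        # contains `item`, keep only the frequent items ranked before it.
        conditional = []
        for items, count in pattern_base:
            if item in items:
                prefix = [i for i in set(items)
                          if i in rank and rank[i] < rank[item]]
                if prefix:
                    conditional.append((prefix, count))
        if conditional:                   # recurse only on a non-empty base
            yield from pattern_growth(conditional, min_support, new_pattern)


patterns = dict(pattern_growth([(t, 1) for t in TRANSACTIONS], MIN_SUPPORT))
for pattern, count in sorted(patterns.items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(pattern), count)
```

No candidate item-sets are generated and tested against the full database; each recursive call only looks at the entries that already contain the current suffix.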
Advantages:
Uses a divide-and-conquer strategy.
Uses a compact data structure.
Eliminates repeated database scans.
It is often faster than candidate-generation algorithms such as Apriori.
The algorithm reduces the total number of candidate item sets by producing a
compressed version of the database in the form of an FP-tree.
Disadvantages:
The FP-tree may not fit in memory.
The FP-tree is expensive to build.
CHAPTER 3
EXISTING SYSTEM
The existing system uses the Apriori algorithm, which performs an iterative,
level-wise search. It is an algorithm for frequent item-set mining and association
rule learning over transactional databases. It proceeds by identifying the frequent
individual items in the database and extending them to larger and larger item-sets
as long as those item-sets appear sufficiently often in the database. Because each
extension requires another pass over the database, this increases the execution
time.
3.1 Input System
3.1.1 Web Log Files
A log file is a file in which every page request made to the web server is recorded.
Each entry typically contains the following fields:
IP address of the computer making the request.
User ID (this field is not used in most cases).
Date and time of the request.
Size of the file transferred.
Referring URL, that is, the URL of the page containing the link that
generated the request.
Name and version of the browser being used.
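As a hedged illustration of how these fields might be extracted, the Python sketch below parses one log entry. It assumes the server writes the Apache/NCSA "combined" log format; the synopsis does not state which format the project's server uses, and the sample line is invented.

```python
import re
from datetime import datetime

# Assumed layout: Apache/NCSA "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical sample entry, not taken from the project's data.
SAMPLE = (
    '192.168.1.10 - - [05/Jan/2015:10:15:32 +0530] '
    '"GET /products.html HTTP/1.1" 200 5120 '
    '"http://example.com/index.html" "Mozilla/5.0"'
)


def parse_line(line):
    """Return a dict of the fields in one log entry, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    entry["status"] = int(entry["status"])
    return entry


entry = parse_line(SAMPLE)
print(entry["ip"], entry["time"].isoformat(), entry["url"], entry["size"])
```

The user field is "-" in the sample, matching the remark above that the user ID is not used in most cases.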
Fig 3.1.1.1: Web log files
3.1.2 Output System
Web log files can be used to reconstruct the user navigation sessions
within the site from which the log data originates.
The output system mainly focuses on the generation of reports.
These reports act as:
A source of the required information (personalization).
A permanent hard copy of the results.
Fig 3.1.2.1: Data extracted from web log files
CHAPTER 4
PROPOSED SYSTEM
The major drawbacks of the existing system are high execution time and excess
memory usage. The FP-Growth algorithm, in turn, spends more time on recursive
calls, works well only when user access paths have much in common, and also
consumes more memory. We therefore propose a combination of the FP-Growth and
Apriori algorithms to make the most of the advantages of both and to
overcome the drawbacks of the existing system efficiently.
Modules:
1. Manage Users:- In this module the admin manages the users, i.e. which users are
regular and which are not.
2. Manage Web Log File:- In this module the admin manages the usage data of the
users, such as which user visits which links and pages.
3. Data Preprocessing:- In this module the system removes unwanted data such as
less visited links and pages.
4. Pattern Discovery (Apriori Algorithm):- In this module the system applies the
Apriori algorithm to the web log file.
5. Pattern Analysis:- In this module the system predicts the domain in which the
user is interested.
6. Result:- This module provides the links that satisfy the user's
requirements (and that will be very useful to the users).
Project Significance
In general, this project will produce useful findings for analyzing the Web
usage pattern of an E-Learning portal:
i. This study will be the first step in analyzing the E-Learning portal by
applying the Web usage mining approach with the basic association rule
algorithms, Apriori and FP-Growth.
ii. The outcomes of this study can be used by the Web administrator to
plan necessary improvements, enhancements and valuable actions
for the E-Learning portal.
iii. The implementation of the Web usage mining process for the E-Learning portal
may become a guideline for system development purposes.
CHAPTER 5
DESIGN DETAILS
5.1 System Development Life Cycle:
The System Development Life Cycle is the process of developing
information systems through investigation, analysis, design, implementation, and
maintenance[7]. The System Development Life Cycle (SDLC) is also known as
Information Systems Development or Application Development.
Fig. 5.1.1 SDLC
Fig. 5.1.2 Gantt Chart for SDLC
5.2 Steps involved in the System Development Life Cycle:
Below are the steps involved in the System Development Life Cycle.
Each phase within the overall cycle may be made up of several steps.
Step 1: Software Concept
The first step is to identify a need for the new system. This will include
determining whether a business problem or opportunity exists, conducting a
feasibility study to determine if the proposed solution is cost effective, and
developing a project plan.
This process may involve end users who come up with an idea for improving their
work. Ideally, the process occurs in tandem with a review of the organization's
strategic plan to ensure that IT is being used to help the organization achieve its
strategic objectives. Management may need to approve concept ideas before any
money is budgeted for their development.
Step 2: Requirements Analysis
Requirements analysis is the process of analyzing the information needs of
the end users, the organizational environment, and any system presently being
used, and of developing the functional requirements of a system that can meet the
needs of the users. The requirements should be recorded in a document, email,
user interface storyboard, executable prototype, or some other form.
The requirements documentation should be referred to throughout the rest of the
system development process to ensure that the developing project aligns with user
needs and requirements. Professionals must involve end users in this process to
ensure that the new system will function adequately and meet their needs and
expectations.
Step 3: Architectural Design
After the requirements have been determined, the necessary specifications
for the hardware, software, people, and data resources, and the information
products that will satisfy the functional requirements of the proposed system,
can be determined. The design serves as a blueprint for the system and helps
detect problems before they are built into the final system.
Professionals create the system design, but must review their work with the users
to ensure that the design meets the users' needs.
Step 4: Coding and Debugging
Coding and debugging is the act of creating the final system. This step is
done by the software developers.
Step 5: System Testing
The system must be tested to evaluate its actual functionality in relation to
expected or intended functionality. Some other issues to consider during this
stage would be converting old data into the new system and training employees to
use the new system. End users will be key in determining whether the developed
system meets the intended requirements, and the extent to which the system is
actually used.
Step 6: Maintenance
Inevitably the system will need maintenance. Software will definitely
undergo change once it is delivered to the customer. There are many reasons for
the change. Change could happen because of some unexpected input values into
the system. In addition, the changes in the system could directly affect the
software operations. The software should be developed to accommodate changes
that could happen during the post-implementation period. There are various
software process models, such as:
Prototyping Model
RAD Model
The Spiral Model
The Waterfall Model
The Iterative Model
5.3 Waterfall model
The software process model is the model used for the development of the
project. Many software process models are available, and the choice should
depend on the size of the project, that is, whether it is an industry-scale,
large-scale or medium-scale project.
The model chosen should suit the project, because the cost of the project
changes with the software process model: the steps in each model vary. This
software is built using the waterfall model. The model suggests work cascading
from step to step like a series of waterfalls. It consists of the following
steps, in the following order.
Fig. 5.3.1 Waterfall model
Analysis Phase: To attack a problem by breaking it into sub-problems. The objective of
analysis is to determine exactly what must be done to solve the problem.
Typically, the system’s logical elements (its boundaries, processes, and data) are
defined during analysis.
Design Phase: The objective of design is to determine how the problem will be solved.
During design the analyst’s focus shifts from the logical to the physical. Data
elements are grouped to form physical data structures, screens, reports, files,
and databases.
Coding Phase: The system is created during this phase. Programs are coded, debugged,
documented, and tested. New hardware is selected and ordered. Procedures are
written and tested. End-user documentation is prepared. Databases and files are
initialized. Users are trained.
Testing Phase: Once the system is developed, it is tested to ensure that it does what it was
designed to do. After the system passes its final test and any remaining problems
are corrected, the system is implemented and released to the user. All these phases
are described with respect to the project in the rest of the document.
5.4 Data Flow Diagram
A data flow diagram (DFD) is a graphical representation of the "flow" of data
through an information system, modelling its process aspects. A DFD is often
used as a preliminary step to create an overview of the system, which can later be
elaborated. DFDs can also be used for the visualization of data
processing (structured design). A DFD shows what kind of information will be
input to and output from the system, where the data will come from and go to, and
where the data will be stored. It does not show information about the timing of
processes, or information about whether processes will operate in sequence or in
parallel (which is shown on a flowchart).
Fig. 5.4.1 User DFD
Fig. 5.4.2 User Use Case
Fig. 5.4.3 User Flowchart
CHAPTER 6
IMPLEMENTATION PLAN
Phase 1:
Activity   Description                          Effort in person weeks   Deliverable
P1-01      Requirement Analysis                 2 weeks                  Requirement Gathering
P1-02      Existing System Study & Literature   3 weeks                  Existing System Study & Literature
P1-03      Technology Selection                 2 weeks                  .NET
P1-04      Modular Specifications               2 weeks                  Module Description
P1-05      Design & Modeling                    4 weeks                  Analysis Report
Total                                           13 weeks
Phase 2:
Activity   Description                          Effort in person weeks   Deliverable
P2-01      Detailed Design                      2 weeks                  LLD / DLD Document
P2-02      UI and user interactions design      Included in above        UI document
P2-03      Coding & Implementation              12 weeks                 Code Release
P2-04      Testing & Bug fixing                 2 weeks                  Test Report
P2-05      Performance Evaluation               4 weeks                  Analysis Report
P2-06      Release                              Included in above        System Release
Total                                           20 weeks                 Deployment efforts are extra
Gantt Charts
The Gantt chart shows planned and actual progress for a number of tasks
displayed against a horizontal time scale.
It is an effective and easy-to-read method of indicating the actual current status
of each task in a set compared to the planned progress for that task.
Gantt charts provide a clear picture of the current state of the project.
Planned Gantt Chart
Table: Planned Gantt Chart
CHAPTER 7
ANALYSIS
FEASIBILITY STUDY
The very first phase in any system development life cycle is the preliminary
investigation. The feasibility study is a major part of this phase. It is a
measure of how beneficial or practical the development of an information system
would be to the organization. The feasibility of the software being developed
can be studied in terms of the following aspects:
Operational Feasibility.
Technical Feasibility.
Economical Feasibility.
OPERATIONAL FEASIBILITY
The Application will reduce the time consumed in maintaining manual
records, and maintaining the records with it is neither tiresome nor cumbersome.
Hence operational feasibility is assured.
TECHNICAL FEASIBILITY
Minimum hardware requirements:
1.66 GHz Pentium Processor or Intel compatible processor.
1 GB RAM.
Internet Connectivity.
80 MB hard disk space.
ECONOMICAL FEASIBILITY
Once the hardware and software requirements are fulfilled, the user of our
system does not need to spend on any additional overhead. For the user,
the Application will be economically feasible in the following respects:
The Application will reduce a lot of manual labour, so effort
will be reduced.
Our Application will reduce the time that is wasted in manual
processes.
The storage and handling problems of the registers will be solved.
7.1 DETAILS OF HARDWARE AND SOFTWARE
.NET Framework
The .NET Framework is an environment for building, deploying, and running
XML Web services and other applications. It is the infrastructure for the overall
.NET platform. The .NET Framework consists of three main parts: the common
language runtime, the class libraries, and ASP.NET.
Why C#?
C# is a new language with the power of C++ and the slickness of Visual Basic.
It cleans up many of the syntactic peculiarities of C++ without diluting much of
its flavour (thereby enabling C++ developers to transition to it with little
difficulty). Its superiority over VB6 in facilitating powerful OO
implementations is without question.
7.2 BACK-END: Microsoft SQL Server
Business today demands a different kind of data management
solution. Performance, scalability, and reliability are essential, but
businesses now expect more from their key IT investment. SQL Server 2005
exceeds dependability requirements and provides innovative capabilities that
increase employee effectiveness, integrate heterogeneous IT ecosystems, and
maximize capital and operating budgets. SQL Server 2005 provides the
enterprise data management platform your organization needs to adapt
quickly in a fast-changing environment. With the lowest implementation and
maintenance cost in the industry, SQL Server 2005 delivers rapid return on your
data management investment. SQL Server 2005 supports the rapid development
of enterprise-class business applications that can give your company a critical
competitive advantage. Benchmarked for scalability, speed, and performance,
SQL Server 2005 is a fully enterprise-class database product, providing core
support for Extensible Markup Language (XML) and Internet queries.
CHAPTER 8
CONCLUSION
Thus the proposed work is an efficient algorithm for generating frequent access
patterns from the access paths of the users. The algorithm is optimized to take
less time than the existing algorithms and to store the access paths in a
compressed format. Its main aim is to reduce execution time
and memory utilization compared with the existing algorithm, viz. the Apriori
algorithm. The frequent access patterns show the sequences of web pages that
are frequently navigated by users. The proposed algorithm does not generate
candidate sets; however, because it generates a larger number of patterns, the
number of tree traversals increases.
Information content on the WWW is increasing at an exponential rate, and it is not
surprising to find users having difficulty in navigating and finding relevant
information. Likewise, e-commerce site developers find it difficult to identify
potential customers or to assess the web site structure. We are thus making an
attempt to improve the existing algorithms and bring web mining to a new level.
References
[1] B. Santhosh Kumar, K. V. Rukmani, “Implementation of Web Usage Mining
Using Apriori and FP Growth Algorithm”, Int. J. of Advanced Networking and
Applications, Volume: 01, Issue: 06, 2010.
[2] Mishra Rahul, Choubey Abha, “Discovery of Frequent Patterns from Web Log
Data by using FP-Growth Algorithm for Web Usage Mining”, International
Journal of Advanced Research in Computer Science and Software Engineering,
Vol. 2, pp. 311-318, 2012.
[3] Han J., Pei J., Yin Y. and Mao R., “Mining frequent patterns without candidate
generation: A frequent-pattern tree approach”, Data Mining and Knowledge
Discovery, 2004.
[4] Baglioni M., Ferrara U., Romei A., Ruggieri S. and Turini F., “Preprocessing
and Mining Web Log Data for Web Personalization”, in Proceedings of the 8th
Italian Conference on Artificial Intelligence, LNCS Vol. 2829, pp. 237-249, 2003.