
Page 1: BY, GIRIJA PATIL PRIYANKA PATKAR AANUM SHAIKH ADITI …webpage.pace.edu/at64915n/report.pdf · Dr.Udhav Bhosle Internal Examiner External Examiner . Web usage mining in Business Intelligence

A SYNOPSIS

ON

2014-2015

WEB USAGE MINING

BY,

GIRIJA PATIL

PRIYANKA PATKAR

AANUM SHAIKH

ADITI THAKKAR

Web usage mining in Business Intelligence

A SYNOPSIS

ON

Web Usage Mining

BY

Girija Patil

Priyanka Patkar

Aanum Shaikh

Aditi Thakkar

Under the guidance of

Internal Guide

Prof. Sumitra Sadhukhan

Juhu-Versova Link Road Versova, Andheri(W), Mumbai-53

University of Mumbai

2014–2015

Juhu-Versova Link Road Versova, Andheri(W), Mumbai-53

This is to certify that

1. Girija Patil - B-724

2. Priyanka Patkar - B-726

3. Aanum Shaikh - B-743

4. Aditi Thakkar – B-759

Have satisfactorily completed this synopsis entitled

Web Usage Mining

Towards the partial fulfillment of the

BACHELOR OF ENGINEERING

IN

(COMPUTER ENGINEERING)

as laid down by the University of Mumbai.

Guide: Prof. S. Sadhukhan

H.O.D.: Prof. S. B. Wankhade

Principal: Dr. Udhav Bhosle

Internal Examiner External Examiner

ACKNOWLEDGEMENT

We wish to express our sincere gratitude to Dr. U. V. Bhosle, Principal, and Prof. S. B. Wankhade, H.O.D. of the Computer Department of RGIT, for providing us an opportunity to do our Seminar work on "Web Usage Mining".

This Seminar bears the imprint of many people. We sincerely thank our Seminar guide, Mrs. Sumitra Sadhukhan, for her guidance and encouragement in the successful completion of our Seminar work.

We would also like to thank our staff members for their help in carrying out this Seminar work.

Finally, we would like to thank our colleagues and friends who helped us in completing the Seminar successfully.

1. Girija Patil

2. Priyanka Patkar

3. Aanum Shaikh

4. Aditi Thakkar

Abstract

Web Usage Mining is the application of data mining techniques to discover interesting usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Usage data captures the identity or origin of Web users along with their browsing behavior at a Web site. Web server data corresponds to the user logs collected at the Web server. Typical data collected at a Web server include IP addresses, page references, and access times of the users; this data is the main input to the present work. Our main aim is to concentrate on web usage mining and, in particular, on discovering the web usage patterns of websites from server log files.

Web mining can give companies managerial insight into visitor profiles, which helps top management take strategic actions accordingly. The proposed work is an algorithm for generating frequent access patterns from the access paths of the users. It is designed to take less time than existing algorithms; in particular, it aims to reduce execution time and memory utilization compared to the existing algorithm, viz. the Apriori algorithm. The frequent access patterns show the sequences of web pages that users navigate frequently. The proposed algorithm, a combination of the Apriori and FP-Growth algorithms, searches for large item-sets during its initial database pass and uses the result as the seed for discovering other large item-sets during subsequent passes. Thus, frequently accessed products can be discovered efficiently using the combined algorithm, which plays a vital role in Business Intelligence (BI).

Table of Contents

Chapter No.   Topic

1   Introduction
    1.1   Web Usage Mining Process
          1.1.1   Applications

2   Review Of Literature
    2.1   Apriori Algorithm
    2.2   FP-Growth Algorithm

3   Existing System
    3.1   Input System
          3.1.1   Web Log Files
          3.1.2   Output System

4   Proposed System

5   Design Details
    5.1   Software Development Life Cycle (SDLC)
    5.2   Steps in SDLC
    5.3   Waterfall Model
    5.4   DFD

6   Implementation Plan

7   Analysis
    7.1   Detail Of Hardware And Software
    7.2   Backend

8   Conclusion

References

List of Figures

Figure No.   Figure Name

1.1.1        Web Usage Mining process
1.1.1.1      Applications of Web Usage Mining
2.1.1        Apriori algorithm flowchart
3.1.1.1      Web log files
3.1.2.1      Data extracted from web log files
5.1.1        SDLC
5.1.2        Gantt Chart for SDLC
5.3.1        Waterfall Model
5.4.1        User DFD
5.4.2        User Usecase
5.4.3        User Flowchart

CHAPTER 1

INTRODUCTION

The Web is a huge, rapidly growing, diverse, dynamic and mostly unstructured data repository. It supplies an incredible amount of information, and it also raises the complexity of dealing with that information from the different perspectives of users, web service providers and business analysts. Web Usage Mining is the application of data mining techniques to discover interesting usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Usage data captures the identity or origin of Web users along with their browsing behavior at a Web site. Web usage mining can be classified further depending on the kind of usage data considered: web server data, application server data and application level data.

Web server data corresponds to the user logs collected at the Web server. Web usage mining refers to the automatic discovery and analysis of patterns in click stream and associated data collected or generated as a result of user interactions with web resources on one or more web sites. It consists of three phases: data pre-processing, pattern discovery and pattern analysis. These are explained in section 1.1.

1.1 Web Usage Mining Process

Fig 1.1.1: Web Usage Mining process

PRE-PROCESSING:

Pre-processing includes the fusion and synchronization of data from multiple log files, data cleaning, page view identification, user identification, session identification (or sessionization), episode identification, and the integration of click stream data with other data sources such as content or semantic information.
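Session identification, one of the pre-processing steps named above, can be sketched as follows. This is a minimal illustration rather than the project's implementation: the 30-minute inactivity timeout, the use of the IP address as the user identifier, and the names `sessionize` and `records` are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a session ends after 30 minutes of inactivity (a common
# heuristic, not a value taken from this synopsis).
SESSION_TIMEOUT = timedelta(minutes=30)

def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group (user_id, timestamp, page) records into per-user sessions."""
    by_user = defaultdict(list)
    for user_id, timestamp, page in records:
        by_user[user_id].append((timestamp, page))

    sessions = []
    for user_id, visits in by_user.items():
        visits.sort(key=lambda v: v[0])          # order each user's visits by time
        current = [visits[0][1]]
        for (prev_time, _), (time, page) in zip(visits, visits[1:]):
            if time - prev_time > timeout:       # long gap: close the session
                sessions.append((user_id, current))
                current = []
            current.append(page)
        sessions.append((user_id, current))
    return sessions

# Hypothetical cleaned log records: (IP address as user id, time, page).
records = [
    ("10.0.0.1", datetime(2015, 1, 5, 9, 0), "/home"),
    ("10.0.0.1", datetime(2015, 1, 5, 9, 5), "/products"),
    ("10.0.0.1", datetime(2015, 1, 5, 11, 0), "/home"),
    ("10.0.0.2", datetime(2015, 1, 5, 9, 1), "/home"),
]
sessions = sessionize(records)
```

Each resulting session is a list of pages that can be treated as one transaction by the pattern discovery phase.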

PATTERN DISCOVERY:

In the pattern discovery phase, frequent pattern discovery algorithms are applied to the pre-processed data. Web site designers should have a clear understanding of user profiles and site objectives, as well as detailed knowledge of the way users browse web pages.

PATTERN ANALYSIS:

In the pattern analysis phase, interesting knowledge is extracted from the frequent patterns, and the results are used for website modification. Web usage pattern analysis is the process of identifying browsing patterns by analyzing users' navigational behaviour. The web server log files, which store information about the visitors of the website, are used as input for this process. These log files are first pre-processed and converted into the required format so that web usage mining techniques can be applied to them.

1.1.1 APPLICATIONS

The figure shows Web Usage Mining applications, which can be implemented using various techniques such as sequence mining, clustering and classification. Our focus is to implement Web Usage Mining with the help of association rules, using algorithms such as FP-Growth, Apriori and the improved FP-tree.

Fig 1.1.1.1: Applications of Web Usage Mining

CHAPTER 2

REVIEW OF LITERATURE

Web Mining is the application of data mining techniques to automatically discover and extract information from the web. Web usage mining has various application areas such as web pre-fetching, site reorganization and web personalization. The most important task in web usage mining is discovering useful patterns from web log data using pattern discovery techniques such as the Apriori and FP-Growth algorithms. The Apriori algorithm is a well-known technique for web log mining. Many algorithms already exist for generating frequent access patterns from access paths, e.g. the Apriori algorithm and the FP-Tree algorithm. However, these algorithms can require many database scans to generate user access patterns, and therefore take more time and memory. An improved approach adds the user ID property at every step of producing the candidate set and at every step of scanning the database, to decide whether an item in the candidate set should be used to produce the next candidate set. This reduces the size of the candidate set and, in turn, the number of database scans.

2.1 Apriori Algorithm

Apriori searches for large item-sets during its initial database pass and uses the result as the seed for discovering other large item-sets during subsequent passes. Item-sets with support at or above the minimum are called large or frequent item-sets, and those below it are called small item-sets. The algorithm is based on the large item-set property, which states that any subset of a large (frequent) item-set must also be large (frequent). Since the algorithm uses prior knowledge of frequent item-set properties, it has been given the name Apriori. It is an iterative, level-wise search algorithm in which frequent k-item-sets are used to explore (k+1)-item-sets. The system operates in the following four modules.

Preprocessing module.

Apriori or FP Growth Algorithm Module.

Association Rule Generation.

Results.

The pre-processing module converts the log file, which is normally in ASCII format, into a database-like format that can be processed by the Apriori algorithm. Apriori implements a level-wise search using the frequent item-set property and can be optimized further. It is among the simplest algorithms for mining frequent patterns from a transactional database.
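The association rule generation module listed above is not described further in this synopsis. As a hedged sketch, assuming the standard definition confidence(X -> Y) = support(X ∪ Y) / support(X), rules could be derived from frequent item-sets as shown below. The function name `generate_rules`, the threshold and the sample counts are hypothetical.

```python
from itertools import combinations

def generate_rules(support_counts, min_confidence):
    """Derive rules X -> Y from frequent item-sets and their support counts.

    support_counts maps frozenset(item-set) -> number of transactions that
    contain it. Every subset of a frequent item-set is assumed to be present
    in the mapping, which holds for the output of a frequent item-set miner.
    """
    rules = []
    for itemset, count in support_counts.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                confidence = count / support_counts[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

# Hypothetical support counts for pages visited in the same session.
support_counts = {
    frozenset({"/home"}): 4,
    frozenset({"/products"}): 3,
    frozenset({"/home", "/products"}): 3,
}
rules = generate_rules(support_counts, min_confidence=0.8)
```

With these sample counts, only the rule from "/products" to "/home" passes the threshold, because every session that contains "/products" also contains "/home".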

Advantages:

Uses the large item-set property.

Easily parallelized.

Easy to Implement.

Disadvantages:

It is costly to handle a huge number of candidate sets.

It is tedious to repeatedly scan the database and check a large set of candidates by

pattern matching, which is especially true for mining long patterns.

The Apriori algorithm is given below:

Lk: set of frequent item-sets of size k (with minimum support)

Ck: set of candidate item-sets of size k (potentially frequent item-sets)

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
return ∪k Lk;
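The pseudocode above can be turned into a small runnable sketch. The version below is illustrative only: the function name `apriori`, the representation of transactions as sets of pages and the sample sessions are assumptions, and no attempt is made at the optimizations a production implementation would need.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(item-set): support count} for all frequent item-sets."""
    transactions = [frozenset(t) for t in transactions]

    # L1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 1
    while current:
        # Ck+1: join frequent k-item-sets, then prune candidates that have
        # an infrequent k-subset (the large item-set property).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in current for sub in combinations(union, k)
            ):
                candidates.add(union)

        # One database pass to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        # Lk+1: candidates that meet the minimum support.
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical sessions: each is the set of pages visited in one session.
sessions = [
    {"/home", "/products", "/cart"},
    {"/home", "/products"},
    {"/home", "/about"},
    {"/products", "/cart"},
]
frequent = apriori(sessions, min_support=2)
```

Note that each iteration of the `while` loop makes one full pass over the transactions, which is the repeated scanning cost listed among the disadvantages.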

Following is the flowchart for apriori algorithm:

Fig: 2.1.1 Apriori algorithm flowchart.

2.2 FP-Growth Algorithm:

The FP-tree is a compact data structure that stores important quantitative information about frequent patterns. Its main components are as follows.

It consists of one root labelled "root", a set of item-prefix sub-trees as the children of the root, and a frequent-item header table. Each node in an item-prefix sub-tree consists of three fields: item-name, count, and node-link. Item-name registers which item the node represents, count registers the number of transactions represented by the portion of the path reaching the node, and node-link links to the next node in the FP-tree carrying the same item-name, or is null if there is none.

Each entry in the frequent-item header table consists of two fields: item-name and head of node-link, which points to the first node in the FP-tree carrying that item-name.

On top of this structure, an FP-tree-based pattern-fragment growth mining method is used. It starts from a frequent length-1 pattern (as an initial suffix pattern), examines only its conditional pattern base (a "sub-database" consisting of the set of frequent items co-occurring with the suffix pattern), constructs its conditional FP-tree, and mines that tree recursively. Pattern growth is achieved by concatenating the suffix pattern with the new patterns generated from the conditional FP-tree. Since the frequent items of any transaction are always encoded in the corresponding path of the FP-tree, pattern growth ensures the completeness of the result. FP-Growth is used for efficient mining of frequent patterns in large databases.
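The node fields and header table described above can be sketched in code. The example below is a minimal illustration under assumptions: the names `FPNode` and `build_fp_tree` are hypothetical, ties in item frequency are broken alphabetically, and a helper `tails` dictionary is added only to make appending to a node-link chain cheap.

```python
from collections import Counter

class FPNode:
    """One FP-tree node: item-name, count and node-link, plus parent/children."""
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}
        self.node_link = None  # next node carrying the same item-name

def build_fp_tree(transactions, min_support):
    """Build an FP-tree; return (root, header), header maps item -> first node."""
    # First scan: count items and keep the frequent ones.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in item_counts.items() if c >= min_support}

    root = FPNode(None, None)
    header = {}  # item-name -> head of the node-link chain
    tails = {}   # item-name -> last node in the chain (helper, an assumption)

    # Second scan: insert each transaction, items in descending frequency.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                if item in tails:
                    tails[item].node_link = child   # extend the chain
                else:
                    header[item] = child            # first node for this item
                tails[item] = child
            child.count += 1
            node = child
    return root, header

# Hypothetical sessions used as transactions.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
root, header = build_fp_tree(sessions, min_support=2)
```

Sessions that share a prefix, such as the first two, share nodes in the tree, which is where the compactness of the structure comes from.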

Algorithm of FP-Growth:

Input: A database DB, represented by the FP-tree constructed from it, and a minimum support threshold.

Output: The complete set of frequent patterns.

Method: call FP-growth(FP-tree, null).

Procedure FP-growth(Tree, α) {
1) if Tree contains a single prefix path then { // Mining single prefix-path FP-tree
2)     let P be the single prefix-path part of Tree;
3)     let Q be the multipath part with the top branching node replaced by a null root;
4)     for each combination (denoted as β) of the nodes in the path P do
5)         generate pattern β ∪ α with support = minimum support of nodes in β;
6)     let freq_pattern_set(P) be the set of patterns so generated; }
7) else let Q be Tree;
8) for each item ai in Q do { // Mining multipath FP-tree
9)     generate pattern β = ai ∪ α with support = ai.support;
10)    construct β's conditional pattern-base and then β's conditional FP-tree Treeβ;
11)    if Treeβ ≠ ∅ then
12)        call FP-growth(Treeβ, β);
13)    let freq_pattern_set(Q) be the set of patterns so generated; }
14) return (freq_pattern_set(P) ∪ freq_pattern_set(Q) ∪ (freq_pattern_set(P) × freq_pattern_set(Q))) }

When the FP-tree contains a single prefix-path, the complete set of frequent

patterns can be generated in three parts: the single prefix-path P, the multipath Q,

and their combinations (lines 01 to 03 and 14). The resulting patterns for a single

prefix path are the enumerations of its sub paths that have the minimum support

(lines 04 to 06). Thereafter, the multipath Q is defined (line 03 or 07) and the

resulting patterns from it are processed (lines 08 to 13). Finally, in line 14 the

combined results are returned as the frequent patterns found.
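The multipath part of the procedure (lines 08 to 13) can be sketched as runnable code. This is a simplified illustration, not the full algorithm: the single prefix-path shortcut (lines 01 to 06) is omitted, which changes performance but not the result; tree nodes are plain dictionaries; and a per-item list of nodes stands in for the header table and node-links. The names `build_tree` and `fp_growth` are hypothetical.

```python
from collections import defaultdict

def build_tree(weighted_transactions, min_support):
    """Build a prefix tree from (items, count) pairs.

    Returns (nodes, frequent): nodes maps item -> list of tree nodes carrying
    it, frequent maps item -> support within these transactions.
    """
    counts = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            counts[item] += count
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = {"item": None, "count": 0, "parent": None, "children": {}}
    nodes = defaultdict(list)
    for items, count in weighted_transactions:
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node["children"].get(item)
            if child is None:
                child = {"item": item, "count": 0, "parent": node, "children": {}}
                node["children"][item] = child
                nodes[item].append(child)
            child["count"] += count
            node = child
    return nodes, frequent

def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Return {frozenset(pattern): support}, using the multipath branch only."""
    nodes, frequent = build_tree(weighted_transactions, min_support)
    patterns = {}
    for item, support in frequent.items():
        pattern = suffix | {item}          # extend the suffix with this item
        patterns[pattern] = support
        # Conditional pattern base: the prefix path of every node for the item,
        # weighted by that node's count.
        base = []
        for node in nodes[item]:
            path, parent = [], node["parent"]
            while parent["item"] is not None:
                path.append(parent["item"])
                parent = parent["parent"]
            if path:
                base.append((path, node["count"]))
        # Recurse on the conditional tree only if the base is non-empty.
        if base:
            patterns.update(fp_growth(base, min_support, pattern))
    return patterns

# Hypothetical sessions, each with weight 1.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
patterns = fp_growth([(s, 1) for s in sessions], min_support=2)
```

The original database is read only while building the first tree; every recursive call works on a conditional pattern base derived from a tree, not on the database itself.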

Advantages:

Uses a divide-and-conquer strategy.

Uses a compact data structure.

Eliminates repeated database scans.

It is generally faster than Apriori because it avoids candidate generation.

The algorithm avoids producing a large number of candidate item-sets by compressing the database into an FP-tree.

Disadvantages:

FP tree may not fit in memory.

FP tree is expensive to build.

CHAPTER 3

EXISTING SYSTEM

The existing system uses the Apriori algorithm, which performs an iterative level-wise search. It is an algorithm for frequent item-set mining and association rule learning over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item-sets as long as those item-sets appear sufficiently often in the database. Each extension requires another pass over the database, which increases the execution time.

3.1 Input System

3.1.1 Web Log Files

A log file is a file in which every page request made to the web server is recorded. Each entry typically contains the following fields:

IP address of the computer making the request.

User ID, (this field is not used in most cases).

Date and time of the request.

Size of the file transferred.

Referring URL, that is, the URL of the page which contains the link that

generated the request.

Name and version of the browser being used.

Fig 3.1.1.1: Web log files
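The fields listed above can be extracted from a log line with a regular expression. The sketch below assumes the widely used "combined" log format; the synopsis does not state which format its log files use, and the sample line, the pattern name `LOG_PATTERN` and the function name `parse_line` are made up for illustration.

```python
import re
from datetime import datetime

# Assumption: the server writes the "combined" log format, i.e.
# host ident user [time] "request" status size "referrer" "user-agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A size of "-" means no body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

# Hypothetical log line for illustration.
line = ('192.168.1.10 - - [05/Jan/2015:09:00:12 +0530] '
        '"GET /products.html HTTP/1.1" 200 5120 '
        '"http://example.com/home.html" "Mozilla/5.0"')
entry = parse_line(line)
```

The user ID field is "-" in the sample line, matching the remark above that this field is not used in most cases.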

3.1.2 Output System

Web log files can be used to reconstruct the user navigation sessions

within the site from which the log data originates.

The output system mainly focuses on generation of reports.

These reports act as:

Source of information required (Personalization).

Permanent hard copy of the results.

Fig 3.1.2.1: Data extracted from web log files

CHAPTER 4

PROPOSED SYSTEM

The major drawbacks of the existing system are high execution time and excess memory usage. The FP-Growth algorithm, in turn, spends more time on recursive calls, performs well only when user access paths have much in common, and also consumes more memory. We therefore propose a combination of the FP-Growth and Apriori algorithms to make the most of the advantages of both and to overcome the drawbacks of the existing system.

Modules:

1. Manage Users: In this module the admin manages the users and identifies which users are regular and which are not.

2. Manage Web Log File: In this module the admin manages the usage data of the users, such as which user visits which links and pages.

3. Data Preprocessing: In this module the system removes unwanted data such as rarely visited links and pages.

4. Pattern Discovery (Apriori Algorithm): In this module the system applies the Apriori algorithm to the web log file.

5. Pattern Analysis: In this module the system predicts the domain of interest of the user.

6. Result: This module provides the links that satisfy the user's requirements and will therefore be most useful to the user.
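The Data Preprocessing module in the list above removes rarely visited pages. A minimal sketch of that filtering step is given here; the function name `remove_rare_pages`, the threshold `min_visits` and the sample sessions are assumptions, since the synopsis does not define what counts as "less visited".

```python
from collections import Counter

def remove_rare_pages(sessions, min_visits):
    """Drop pages seen in fewer than min_visits sessions; drop empty sessions."""
    # Count, for every page, the number of sessions in which it appears.
    visits = Counter(page for session in sessions for page in set(session))
    cleaned = []
    for session in sessions:
        kept = [page for page in session if visits[page] >= min_visits]
        if kept:
            cleaned.append(kept)
    return cleaned

# Hypothetical sessions reconstructed from a web log.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/about"],
    ["/products", "/cart"],
    ["/terms"],
]
cleaned = remove_rare_pages(sessions, min_visits=2)
```

Filtering before pattern discovery shrinks the input, at the cost of hiding pages that are rare but possibly still of interest to the administrator.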

Project Significance

In general, this project will produce useful findings for analyzing the Web usage patterns of an E-Learning portal:

This study will be a first step in analyzing an E-Learning portal by applying a Web usage mining approach with basic association rule algorithms, namely the Apriori and FP-Growth algorithms.

i. The outcomes of this study can be used by the Web administrator to plan necessary improvements, enhancements and other valuable actions for the E-Learning portal.

ii. The implementation of the Web usage mining process for the E-Learning portal may become a guideline for system development purposes.

CHAPTER 5

DESIGN DETAILS

5.1 System Development Life Cycle:

The System Development Life Cycle is the process of developing

information systems through investigation, analysis, design, implementation, and

maintenance[7]. The System Development Life Cycle (SDLC) is also known as

Information Systems Development or Application Development.

Fig. 5.1.1 SDLC

Fig. 5.1.2 Gantt Chart for SDLC

5.2 Steps involved in the System Development Life Cycle:

Below are the steps involved in the System Development Life Cycle.

Each phase within the overall cycle may be made up of several steps.

Step 1: Software Concept

The first step is to identify a need for the new system. This will include

determining whether a business problem or opportunity exists, conducting a

feasibility study to determine if the proposed solution is cost effective, and

developing a project plan.

This process may involve end users who come up with an idea for improving their work. Ideally, the process occurs in tandem with a review of the organization's strategic plan to ensure that IT is being used to help the organization achieve its strategic objectives. Management may need to approve concept ideas before any money is budgeted for their development.

Step 2: Requirements Analysis

Requirements analysis is the process of analyzing the information needs of the end users, the organizational environment, and any system presently being used, and of developing the functional requirements of a system that can meet the needs of the users. The requirements should be recorded in a document, email, user interface storyboard, executable prototype, or some other form.

The requirements documentation should be referred to throughout the rest of the system development process to ensure the developing project aligns with user needs and requirements. Professionals must involve end users in this process to ensure that the new system will function adequately and meet their needs and expectations.

Step 3: Architectural Design

After the requirements have been determined, the necessary specifications for the hardware, software, people, and data resources, and for the information products that will satisfy the functional requirements of the proposed system, can be determined. The design serves as a blueprint for the system and helps detect problems before they are built into the final system. Professionals create the system design, but must review their work with the users to ensure the design meets users' needs.

Step 4: Coding and Debugging

Coding and debugging is the act of creating the final system. This step is carried out by the software developers.

Step 5: System Testing

The system must be tested to evaluate its actual functionality in relation to

expected or intended functionality. Some other issues to consider during this

stage would be converting old data into the new system and training employees to

use the new system. End users will be key in determining whether the developed

system meets the intended requirements, and the extent to which the system is

actually used.

Step 6: Maintenance

Inevitably the system will need maintenance. Software will almost certainly undergo change once it is delivered to the customer, and there are many reasons for this. Change could happen because of unexpected input values into the system. In addition, changes in the system could directly affect the software's operation. The software should be developed to accommodate changes that could happen during the post-implementation period. There are various software process models, such as:

Prototyping Model

RAD Model

The Spiral Model

The Waterfall Model

The Iterative Model

5.3 Waterfall model

A software process model is the model used for the development of the project. Many software process models are available, and the choice should suit the size of the project, that is, whether it is an industry-scale, large-scale or medium-scale project.

The chosen model also affects the cost of the project, because the steps in each software process model vary. This software is built using the waterfall model. This model suggests work cascading from step to step like a series of waterfalls. It consists of the following steps, in the following order.

Fig. 5.3.1 Waterfall model

Analysis Phase: The problem is attacked by breaking it into sub-problems. The objective of analysis is to determine exactly what must be done to solve the problem. Typically, the system's logical elements (its boundaries, processes, and data) are defined during analysis.

Design Phase: The objective of design is to determine how the problem will be solved.

During design the analyst’s focus shifts from the logical to the physical. Data

elements are grouped to form physical data structures, screens, reports, files,

and databases.

Coding Phase: The system is created during this phase. Programs are coded, debugged,

documented, and tested. New hardware is selected and ordered. Procedures are

written and tested. End-user documentation is prepared. Databases and files are

initialized. Users are trained.

Testing Phase: Once the system is developed, it is tested to ensure that it does what it was

designed to do. After the system passes its final test and any remaining problems

are corrected, the system is implemented and released to the user. All these phases

are described with respect to the project in the rest of the document.

5.4 Data Flow Diagram

A data flow diagram (DFD) is a graphical representation of the "flow" of data

through an information system, modelling its process aspects. A DFD is often

used as a preliminary step to create an overview of the system, which can later be

elaborated. DFDs can also be used for the visualization of data

processing (structured design). A DFD shows what kind of information will be

input to and output from the system, where the data will come from and go to, and

where the data will be stored. It does not show information about the timing of

processes, or information about whether processes will operate in sequence or in

parallel (which is shown on a flowchart).

Fig. 5.4.1 User DFD

Fig. 5.4.2 User Usecase

Fig. 5.4.3 User Flowchart

CHAPTER 6

IMPLEMENTATION PLAN

Phase 1:

Activity   Description                           Effort in person weeks   Deliverable
P1-01      Requirement Analysis                  2 weeks                  Requirement Gathering
P1-02      Existing System Study & Literature    3 weeks                  Existing System Study & Literature
P1-03      Technology Selection                  2 weeks                  .NET
P1-04      Modular Specifications                2 weeks                  Module Description
P1-05      Design & Modeling                     4 weeks                  Analysis Report
Total                                            13 weeks

Phase 2:

Activity   Description                           Effort in person weeks   Deliverable
P2-01      Detailed Design                       2 weeks                  LLD / DLD Document
P2-02      UI and user interactions design       Included in above        UI document
P2-03      Coding & Implementation               12 weeks                 Code Release
P2-04      Testing & Bug fixing                  2 weeks                  Test Report
P2-05      Performance Evaluation                4 weeks                  Analysis Report
P2-06      Release                               Included in above        System Release
Total                                            20 weeks                 Deployment efforts are extra

Gantt Charts

The Gantt Chart shows planned and actual progress for a number of tasks

displayed against a horizontal time scale.

It is an effective and easy-to-read method of indicating the actual current status of each task in a set compared to the planned progress for that task.

Gantt Charts provide a clear picture of the current state of the project.

Planned Gantt Chart

Table: Planned Gantt Chart

CHAPTER 7

ANALYSIS

FEASIBILITY STUDY

The very first phase in any system development life cycle is the preliminary

investigation, and the feasibility study is a major part of this phase. The feasibility

study measures how beneficial or practical the development of an information

system would be to the organization. The feasibility of the software under

development can be studied in terms of the following aspects:

Operational Feasibility.

Technical Feasibility.

Economical Feasibility.

OPERATIONAL FEASIBILITY

The application will reduce the time spent maintaining manual

records, and maintaining records with it is neither tiresome nor cumbersome. Hence,

operational feasibility is assured.

TECHNICAL FEASIBILITY

Minimum hardware requirements:

1.66 GHz Pentium Processor or Intel compatible processor.

1 GB RAM.

Internet Connectivity.

80 MB hard disk space.

ECONOMICAL FEASIBILITY

Once the hardware and software requirements are fulfilled, the user of our

system does not need to spend on any additional overhead. For the user,

the application will be economically feasible in the following respects:

The application will reduce manual labour, and hence the overall effort

will be reduced.


Our application will reduce the time that is wasted in manual

processes.

The storage and handling problems of the registers will be solved.

7.1 DETAILS OF HARDWARE AND SOFTWARE

.NET Framework

The .NET Framework is an environment for building, deploying, and running

XML Web services and other applications. It is the infrastructure for the overall

.NET platform. The .NET Framework consists of three main parts: the common

language runtime, the class libraries, and ASP.NET.

Why C#?

C# is a language with the power of C++ and the slickness of Visual Basic.

It cleans up many of the syntactic peculiarities of C++ without diluting much of

its flavour (thereby enabling C++ developers to transition to it with little

difficulty). Its superiority over VB6 in facilitating powerful object-oriented (OO)

implementations is without question.

7.2 BACK-END: Microsoft SQL Server

Business today demands a different kind of data management

solution. Performance, scalability, and reliability are essential, but

businesses now expect more from their key IT investment. SQL Server 2005

exceeds dependability requirements and provides innovative capabilities that

increase employee effectiveness, integrate heterogeneous IT ecosystems, and

maximize capital and operating budgets. SQL Server 2005 provides the

enterprise data management platform your organization needs to adapt

quickly in a fast-changing environment. With the lowest implementation and

maintenance cost in the industry, SQL Server 2005 delivers rapid return on your


data management investment. SQL Server 2005 supports the rapid development

of enterprise-class business applications that can give your company a critical

competitive advantage. Benchmarked for scalability, speed, and performance,

SQL Server 2005 is a fully enterprise-class database product, providing core

support for Extensible Markup Language (XML) and Internet queries.


CHAPTER 8

CONCLUSION

The proposed work is thus an efficient algorithm for generating frequent access

patterns from the access paths of users. The algorithm is designed to take

less time than existing algorithms and to store the access paths in a

compressed format. Its main aim is to reduce execution time

and memory utilization compared with existing algorithms such as the Apriori

algorithm. The frequent access patterns show the sequences of web pages that

are frequently navigated by users. The proposed algorithm does not

generate any candidate sets; however, because more patterns are generated,

the number of tree traversals is higher.
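To make the notion of a frequent access pattern concrete, the following is a minimal Python sketch. It is not the proposed algorithm and it builds no tree or compressed structure. It simply counts, for each contiguous page sequence, the number of sessions in which it occurs, and keeps those that reach a minimum support. The function name frequent_access_patterns, the parameters min_support and max_length, and the sample sessions are all assumptions made for illustration.

```python
from collections import Counter


def frequent_access_patterns(sessions, min_support, max_length=3):
    """Return contiguous page sequences that occur in at least
    `min_support` sessions (hypothetical helper, brute-force counting)."""
    support = Counter()
    for path in sessions:
        seen = set()  # count each pattern at most once per session
        for start in range(len(path)):
            last = min(start + max_length, len(path))
            for end in range(start + 1, last + 1):
                seen.add(tuple(path[start:end]))
        support.update(seen)
    return {pattern: count for pattern, count in support.items()
            if count >= min_support}


# Made-up access paths, one list of visited pages per user session.
sessions = [
    ["home", "products", "cart"],
    ["home", "products", "checkout"],
    ["home", "about"],
]

patterns = frequent_access_patterns(sessions, min_support=2)
for pattern, count in sorted(patterns.items()):
    print(" -> ".join(pattern), count)
```

With these sample sessions, the sequence home -> products is reported as frequent because it appears in two of the three sessions. A tree-based method would aim to reach the same kind of result without enumerating every subsequence of every session.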

Information content on the WWW is increasing at an exponential rate, and it is not

surprising that users have difficulty navigating and finding relevant

information. As a result, e-commerce site developers find it difficult to study

potential customers or web site structure. We are thus attempting to

improve the existing algorithms and bring web mining to a new level.


References

[1] B. Santhosh Kumar, K. V. Rukmani, “Implementation of Web Usage Mining Using Apriori and FP Growth Algorithm”, Int. J. of Advanced Networking and Applications, Vol. 01, Issue 06, 2010.

[2] Mishra Rahul, Choubey Abha, “Discovery of Frequent Patterns from Web Log Data by using FP-Growth Algorithm for Web Usage Mining”, International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 2, pp. 311-318, 2012.

[3] Han J., Pei J., Yin Y. and Mao R., “Mining frequent patterns without candidate generation: A frequent-pattern tree approach”, Data Mining and Knowledge Discovery, 2004.

[4] Baglioni M., Ferrara U., Romei A., Ruggieri S. and Turini F., “Preprocessing and Mining Web Log Data for Web Personalization”, in Proceedings of the 8th Italian Conference on Artificial Intelligence, LNCS Vol. 2829, pp. 237-249, 2003.