
SEMANTIC WEB-PAGE RECOMMENDER SYSTEM

P. Vinothini, T. Vetriselvi

Department of CSE,

K. Ramakrishna College of Technology, Trichy.

E-Mail:[email protected]

Abstract - With the explosive growth of the Internet, a large number of users perform online searches to satisfy their information needs. Web usage mining plays an important role in discovering knowledge that represents online users' behaviour from the available web log data. Satisfying online users' needs with a traditional web usage mining system is a challenging task because such a system is built solely from the web usage data of online users. Web-page recommendation is used to capture the intuition of online users effectively. To make the Web-page recommendation system capture user intuition accurately, we propose two novel knowledge representation models that provide semantic enhancement to the web-page recommender system. The first model, the semantic network of a website, represents domain knowledge through domain terms, Web-pages and the relations between them. The Web usage model generates frequent web access patterns with sequential pattern mining algorithms applied to the usage data from the web server. The second model, the Conceptual Prediction Model (CPM), integrates the semantic knowledge with the web usage model, resulting in a weighted semantic network of semantic web usage knowledge. CPM constructs this weighted semantic network with the frequently viewed terms as nodes, where each weight represents the probability of transition between adjacent terms, estimated using Markov models.

Index terms: Web Usage mining, semantic knowledge, conceptual prediction model, semantic network, domain terms.

1. INTRODUCTION:

Web mining is a major area of data mining that discovers patterns from web data in order to better understand the needs of web-based applications. Web mining can be divided into three types: web usage mining, web content mining and web structure mining. Web Usage Mining (WUM) is the process of discovering or extracting patterns from users' access data on the web. Usage data are collected from one or more web servers. Web usage mining is very useful for understanding users' interests and their navigation behaviour. A typical application of WUM is the recommender system.

The main goal of a Web-page recommender system is to forecast the Web-page(s) that will be visited next as a user navigates through the website. Web-page recommendation captures the intuition of online users from their browsing patterns and presents recommendations to users as links to stories, books, or pages of interest. There are many difficulties in developing an effective Web-page recommender system, such as how to learn the user's online behaviour and Web-page navigation patterns from the available historical usage data, how to represent this knowledge, and how to make online recommendations based on the discovered knowledge.

To efficiently represent Web access sequences (WAS) from Web usage data, some studies have used approaches based on tree structures and probabilistic models [1]. These approaches use the historical web usage data to construct a user profile, consisting of links between the Web-pages in which the user is most interested, based only on the usage data. Using this knowledge, when the user comes online the next time, they predict the next Web-page(s) the user is most likely to visit, given the current Web-page and the previously visited k Web-pages.

The performance of these approaches depends on the size of the training usage dataset: the larger the training dataset, the higher the prediction accuracy. Their main drawback is that they are based solely on the Web access sequences learnt from the Web usage data. Therefore, if a user visits a new Web-page that is not in the training usage data, these approaches cannot offer any recommendations to this user. This problem is referred to as the "new-item problem".

Some studies have shown that semantic-enhanced approaches based on domain ontologies can overcome this new-item problem [2], [3]. Integrating domain knowledge with Web usage knowledge improves the prediction accuracy of recommender systems that use ontology-based Web mining techniques [4]-[6]. Web usage mining enriched with semantic information has shown higher performance than classic Web usage mining algorithms [5], [6]. However, the main issue in these approaches is the difficulty of representing and acquiring the semantic domain knowledge, and much research continues on domain ontologies.

A domain ontology is mostly used to represent the semantics of a website; it can be constructed manually by experts or automatically by learning models, such as a Bayesian network or a collocation map, for many different applications. Given the very large volume of Web data in today's websites, building an ontology manually for a website is a challenging task that is time consuming and yields results that are hard to reuse. According to Stumme, Hotho and Berendt, it is


impossible to manually discover the meaning of all Web-pages and their usage for a large-scale website [10]. Automatic construction of ontologies saves time, discovers all possible concepts within a website and the links between them, and produces reusable results. However, the drawback of the automatic approach is the need to design and implement the learning models, which can only be done by professionals at the outset.

This paper presents a novel method to provide better Web-page recommendation by integrating Web usage and domain knowledge. Two new knowledge representation models and a set of Web-page recommendation strategies are proposed. The first model is a semantic network that represents domain knowledge and can be constructed automatically; being fully automated, it can be easily integrated with the Web-page recommendation process. The second model is a conceptual prediction model, a navigation network of domain terms based on the frequently viewed Web-pages. It represents the integrated Web usage and domain knowledge that supports Web-page prediction, and it can also be constructed automatically. The proposed recommendation strategies predict the next pages, with probabilities, for a given Web user based on his or her current Web-page navigation state through these two models. This new method automates the knowledge base construction and alleviates the new-item problem. It yields better performance than the existing Web-usage-based Web-page recommendation systems.

This paper is structured as follows: Section 2 discusses the related work; Section 3 describes the architecture and the implementation of web usage mining; Section 4 presents the first model, a semantic network of domain terms; Section 5 presents the second model, a conceptual prediction model that integrates the semantic knowledge with the web-page recommendation. For each of the models presented in Sections 4 and 5, the corresponding queries used to retrieve semantic information from the knowledge models are also presented. Section 6 presents a set of recommendation strategies based on these queries to make semantic-enhanced Web-page recommendations.

2. LITERATURE SURVEY: Research on web-page recommender systems that combine web usage mining with semantic knowledge is very limited. The existing work can be classified into the following two approaches:

2.1 Traditional Usage-Based Approaches: Analog was one of the first Web Usage Mining systems. It consists of two components, offline and online. In the offline phase, session clusters that exhibit similar information are constructed from the usage data collected from the web server. The online phase then predicts which cluster the current user may fall into, based on the active user session, and suggests a list of pages related to the current session. This approach has several drawbacks, mainly concerning scalability and accuracy. SUGGEST 1.0 [21] was proposed as a two-tier system composed of an off-line module, which analyses the Web server's access log file, and an online classification module, which carries out the second stage. Its main drawback was the asynchronous cooperation between the two modules. In the next version, SUGGEST 2.0, the two modules were merged to perform the same operations in a completely online fashion. This raises the problem of estimating the update frequency of the knowledge base. Potential limitations of SUGGEST 2.0 are: a) the memory required to store Web server pages is quadratic in the number of pages; b) it does not permit management of websites made up of dynamically generated pages.

Bamshad Mobasher et al. [19] presented WebPersonalizer, a system that provides dynamic recommendations to users as a list of hypertext links. The method is based on anonymous usage data combined with the Web site structure. F. Masseglia et al. [20] proposed an integrated system, WebTool, based on the extraction of sequential patterns and association rules to dynamically customize the hypertext organization. The current user's behaviour is compared with previously induced sequential patterns, and navigational hints are provided to the user. In traditional web recommendation systems [2], sequential pattern mining is used effectively to discover web access patterns; in particular, tree structures and Markov models are used. The WAP-tree is a tree structure that holds access sequences in a very compact form to enable access pattern mining. In [7], PLWAP-Mine was proposed, which uses the PLWAP tree structure to incrementally update web sequential access patterns efficiently without scanning the whole database, even when previously infrequent items become frequent. The position-code features of the PLWAP tree are used to efficiently mine these trees and extract the current frequent patterns when the database is updated. FOL-Mine is an efficient sequential pattern mining algorithm proposed in [8]. It is based on the concept of the WAP-tree but uses a special linked structure to hold access sequences for processing, and it has been shown to outperform the existing WAP-tree mining methods. An FOL-list is used to hold the first-occurrence information of items while mining patterns in the intermediate projected databases, which makes suffix building very efficient. The node structure suggested in [14] is modified to process the weighted support of sequences. According to the study in [9], weighted sequential pattern mining outperforms non-weighted sequential pattern mining (e.g. FOL-Mine) by


giving weights to the items in the Web Access Sequence Database (WASD). It uses a modified form of the structure in [8], enhanced to hold the weight information of each item. This method needs only one database scan to generate the weighted list structure.

2.2 Semantic-Enhanced Approaches: Much research has reported that Web-page recommendation can be made more accurate by integrating web usage knowledge with domain knowledge. In [11], [12], the domain ontology of a website is used to improve the recommendation process. In [11], Liang Wei and Song Lei used an ontology, including concepts and significant terms extracted from documents, to represent a website's domain knowledge. They generate online recommendations by semantically matching and searching for frequent pages discovered by the Web usage mining process. This approach showed higher precision, coverage and matching rates. In [6], [13], ontology reasoning is used to make recommendations, with Web access sequences converted into sequences of ontology instances. In these studies, the Web usage mining algorithms find the frequent navigation paths in terms of ontology instances rather than ordinary Web-page sequences.

In [14], the SWUM (Semantically enriched Web Usage Mining) method was proposed, which incorporates semantic data and site structure into the purely usage-data-based WebPUM method. WebPUM represents usage data by means of an adjacency matrix and induces navigation patterns using a graph partitioning technique, which is then enriched with the semantic data of the website. The extracted semantic metadata take into account both the semantics of a page's contents and the semantic relationships between Web-pages. Semantic similarity is represented by a semantic similarity matrix that gives the similarity score between every pair of Web-pages. The semantic similarity matrix is combined with the adjacency matrix to derive a semantically enriched weight matrix, and the resulting navigation patterns are fed into the recommendation engine. The drawback is that the system is suitable only for statically generated web-pages.

In [15], frequent sequential patterns enriched with semantic information, expressed in terms of ontology instances instead of web-page sequences, are used to recommend subsequent pages to the user. The discovered semantically rich sequential association rules form the core knowledge of the recommendation engine of the proposed model. The vision of a Semantic Web has recently drawn attention from both academic and industrial circles. Incorporating the Semantic Web into the generation of personalized Web experiences improves the results of Web mining by exploiting the new semantic structures [2]. As a consequence, there is an increasing effort to define Web pages and objects in terms of semantic information by using ontologies. In [2], the first part covers how the content and structure of the site can be leveraged to transform raw usage data into semantically-enhanced transactions, which are then used for semantic Web usage mining and personalization; the second part presents a framework for more systematically integrating full-fledged domain ontologies into the personalization process. In [12], the proposed system is domain-independent, is implemented as a Web service, and uses both explicit and implicit feedback-collection methods to obtain information on users' interests. A domain-based method makes inferences about the user's interests, and a taxonomy-based similarity method is used to refine the item-user matching algorithm, improving recommendation prediction.

3. ARCHITECTURE OF WEB-PAGE RECOMMENDER SYSTEM: The recommendation system is implemented in two components: offline and online. The offline component builds the knowledge base by analysing historical data, such as the server access log files (web logs) captured from the server. This knowledge base is then used by the online component to capture the user's intent and to recommend page views whenever the user comes online again. Data collection, data pre-processing, pattern discovery and pattern analysis are the steps of web usage mining in the offline phase.

3.1 Data Collection: Data collection is the first step in web usage mining. Web usage data are collected from three main sources: Web servers, proxy servers and client-side requests. In [17], Cooley and Mobasher reported that most of the information resides in server log files and that it is difficult to obtain data from proxy servers and client-side browsing, so we use the server log files as the primary data source. There are several types of log files. An IIS web log record consists of 17 attributes, each holding one field of the record (for example, the request URI in cs_uri_stem and the response code in sc_status).

3.2 Data Pre-processing:

Generally, data cleaning, user identification, session identification and path completion are the steps involved in pre-processing.

3.2.1. Data Cleaning:

The data cleaning task removes log entries that are irrelevant or redundant. Two kinds of irrelevant data need to be removed:


i. Requests for files with suffixes such as .jpeg, .gif, .css, .cgi, etc., which can be identified from the cs_uri_stem field of the IIS log.

ii. Error requests, which can be identified from the sc_status field.

Once pre-processing is done, data from multiple sources are transformed into an acceptable form, which serves as the input to the various mining processes.
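As an illustration, the following minimal Python sketch performs the cleaning step described above. The field list is an assumption based on the common W3C extended log format (the full IIS record has 17 attributes); the actual field order must be taken from the #Fields directive of the log being processed.

# Minimal data-cleaning sketch; FIELDS is an assumed subset of the IIS attributes
# and should be adapted to the log's actual #Fields directive.
FIELDS = ["date", "time", "c-ip", "cs-username", "cs-method", "cs-uri-stem",
          "cs-uri-query", "sc-status", "cs(User-Agent)", "cs(Referer)"]

IRRELEVANT_SUFFIXES = (".jpeg", ".jpg", ".gif", ".css", ".cgi", ".js", ".ico")

def parse_line(line):
    # Split a space-delimited log line into a field dictionary.
    values = line.strip().split(" ")
    return dict(zip(FIELDS, values))

def is_relevant(entry):
    # Drop static-resource requests (rule i) and error responses (rule ii).
    uri = entry.get("cs-uri-stem", "").lower()
    status = entry.get("sc-status", "")
    return not uri.endswith(IRRELEVANT_SUFFIXES) and status.startswith("2")

def clean_log(lines):
    entries = (parse_line(l) for l in lines if l and not l.startswith("#"))
    return [e for e in entries if is_relevant(e)]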

3.2.2. User Identification: The user identification process distinguishes the different users in the web access log file. A referrer-based method is used for this. It is a complex task due to the presence of local caches and proxy servers. The following heuristics [18] are used to identify users (a small sketch follows this list):

1) Each IP address represents one user;

2) If the IP address is the same across several log entries but the agent field shows a change of browser or operating system, the IP address represents a different user;

3) If all the above fields are the same, the referrer information can be considered: if the requested page is not directly reachable by a link from any page already visited with the same IP, then this is another user.
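A minimal sketch of heuristics 1 and 2, building on the cleaned entries from the previous sketch, is given below; heuristic 3 additionally requires the site's link structure and is omitted here.

from collections import defaultdict

def identify_users(entries):
    # Group cleaned log entries into users keyed by (IP address, user agent).
    # Heuristic 3 (referrer/link reachability) is not implemented in this sketch.
    users = defaultdict(list)
    for e in entries:
        key = (e.get("c-ip"), e.get("cs(User-Agent)"))
        users[key].append(e)
    return users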

3.2.3. Session Identification: The aim of user session identification is to find the different user sessions in the web access log file. It involves dividing the page accesses of every user into separate sessions. Sessions can be identified using a timeout mechanism or the maximal forward reference method. In [18], the following rules are used to identify user sessions (a timeout-based sketch follows this list):

1) If there is a new user, there is a new session;

2) If the referrer page is null within a user's accesses, a new session begins;

3) If the time between page requests exceeds a limit, the user is starting a new session.
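The sketch below applies the timeout rule (rule 3) to one user's entries; the 30-minute threshold and the timestamp format are assumptions and should be tuned to the actual log.

from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed threshold

def split_sessions(user_entries):
    # Timeout-based sessionization: start a new session when the gap between
    # consecutive requests exceeds SESSION_TIMEOUT.
    entries = sorted(user_entries, key=lambda e: (e["date"], e["time"]))
    sessions, current, last_ts = [], [], None
    for e in entries:
        ts = datetime.strptime(e["date"] + " " + e["time"], "%Y-%m-%d %H:%M:%S")
        if last_ts is not None and ts - last_ts > SESSION_TIMEOUT:
            sessions.append(current)
            current = []
        current.append(e)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions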

3.2.4. Path Completion: Due to the presence of proxy servers and local caches, some user accesses are not recorded in the access log. Path completion is used to acquire the complete user access path by filling in the missing page references. An incomplete access path is recognized by checking whether the requested page is linked from the last page; if it is not, and the requested page is already in the user's history, it is clear that the user pressed the back button. Using these methods, the complete path is acquired. Web log pre-processing removes unwanted data from the log file and reduces the original file size by 50-55%.
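A rough sketch of back-button path completion is shown below; it assumes the site's link structure is available as a dictionary (link_graph) mapping each page to the set of pages it links to, which is not part of the web log itself.

def complete_path(session_pages, link_graph):
    # If the requested page is not linked from the previous page but appears
    # earlier in the path, assume the back button was used and replay the
    # back-tracked pages into the path.
    path = []
    for page in session_pages:
        if path and page not in link_graph.get(path[-1], set()) and page in path:
            i = len(path) - 1
            while i >= 0 and page not in link_graph.get(path[i], set()):
                path.append(path[i])
                i -= 1
        path.append(page)
    return path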

Figure 1: Architecture of Web Usage Mining Integrated with Semantic Knowledge

3.3 Pattern Discovery:

Once user transactions have been identified, the web logs are converted into relational databases, and sequential pattern mining is performed on the data to discover Frequent Web Access Patterns (FWAP).

In this paper, we use the LL-Mine algorithm, a modified form of the structure in [9], for sequential pattern mining, as it is efficient compared with the other existing algorithms; it produces the frequent web access sequences in a linked-list data structure. It scans the database and produces frequent itemsets that satisfy the weighted support threshold. Usually, only the order of Web-pages is taken into consideration in sequential pattern mining. To reflect the importance of each Web-page, both the time the user spent on it and the frequency of visits are taken into account when assigning a weight to the Web-page while generating web patterns with the W_ASSIGN algorithm. The weighted support of an access sequence s is given by [9]:

    Weight_support(s) = g_support(s) × weight(s)

where weight(s) is the average weight of the items in the sequence, and g_support(s) is the support of the sequence in the WASD. The frequent patterns generated by this algorithm are integrated with the semantic knowledge by crawling the URLs of these FWAP to collect domain term sequences.
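The following sketch illustrates the weighted-support computation under the definitions above. The harmonic-mean weighting follows the W_ASSIGN description; treating a pattern as an ordered (not necessarily contiguous) subsequence of an access sequence is an assumption of this sketch.

def harmonic_mean(a, b):
    return 2 * a * b / (a + b) if a and b else 0.0

def item_weights(wasd, time_spent):
    # weight(p): harmonic mean of p's visit frequency in the WASD and the total
    # time spent on p (time_spent is assumed to come from the pre-processed log).
    freq = {}
    for seq in wasd:
        for p in seq:
            freq[p] = freq.get(p, 0) + 1
    return {p: harmonic_mean(freq[p], time_spent.get(p, 0)) for p in freq}

def contains(seq, pattern):
    # True if pattern occurs in seq as an ordered subsequence.
    it = iter(seq)
    return all(p in it for p in pattern)

def weight_support(pattern, wasd, weights):
    # Weight_support(s) = g_support(s) * weight(s)
    g_support = sum(contains(seq, pattern) for seq in wasd) / len(wasd)
    w = sum(weights.get(p, 0.0) for p in pattern) / len(pattern)
    return g_support * w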


TABLE 1: Algorithm W_ASSIGN

ALGORITHM: W_ASSIGN
Input: an access sequence database WASD; a support threshold
Output: set of weighted access patterns
Method:
1. For each web access sequence s = p1, p2, ..., pn:
       Set weight(pi) = 0; let length = 0;
       Create a linked list C whose nodes hold an item name and its weight;
       For each occurrence of item pi:
           Increment freq(pi) and add Time(pi);
           Update the values in C;
       End for;
       Update the list of items in LIN with C;
       For each pi:
           Take the harmonic mean of freq(pi) and Time(pi) and assign it to weight(pi);
       End for;
2. For each item pi in LIN, if it passes the support threshold, add the item to the frequent pattern set.
3. Call LL-Mine.
4. Return.

TABLE 2: Algorithm LL-Mine

ALGORITHM: LL-Mine
Parameters: current frequent pattern p; list of first occurrences L; absolute support threshold η
Method:
1. For each weighted frequent item pi:
       i.   Generate the first-occurrences list L1:
            Initialize L1 with Weight_support = 0;
            Locate the first occurrences of the element in the projected database D-p using L;
            Build L1 with nodes holding seq-id and pos;
            Add the weight of the item at each occurrence;
            Update the header of L1 with Weight_support(pi);
       ii.  If Weight_support(pi) > η:
            Add p.pi to F, the set of patterns;
            Add p.pi to the stack for suffix building;
            Set p = p.pi;
            Call LL-Mine(p, L1, η);
            End if;
       iii. Delete the current L.
   End for;
2. Return.

3.4 Semantic Network Construction:

This section presents the first model, i.e. the semantic network of a website, together with its schema, and explains the queries used to infer terms and web-pages. A semantic network is a kind of knowledge map that represents concepts, as domain terms and Web-pages, and the relations between those concepts. To construct the semantic network, domain terms are collected from the Web-page titles, and the relations between these terms are extracted from two aspects: (i) the collocations of terms, determined by the co-occurrence relations of terms in Web-page titles; and (ii) the associations between terms and web-pages.

To capture how these terms are semantically related, the domain terms and the co-occurrence relations are weighted. Based on these relations, we can estimate how closely Web-pages are associated with each other semantically. To infer the semantics of Web-pages, we can query the relations, including the relevant pages and key terms for a given page, and the pages for given terms, thereby achieving semantic-enhanced Web-page recommendations. This semantic network is referred to as TermNetWP.

The following procedures automatically construct TermNetWP:

1) Collect the titles of the visited Web-pages.

2) Extract term sequences from the Web-page titles.

3) Build the semantic network, TermNetWP.

4) Implement the automatic construction of TermNetWP.

To allow the domain term network to be reused and shared by the Web-page recommender system, TermNetWP is implemented in OWL. The input to this network is a term sequence collection (TSC), in which each record consists of:

1) the PageID of a Web-page d ∈ D;

2) a sequence of terms X = t1 t2 . . . tm ∈ TS, m > 0, extracted from the title of the Web-page;

3) the URL of the Web-page.

3.5 Frequently Viewed Term Patterns (FVTP): In this paper, we use a Web usage mining technique, namely LL-Mine, to obtain the frequent Web access patterns (FWAP). We integrate the FWAP with TermNetWP to obtain a set of frequently viewed term patterns (FVTP), which constitutes the semantic Web usage knowledge of the website. The frequent web access patterns and term patterns are described as follows (a small mapping sketch follows these definitions):

P = {P1, P2, . . . , Pn}: set of FWAP,

where Pi = di1 di2 . . . dim is a pattern showing a sequence of Web-pages, n is the number of patterns, and m is the number of Web-pages in the pattern.

The frequently viewed term patterns are denoted as follows:

F = {ti1 ti2 . . . tim}: set of FVTP,

where each domain term pattern f is a sequence of domain terms in which each domain term tik is a domain term of page dik in Pi.
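A minimal sketch of the FWAP-to-FVTP mapping follows. The choice of which title term represents a page (here, the first term of the page's title term list) is an assumption; the paper derives the terms from TermNetWP.

def build_fvtp(fwap, page_terms):
    # fwap: list of frequent Web access patterns (sequences of PageIDs).
    # page_terms: dict PageID -> ordered list of domain terms from the page title.
    fvtp = []
    for pattern in fwap:
        term_pattern = [page_terms[d][0] for d in pattern if page_terms.get(d)]
        if term_pattern:
            fvtp.append(term_pattern)
    return fvtp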


3.6 Conceptual Prediction Model (CPM):

The conceptual prediction model (CPM) is used to automatically generate a weighted semantic network of frequently viewed terms, with each weight being the probability of the transition between two adjacent terms, based on the FVTP, in order to obtain semantic Web usage knowledge that is effective for semantic-enhanced Web-page recommendation. This semantic network is referred to as TermNavNet. We present two Web-page recommendation strategies based on the semantic knowledge base of a given website, namely the semantic network of Web-pages (TermNetWP) and the weighted semantic network of frequently viewed terms of Web-pages within the given website (TermNavNet). These recommendations are called semantic-enhanced Web-page recommendations.

4. TermNetWP ALGORITHM:

4.1 Definitions of TermNetWP

The notation used in TermNetWP is summarized as follows (a minimal in-memory sketch of these structures follows the definitions):

TERMauto = {ti : 1 ≤ i ≤ p}: set of domain terms extracted from Web-page titles;

D = {dj : 1 ≤ j ≤ q}: set of Web-pages;

Xj = t1 t2 t3 . . . tn: sequence of domain terms, possibly containing duplicates, present in page dj;

ti ẽ dj: denotes that ti is a domain term of dj;

tf(t, D): term frequency of t over D;

TS = {Xj : 1 ≤ j ≤ q}: set of domain term sequences;

ω(ti, tj): for a pair of terms (ti, tj), ti, tj ∈ TERMauto, the number of times ti is immediately followed by tj in TS, with no term between them.

The semantic network of Web-pages, namely TermNetWP, is defined as a 4-tuple Netauto := <T, A, D, R>, where

T = {(term, term frequency)}: set of domain terms with their occurrence counts;

A = {(tx, ty, wxy) : wxy = ω(tx, ty) > 0}: set of associations between tx and ty with weight wxy;

R = {(t, d) : t ẽ d}: domain term t is related to Web-page d by its presence in the page title.
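A minimal in-memory sketch of these structures is given below; the paper implements TermNetWP in OWL, so the dictionaries here are only illustrative stand-ins for the T, A and R components.

from collections import Counter, defaultdict

def build_termnet(tsc):
    # tsc: iterable of (page_id, term_sequence, url) records (the TSC).
    # Returns term frequencies T, weighted associations A (the omega counts)
    # and term-to-page relations R.
    T = Counter()            # term -> occurrence count
    A = defaultdict(int)     # (t_x, t_y) -> omega(t_x, t_y)
    R = defaultdict(set)     # term -> set of PageIDs
    for page_id, terms, _url in tsc:
        for i, t in enumerate(terms):
            T[t] += 1
            R[t].add(page_id)
            if i + 1 < len(terms):
                A[(t, terms[i + 1])] += 1
    return T, dict(A), dict(R)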

4.2 Schema of TermNetWP: In the schema of TermNetWP, the class Instance represents a domain term, i.e. t ∈ TERMauto; it has two datatype properties, Name and iOccur, and one WPage object property. The iOccur property holds the count of occurrences of the term among the set of Web-page titles. The class WPage represents a Web-page, i.e. d ∈ D, with the properties Title, PageID, URL and Keywords. The Keywords property lists the terms in the Web-page title. The two classes are related through the ‘hasWPage’ relationship, i.e. (t, d) ∈ R, from Instance to WPage, which shows that a term instance has one or more Web-pages, and the ‘belongto-Instance’ relationship, the inverse of ‘hasWPage’, which shows that a Web-page belongs to one or more term instances. An association class OutLink is defined to specify the in-out relationship between two terms: it connects one term instance (tx) to another term instance (ty) and records the corresponding connection weight (iWeight = wxy).

Figure 2: Schema of TermNetWP

The class OutLink has two object properties: (i) ‘from-Instance’, which identifies the previous term instance, and (ii) ‘to-Instance’, which identifies the next term instance. The class Instance also has two object properties: (i) ‘hasOutLink’, the inverse of the ‘from-Instance’ relation, and (ii) ‘fromOutLink’, the inverse of the ‘to-Instance’ relation.

4.3 Queries

Based on TermNetWP, we can query: (i) domain

terms for a given Web-page, and (ii) Web-pages mapped

to a given domain term.

4.3.1 Query about the terms of a given Web-page:

Querytopic(d) = (t1, t2, . . . , ts), where d ∈ D; (ti, d) ∈ R, i = 1 . . . s; and tf(ti, D) > tf(tj, D) for i < j, 1 ≤ i, j ≤ s.

Using the query Querytopic(d), given a Web-page d ∈ D, the term instances associated with the WPage instance d are retrieved via the ‘belongto-Instance’ object property. The degree of occurrence of each term in the domain is taken into account, and the terms are returned in descending order of that degree. The connection weight between a page and a domain term is defined as:

    η(dj, t) = Σk [ ω(tk, t) + ω(t, tk) ]

where the sum runs over the n domain terms tk in the title of page dj, n = |{tk : tk ẽ dj}|.

4.3.2 Query about the pages mapped to a given term:

Querypage(t) = (d1, d2, . . . , ds), where (t, di) ∈ R, i = 1 . . . s; and η(di, t) < η(dj, t) for i < j, 1 ≤ i, j ≤ s.

Using the query Querypage(t), given a domain term t ∈ TERMauto, the WPage instances (i.e. Web-pages) that are


mapped to the term instance t are retrieved via the ‘hasWPage’ object property. The returned pages are sorted in ascending order of the connection weight between each Web-page and the domain term t, showing the degree of relevance to the term t. Both queries are sketched below.
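Both queries can be expressed over the in-memory stand-ins from the earlier sketch (T, A, R and a page_terms dictionary mapping each PageID to its title terms); this is an illustrative approximation of the OWL queries, not the paper's implementation.

def connection_weight(title_terms, t, A):
    # eta(d, t): sum of omega(t_k, t) + omega(t, t_k) over the title terms t_k of d.
    return sum(A.get((tk, t), 0) + A.get((t, tk), 0) for tk in title_terms)

def query_topic(d, page_terms, T):
    # Querytopic(d): title terms of page d in descending order of domain frequency.
    return sorted(page_terms.get(d, []), key=lambda t: T.get(t, 0), reverse=True)

def query_page(t, R, page_terms, A):
    # Querypage(t): pages related to term t, in ascending order of connection weight.
    pages = R.get(t, set())
    return sorted(pages, key=lambda d: connection_weight(page_terms.get(d, []), t, A))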

TABLE 3: Algorithm for TermNetWP

Input: TSC (Term Sequence Collection)
Output: G (TermNetWP)
Process:
Let TSC = {PageID, X = t1 t2 . . . tm, URL}
Initialize G; let R = the root (start) node of G; let E = the end node of G
For each PageID and each sequence X in TSC {
    Initialize a WPage object identified by PageID
    For each term ti ∈ X {
        If node ti is not found in G, then
            Initialize an Instance object I as a node of G; set I.Name = ti
        Else
            Set I = the Instance object named ti in G; increase I.iOccur by 1
        If (i == 0) then
            Initialize an OutLink R-ti if not found; increase R-ti.iWeight by 1
            Set R-ti.fromInstance = R; set R-ti.toInstance = I
        If (i > 0 and i < m) then
            Get preI = the Instance object with name ti-1
            Initialize an OutLink ti-1-ti if not found; increase ti-1-ti.iWeight by 1
            Set ti-1-ti.toInstance = I; set ti-1-ti.fromInstance = preI
        If (i == m) then
            Initialize an OutLink ti-E if not found; increase ti-E.iWeight by 1
            Set ti-E.toInstance = E; set ti-E.fromInstance = I
        Set I.hasWPage = PageID
        Add term ti to PageID.Keywords
    }
}

5. TermNavNet ALGORITHM: In Section 4, we presented TermNetWP, which represents the semantics of the Web-pages within a website efficiently but is not sufficient on its own for making effective Web-page recommendations. To overcome this, we integrate TermNetWP with the Web usage knowledge to obtain the semantic Web usage knowledge.

The notation used to represent TermNavNet is summarized as follows:

∂x: number of occurrences of tx in F;

∂x,y: number of times tx is immediately followed by ty in F, with no term between them;

∂S,x: number of times domain term tx is the first item in a domain term pattern f;

∂x,E: number of times a domain term pattern f terminates at domain term tx;

∂x,y,z: number of times the pair (tx, ty) is immediately followed by tz in F, with no term between them.

The probability of a transition is estimated as the ratio of the number of times the corresponding sequence of states (i.e. visited Web-pages) was traversed to the number of times the anchor state occurred. In our system, we take into account first-order and second-order transition probabilities. Given a CPM with states {S, t1, . . . , tp, E}, where N is the number of term patterns in F, the first-order transition probabilities are estimated by the following expressions:

Transition from the starting state S to state tx:

    ρS,x = ∂S,x / Σy ∂S,y        (1)

Transition from state tx to state ty:

    ρx,y = ∂x,y / ∂x             (2)

Transition from state tx to the final state E:

    ρx,E = ∂x,E / ∂x             (3)

The second-order transition probability, i.e. the probability of the transition (ty, tz) given that the previous transition was (tx, ty), is estimated as:

    ρx,y,z = ∂x,y,z / ∂x,y       (4)

The conceptual prediction model is represented as a triple CPM := (N, Φ, M), where

N = {(tx, ∂x)}: set of terms with their occurrence counts;

Φ = {(tx, ty, ∂x,y, ρx,y)}: set of transitions from tx to ty, with their transition weights ∂x,y and first-order transition probabilities ρx,y;

M = {(tx, ty, tz, ∂x,y,z, ρx,y,z)}: set of transitions from (tx, ty) to tz, with their transition weights ∂x,y,z and second-order transition probabilities ρx,y,z.

If M is non-empty, the CPM is a second-order conceptual prediction model; otherwise it is a first-order conceptual prediction model. A small estimation sketch for the first-order case follows.
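The first-order estimation (Eqs. (1)-(3)) can be sketched as follows; START and END are artificial boundary states standing in for S and E, and the FVTP is assumed to be a list of term sequences as produced earlier.

from collections import Counter

START, END = "<S>", "<E>"

def first_order_cpm(fvtp):
    # Count occurrences and adjacent transitions over all term patterns, then
    # normalise: rho(x, y) = count(x -> y) / count(x).
    occ, trans = Counter(), Counter()
    for pattern in fvtp:
        states = [START] + list(pattern) + [END]
        for x, y in zip(states, states[1:]):
            occ[x] += 1
            trans[(x, y)] += 1
    prob = {(x, y): c / occ[x] for (x, y), c in trans.items()}
    return occ, prob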

5.1 Schema of CPM

TermNavNet is automatically implemented in OWL. The schema consists of the class cNode, which defines the current state node, and the class cOutLink, which defines the association from the current state node to a next state node with a transition probability Prob (e.g. ρx,y), together with the relationship properties inLink, outLink and linkTo.


Figure 3: Schema of the Conceptual Prediction Model

5.2 Automatic Construction of TermNavNet Using CPM

TermNavNet is constructed by applying the CPM schema to the FVTP with the following algorithm. A first-order or second-order TermNavNet is obtained by using the first-order or second-order CPM, respectively, to update the transition probability Prob with the first-order or second-order probability formula.

TABLE 4: TermNavNet construction

Algorithm: Building TermNavNet
Input: F (FVTP)
Output: M (TermNavNet)
Process:
Initialize M
For each f = t1 t2 . . . tm ∈ F
    For each ti ∈ f
        Initialize cNode objects with NodeName = ti, ti-1, ti+1 and Occur = 1 if they are not found in M
        Initialize a cOutLink object with Name = ti_ti+1 and Occur = 1 if it is not found in M
        Increase ti.Occur and ti_ti+1.Occur if they are found in M
        Set ti_ti+1.linkTo = ti+1
        Set ti.outLink = ti_ti+1
        Set ti.inLink = ti-1
Update all objects in M
Update the transition probabilities in the cOutLink objects
Return M

5.3 Queries

RecTerm(tx, ty) queries the next viewed terms for a given current viewed term curT and previous viewed term preT by applying the second-order transition probabilities. If first-order transition probabilities are used, the next viewed terms for a given current viewed term curT are queried with RecTerm(tx). A ranking sketch of this query follows.
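The small sketch below ranks candidate next terms; it assumes a probability table keyed by (current, next) tuples for the first-order case, or by (previous, current, next) tuples for the second-order case, such as the table produced by the earlier estimation sketch.

def rec_term(prob, cur_t, prev_t=None, top_k=5):
    # Rank candidate next terms by first-order or second-order transition probability.
    key_len = 3 if prev_t is not None else 2
    prefix = (prev_t, cur_t) if prev_t is not None else (cur_t,)
    candidates = [(k[-1], p) for k, p in prob.items()
                  if len(k) == key_len and k[:-1] == prefix]
    return sorted(candidates, key=lambda kp: kp[1], reverse=True)[:top_k]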

6. SEMANTIC-ENHANCED WEB-PAGE RECOMMENDATION STRATEGIES

Two Web-page recommendation strategies are proposed, depending on the order of the CPM (i.e. recommendations are made for a given current web-page, or for the combination of the current and previous web-pages). Both are listed below, and a minimal end-to-end sketch of strategy 1 follows the two step lists.

Recommendation strategy 1 uses TermNetWP and the first-order CPM:

Step 1 builds TermNetWP;

Step 2 generates FWAP using LL-Mine;

Step 3 builds FVTP;

Step 4 builds a first-order TermNavNet given the FVTP;

Step 5 identifies a set of currently viewed terms

{tk} using query Querytopic (dk) on TermNetWP;

Step 6 infers next viewed terms {tk+1} given each

term in {tk} using the query RecTerm(tk) on the first-order

TermNavNet;

Step 7 recommends pages mapped to each term

in {tk+1} using query Querypage (tk+1) on TermNetWP.

Recommendation strategy 2 uses TermNetWP and the second-

order CPM:

Step 1 builds TermNetWP;

Step 2 generates FWAP using LL-Mine;

Step 3 builds FVTP;

Step 4 builds a second-order TermNavNet given the FVTP;

Step 5 identifies a set of previously viewed terms

{tk-1}, and a set of currently viewed terms {tk} using query

Querytopic (d), d ∈ {dk-1, dk}, on TermNetWP;

Step 6 infers next viewed terms {tk+1} given each

pair {tk-1, tk} using the query RecTerm(tk-1, tk) on the second-order

TermNavNet;

Step 7 recommends pages mapped to each term

in {tk+1} using query Querypage (tk+1) on TermNetWP.
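As a rough illustration only, the sketch below composes the earlier stand-in functions (query_topic, rec_term, query_page) into strategy 1; the data structures are the assumed in-memory substitutes for TermNetWP and the first-order TermNavNet, not the OWL implementation.

def recommend_strategy_1(current_page, page_terms, T, A, R, prob, top_k=3):
    # Steps 5-7 of strategy 1: current terms -> likely next terms -> pages.
    recommendations = []
    for t_k in query_topic(current_page, page_terms, T):
        for t_next, _p in rec_term(prob, t_k, top_k=top_k):
            for d in query_page(t_next, R, page_terms, A)[:top_k]:
                if d not in recommendations and d != current_page:
                    recommendations.append(d)
    return recommendations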

A Web-page recommendation rule, denoted Rec, is defined as the set of recommended Web-pages generated by a Web-page recommendation strategy. A Web-page recommendation rule can be categorised as follows:

1) A recommendation rule is correct if the next web-page accessed by the current user is present in Rec.

2) A recommendation rule is satisfied if the user's target page can be reached through any of the Web-pages present in Rec.

3) A recommendation rule is empty if the next web-page accessed by the user is not present in Rec.

In [16], Zhou stated that the performance of Web-page recommendation strategies is measured by two metrics: precision and satisfaction.

Let Rc be the subset of Rec consisting of all correct recommendation rules. The Web-page recommendation precision is defined as:

    Precision = |Rc| / |Rec|        (5)

Let Rs be the subset of Rec consisting of all satisfied recommendation rules. The satisfaction of Web-page recommendation is defined as:

    Satisfaction = |Rs| / |Rec|     (6)

A small computation sketch follows.
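The sketch below computes Eqs. (5) and (6); how each generated rule is labelled against the test sessions, and whether correct rules are also counted as satisfied, are assumptions of this evaluation sketch rather than statements from the paper.

def precision_and_satisfaction(rules):
    # rules: one label per generated recommendation rule: 'correct', 'satisfied' or 'empty'.
    total = len(rules)
    if total == 0:
        return 0.0, 0.0
    correct = sum(r == "correct" for r in rules)
    satisfied = sum(r in ("correct", "satisfied") for r in rules)  # assumes correct rules also satisfy
    return correct / total, satisfied / total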
