SEMANTIC WEB-PAGE RECOMMENDER SYSTEM
P. Vinothini, T. Vetriselvi
Department of CSE,
K. Ramakrishna College of Technology, Trichy.
E-Mail:[email protected]
Abstract - With the explosive growth of the internet, a large number of users search online to satisfy their information needs.
Web usage mining plays an important role in discovering knowledge about online users’ behaviour from the available
web log data. Satisfying online users’ needs with a traditional web usage mining system is challenging because such a system is built
solely from the web usage data of online users. Web-page recommendation is used to capture the intuition of online users effectively.
To make a Web-page recommender system capture users’ intuition accurately, we propose two novel knowledge
representation models that provide semantic enhancement to the Web-page recommender system. The first model is a semantic
network of a website, which represents domain knowledge by domain terms, Web-pages and the relations between them. The Web usage
model generates frequent web access patterns by applying sequential pattern mining algorithms to the usage data from the web
server. The second model, the Conceptual Prediction Model (CPM), integrates the semantic knowledge with the web
usage model, resulting in a weighted semantic network of semantic web usage knowledge. CPM constructs the weighted semantic network
with the frequently viewed terms as nodes, where each weight represents the probability of transition between adjacent terms, estimated
using Markov models.
Index terms: Web Usage mining, semantic knowledge, conceptual prediction model, semantic network, domain terms.
1. INTRODUCTION:
Web mining is a major area of data mining that
discovers patterns from web data in order to better
understand the needs of web-based applications. Web
mining can be divided into three types: web usage
mining, web content mining and web structure mining.
Web Usage Mining (WUM) is the process of discovering
or extracting patterns from users’ access data on the
web. Usage data is collected from one or more Web
servers. Web usage mining is very useful for
understanding users’ interests and their network
behaviour. A typical application of WUM is the
recommender system.
The main goal of a Web-page recommender
system is to effectively forecast the Web-page(s) that will
be visited next while a user navigates through the website.
A Web-page recommender system captures the intuition
of online users from their browsing patterns and presents
recommendations to users in the form of links to
stories, books, or pages of interest. There are many
difficulties in developing an effective Web-page
recommender system, such as how to effectively learn
users’ online behaviour and Web-page navigation
patterns from the available historical usage data, how to
discover this knowledge, and how to make online
recommendations based on the discovered
knowledge.
In order to efficiently represent Web access
sequences (WAS) from the Web usage data, some studies
have shown that approaches based on tree structures and
probabilistic models can be used [1]. These approaches
use the historical web usage data to construct a user
profile, which consists of links between the Web-pages that
the user is most interested in, based only on the usage data.
Using this knowledge, when the user next comes online,
they predict the next Web-page(s) that the user is most
likely to visit, given the current Web-page and the k
previously visited Web-pages.
The performance of these approaches depends on
the size of the training usage dataset: the bigger the
training dataset, the higher the prediction accuracy.
The main drawback of these Web-page recommendation
approaches is that they are based solely on the Web
access sequences learnt from the Web usage data.
Therefore, if a user visits a new Web-page that is not in
the training usage data, these approaches cannot offer any
recommendations to this user. This problem is referred to as
the “new-item problem”.
Some studies show that semantic-enhanced
approaches can overcome this new-item problem
[2],[3] by using a domain ontology.
Integrating domain knowledge with Web usage
knowledge improves the prediction accuracy of
recommender systems that use ontology-based Web mining
techniques [4]–[6]. Web usage mining enriched with
semantic information has shown higher performance than
classic Web usage mining algorithms [5]-[6]. However,
the main issue in these approaches is the difficulty of
representing and acquiring the semantic domain
knowledge. Domain ontology remains an active area of
research.
Domain ontologies are mostly used to
represent the semantics of a website. They can be
constructed manually by experts or automatically by
learning models, such as a Bayesian network or a
collocation map, for many different applications. Given
the very large size of Web data in today’s websites,
building an ontology manually for a website is a challenging
task; it is time consuming and the result is less reusable.
According to Stumme, Hotho and Berendt, it is
ISBN: 978-81-930654-7-5
www.iirdem.org
Proceedings of ICEEM-2016
©IIRDEM 201614
impossible to manually discover the meaning of all
Web-pages and their usage for a large-scale website [10].
Automatic construction of ontologies saves time,
discovers all possible concepts within a website and the
links between them, and yields reusable ontologies.
However, the drawback of the automatic approach is the
need to design and implement the learning models, which
can only be done by professionals at the beginning.
This paper presents a novel method to provide
better Web-page recommendation by integrating Web
usage and domain knowledge. Two new knowledge
representation models and a set of Web-page
recommendation strategies are proposed in this paper.
The first model is a semantic network that represents
domain knowledge, which can be constructed
automatically. As it is fully automated, it can be easily
integrated with the Web-page recommendation process.
The second model is a conceptual prediction model,
which is a navigation network of domain terms based on
the frequently viewed Web-pages. This represents the
integrated Web usage and domain knowledge which
supports Web-page prediction and it can also be
constructed automatically. The proposed
recommendation strategies predict the next pages with
probabilities for a given Web user based on his or her
current Web-page navigation state through these two
models. This new method has automated the knowledge
base construction and alleviated the new-item problem.
This method yields better performance compared with
the existing Web usage based Web-page recommendation
systems.
This paper is structured as follows: Section 2
discusses related work; Section 3 outlines the
architecture and the implementation of web
usage mining. Section 4 presents the first model, i.e. a
semantic network of domain terms. Section 5 presents the
second model, i.e. a conceptual prediction model, which
integrates the semantic knowledge with Web-page
recommendation. For each of the models presented in
Sections 4 and 5, the corresponding queries used to
retrieve semantic information from the knowledge
models are presented. Section 6 presents a set of
recommendation strategies based on these queries to make
semantic-enhanced Web-page recommendations.
2. LITERATURE SURVEY:
Research work on Web-page recommender systems
that combine web usage mining with semantic
knowledge is very limited. The existing work can be
classified into the following two approaches:
2.1 Traditional Usage Based Approaches
Analog is one of the first Web Usage Mining systems. It
consists of two components: offline and online. In the offline
phase, it constructs session clusters that exhibit
similar information from the usage data collected from
the web server. The online phase then predicts which
cluster the current user falls into from the active user
session and suggests a list of pages related to
the current session. This approach has several drawbacks,
mainly in scalability and accuracy. SUGGEST 1.0 [21] was
proposed as a two-tier system composed of an offline
module, which analyses the Web server’s access log file,
and an online classification module, which carries out the
second stage. Its main drawback was the asynchronous
cooperation between the two modules. In the next
version, SUGGEST 2.0, the two modules were merged to
perform the same operations in a completely online
fashion. This results in the problem of estimating the
update frequency of the knowledge base. Potential
limitations of SUGGEST 2.0 are: a) the memory
required to store Web server pages is quadratic in the
number of pages; b) it does not permit managing
Websites made up of dynamically generated pages.
Bamshad Mobasher et al. [19] presented WebPersonalizer,
a system that provides dynamic recommendations to users
as a list of hypertext links. The method is based on
anonymous usage data combined with the Web site
structure. F. Masseglia et al. [20] proposed an integrated
system, WebTool, which extracts sequential patterns
and association rules to dynamically customize
the hypertext organization. The current user's behaviour
is compared with previously induced sequential patterns,
and navigational hints are provided to the user. In
traditional web recommendation systems [2], sequential
mining is used to discover web access patterns,
particularly with tree structures and Markov models.
WAP-Tree is a tree structure that holds access sequences
in a very compact form to enable access pattern mining.
PLWAP-Mine, proposed in [7], uses the PLWAP tree
structure to incrementally update web sequential access
patterns efficiently, without scanning the whole database
even when previously small items become frequent. The
position code features of the PLWAP tree are used to mine
these trees efficiently and extract the current frequent
patterns when the database is updated. FOL-Mine is an
efficient sequential pattern mining algorithm proposed
in [8]. It is based on the concept of the WAP-tree but uses
a special linked structure to hold access sequences for
processing. FOL-Mine is reported to outperform the
existing WAP-tree mining methods. The FOL-list holds the
first-occurrence information of items during the mining
of patterns in the intermediate projected databases, which
makes suffix building very efficient. The node
structure suggested in [14] is modified to process the
weighted support of sequences. According to [9],
weighted sequential pattern mining outperforms
non-weighted sequential pattern mining (e.g. FOL-Mine) by
giving weights to the items in the Web Access Sequence
Database (WASD). It uses a modified form of the
structure in [8], enhanced to hold the weight
information of each item. This method needs only one
database scan to generate the weighted list structure.
2.2 Semantic-Enhanced Approaches
Many studies report that Web-page
recommendation can be made more accurate by integrating
web usage knowledge with domain knowledge.
In [11]-[12], the domain ontology of a website is used to
improve the recommendation process. In [11], Liang Wei
and Song Lei used an ontology, which includes concepts and
significant terms extracted from documents, to represent
a website’s domain knowledge. They generate online
recommendations by semantically matching and
searching for frequent pages discovered by the Web
usage mining process. This approach showed higher
precision, coverage and matching rates. In
[6],[13], ontology reasoning is used to make
recommendations: Web access sequences are converted
into sequences of ontology instances. In these studies, the
Web usage mining algorithms find the frequent
navigation paths in terms of ontology instances rather
than normal Web-page sequences. The SWUM
(Semantically enriched Web Usage Mining) method
proposed in [14] incorporates semantic data and site
structure into WebPUM, a method based solely on usage
data. WebPUM represents usage data by means of an
adjacency matrix and induces navigation patterns
using a graph partitioning technique; these are then
enriched with the semantic data of the website. The
extracted semantic metadata takes into account both the
semantics of the page contents and the semantic
relationships between Web pages. The semantic similarity is
represented as a semantic similarity matrix that
gives the similarity score between every pair of Web
pages. The semantic similarity matrix is combined
with the adjacency matrix to derive the
semantically enriched weight matrix, and the resulting
navigation patterns are fed into the recommendation engine.
The drawback is that the system is suitable only for
statically generated web-pages. In [15],
frequent sequential patterns enriched with semantic
information, expressed in terms of ontology
instances instead of web page sequences, are used for
recommending subsequent pages to the user. The
discovered semantic-rich sequential association rules
form the core knowledge of the recommendation engine
of the proposed model. The vision of a Semantic Web has
recently drawn attention from both academic and
industrial circles. Semantic Web structures are
incorporated into the generation of personalized Web
experiences in order to improve the results of Web
mining [2]. As a consequence, there is an increasing
effort to define Web pages and objects in terms of
semantic information by using ontologies. In [2], the first
part covers how the content and structure of the site
can be leveraged to transform raw usage data into
semantically enhanced transactions, which are then used
for semantic Web usage mining and personalization. The
second part presents a framework for more
systematically integrating full-fledged domain ontologies
into the personalization process. The system proposed
in [12] is domain-independent, is implemented as a Web
service, and uses both explicit and implicit
feedback-collection methods to obtain information on users’
interests. A domain-based method makes inferences about
users’ interests, and a taxonomy-based similarity method
is used to refine the item-user matching algorithm,
improving recommendation prediction.
3. ARCHITECTURE OF WEB-PAGE RECOMMENDER SYSTEM:
The recommendation system is implemented in two
components: offline and online. The offline component
builds the knowledge base by analysing historical
data, such as server access log files or web logs
captured from the server. This knowledge is then used in
the online component to capture the intuition of the
user and recommend page views whenever the
user next comes online. Data collection, data
pre-processing, pattern discovery and pattern analysis
are the steps of web usage mining in the offline
phase.
3.1 Data Collection:
Data collection is the first step in web usage
mining. Web usage data are collected from three main
sources: Web servers, proxy servers and client-side
requests. Cooley and Mobasher [17] reported that a large
amount of information resides only in server log files and
that it is difficult to obtain data from proxy servers and
client-side browsing, so we use the server log files as the
primary data source. There are several types of log files.
An IIS web log consists of 17 attributes, each of which
represents a field in the log records.
The fragment of IIS web log:
3.2 Data Pre-processing:
Pre-processing generally involves data cleaning, user
identification, session identification and path
completion.
3.2.1. Data Cleaning:
The data cleaning task removes log entries
that are irrelevant or redundant. Two kinds
of irrelevant data need to be removed:
i. Files with suffixes such as .jpeg, .gif, .css, .cgi,
etc., which can be found in the cs_uri_stem field of
the IIS log.
ii. Error requests, which can be found in the sc_status
field.
Once pre-processing is done, data from multiple sources are
transformed into an acceptable form, which serves as
input to the various mining processes.
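The two cleaning rules above can be sketched as a simple filter. This is a minimal illustration, not the authors' implementation: the field names cs_uri_stem and sc_status follow the text, the record layout (one dict per log entry) is hypothetical, and treating status codes of 400 and above as error requests is an assumption.

```python
from typing import Dict, Iterable, List

# Suffixes named in the text; a real deployment would likely extend this tuple.
IRRELEVANT_SUFFIXES = (".jpeg", ".gif", ".css", ".cgi")


def clean_log(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop embedded-resource requests and error requests."""
    kept = []
    for rec in records:
        if rec["cs_uri_stem"].lower().endswith(IRRELEVANT_SUFFIXES):
            continue  # rule i: not a page view
        if int(rec["sc_status"]) >= 400:
            continue  # rule ii: error request (threshold is an assumption)
        kept.append(rec)
    return kept


sample = [
    {"cs_uri_stem": "/index.html", "sc_status": "200"},
    {"cs_uri_stem": "/img/logo.gif", "sc_status": "200"},
    {"cs_uri_stem": "/missing.html", "sc_status": "404"},
]
cleaned = clean_log(sample)
```

Only the first sample record survives: the second is an image and the third is an error request.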
3.2.2. User Identification:
The user identification process distinguishes
the different users in the web access log file. A
referrer-based method is used for this process. It is a complex
task due to the presence of resident caches and proxy servers.
The following heuristics [18] are used to identify the
user:
1) Each IP address represents one user;
2) If the IP address is the same for several log entries, but
the agent field shows a change in browser or OS, then
the IP address represents a different user;
3) If all the above fields are the same, then the referrer
information is considered. If a requested page
is not directly accessible by a link
from any of the pages already visited, then there is
another user with the same IP address.
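Heuristics 1 and 2 amount to keying each log entry on the pair of IP address and agent. The sketch below covers only these two rules; heuristic 3 needs the site's link structure and is left out. The field names c_ip and cs_user_agent are hypothetical stand-ins for the corresponding log fields.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

UserKey = Tuple[str, str]  # (client IP, user agent)


def identify_users(records: Iterable[Dict[str, str]]) -> Dict[UserKey, List[str]]:
    """Group requested pages by (IP, agent), covering heuristics 1 and 2."""
    users: Dict[UserKey, List[str]] = defaultdict(list)
    for rec in records:
        users[(rec["c_ip"], rec["cs_user_agent"])].append(rec["cs_uri_stem"])
    return dict(users)


log = [
    {"c_ip": "10.0.0.1", "cs_user_agent": "Firefox/Linux", "cs_uri_stem": "/a.html"},
    {"c_ip": "10.0.0.1", "cs_user_agent": "Chrome/Windows", "cs_uri_stem": "/b.html"},
    {"c_ip": "10.0.0.1", "cs_user_agent": "Firefox/Linux", "cs_uri_stem": "/c.html"},
]
users = identify_users(log)
```

Here the single IP address yields two users because the agent field differs.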
3.2.3. Session Identification
The aim of user session identification is
to find the different user sessions in the web access
log file. It involves dividing
the page accesses of every user into separate sessions.
User sessions can be identified by a
timeout mechanism and by maximal forward reference. In
[18], the following rules are used to identify a user session:
1) If there is a new user, there is a new
session;
2) If the referrer page is null in a user session,
there is a new session;
3) If the time between page requests
exceeds a limit, the user is starting a new session.
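The three session rules can be sketched for the requests of a single, already identified user. The tuple layout and the 30-minute default timeout are assumptions for illustration; the text does not fix the time limit.

```python
from typing import List, Optional, Tuple

# (timestamp in seconds, requested page, referrer page or None)
Request = Tuple[float, str, Optional[str]]


def split_sessions(requests: List[Request], timeout: float = 1800.0) -> List[List[str]]:
    """Split one user's requests into sessions using rules 1 to 3."""
    sessions: List[List[str]] = []
    last_time: Optional[float] = None
    for ts, page, referrer in sorted(requests, key=lambda r: r[0]):
        starts_new = (
            last_time is None            # rule 1: first request of this user
            or referrer is None          # rule 2: null referrer
            or ts - last_time > timeout  # rule 3: gap exceeds the limit
        )
        if starts_new:
            sessions.append([])
        sessions[-1].append(page)
        last_time = ts
    return sessions


requests = [(0.0, "A", None), (60.0, "B", "A"), (4000.0, "C", "B"), (4010.0, "D", None)]
sessions = split_sessions(requests)
```

With these inputs, page C opens a new session because of the time gap and page D because its referrer is null.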
3.2.4. Path Completion
Due to the presence of proxy servers and local
caches, some user accesses are not recorded in the
access log. Path completion acquires the
complete user access path by filling in the missing page
references. An incomplete access path is recognized by
checking whether the requested page is linked from the last
page. If it is not linked and the page is already in the user’s
history, then the user evidently used the back
button. In this way, the complete path is acquired. Web
log pre-processing removes unwanted data
from the log file and reduces the original file size by
50-55%.
Figure 1: Architecture Of Web Usage Mining
integrated With Semantic Knowledge
3.3 Pattern Discovery:
Once user transactions have been identified, the
web logs are converted into a relational database and
sequential pattern mining is performed on the data to
discover Frequent Web Access Patterns (FWAP).
In this paper, we use the LL-Mine algorithm for
sequential pattern mining. It uses a modified form of the
structure in [9], was chosen for its efficiency, and
produces frequent web access
sequences in a linked-list data structure. It scans the
database and produces the frequent item sets that satisfy
the weighted support. Usually, only the order of
Web-pages is taken into consideration in sequential pattern
mining. To reflect the importance of a Web-page,
both the time spent on the page by the user and the
frequency of visits are taken into account when the
W_ASSIGN algorithm assigns a weight to the Web-page
while generating web patterns. The weight support of an
access sequence s is given by [9]:
Weight_support(s) = g_support(s) × weight(s)
where
weight(s) is the average weight of the
items in the sequence, and
g_support(s) is the support of the sequence in the WASD.
The frequent patterns generated by this algorithm are
integrated with the semantic knowledge by
crawling the URLs of these FWAP to collect domain
term sequences.
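As a worked illustration of the weight support formula, the sketch below computes it for a small access sequence database. Several details are assumptions rather than the authors' definitions: g_support(s) is taken as the fraction of sequences that contain s as a (not necessarily contiguous) subsequence, and the item weight is the harmonic mean of raw frequency and time values, following the W_ASSIGN description.

```python
from statistics import harmonic_mean
from typing import Dict, List, Sequence


def item_weight(freq: float, time_spent: float) -> float:
    """Harmonic mean of visit frequency and viewing time."""
    return harmonic_mean([freq, time_spent])


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if pattern occurs in sequence in order, gaps allowed."""
    it = iter(sequence)
    return all(page in it for page in pattern)


def weight_support(pattern: Sequence[str], wasd: List[Sequence[str]],
                   weights: Dict[str, float]) -> float:
    g_support = sum(is_subsequence(pattern, s) for s in wasd) / len(wasd)
    weight = sum(weights[p] for p in pattern) / len(pattern)  # average item weight
    return g_support * weight


wasd = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
weights = {"a": item_weight(3, 6), "b": item_weight(3, 3), "c": item_weight(3, 1.5)}
result = weight_support(["a", "c"], wasd, weights)
```

For the pattern a, c the support is 2/4 and the average item weight is (4 + 2)/2 = 3, so the weight support is 1.5.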
TABLE 1: Algorithm W_ASSIGN
ALGORITHM: W_ASSIGN
Input: An access sequence database, WASD
       A support threshold
Output: Set of weighted access patterns
Method:
1. For each web access sequence s = p1, p2, …, pn
     Set weight(pi) = 0; Let length = 0;
     Create linked list C, where each node contains an item name and its weight;
     Set weight to 0;
     For each occurrence of item pi,
       Increment freq(pi) and add Time(pi);
       Update the values in C;
     End for;
     Update the list of items in LIN with C
     For each pi,
       Take the harmonic mean of freq(pi) and Time(pi);
       Assign it to weight(pi);
     End for
2. For each item pi in LIN, check whether it passes the support threshold; if so, add the item to the frequent patterns
3. Call LL-Mine
4. Return
TABLE 2: Algorithm for LL-Mine
Algorithm: LL-Mine
Parameters:
  Current frequent pattern, p
  List of first occurrences, L
  Absolute support, η
Method:
1. For each weighted frequent item, pi
   i. Generate the first occurrences list, L1:
        Initialize L1 with Weight_support = 0;
        Locate the first occurrences of the element p in the projected databases D-p using L;
        Generate L1 with nodes holding seq-id and pos;
        Add the weight of the item at each occurrence;
        Update the header of the list L1 with Weight_support(pi);
   ii. If Weight_support(pi) > η
        Add p.pi to F, the set of patterns
        Add p.pi to the stack for suffix building.
        p = p.pi
        Call LL-Mine(p, L1, η)
       End if
   iii. Delete the current L.
   End for
2. Return
3.4 Semantic Network Construction:
This section presents the first model, i.e. the
semantic network of a website and its schema, and
explains the queries used to infer terms and Web-pages.
A semantic network is a kind of knowledge map that
represents concepts as domain terms and Web-pages,
together with the relations between the concepts. To
construct the semantic network, domain terms are
collected from the Web-page titles, and the relations
between these terms are extracted from two aspects:
(i) the collocations of terms, determined by the
co-occurrence relations of terms in Web-page titles; and
(ii) the associations between terms and Web-pages.
To capture how these terms are
semantically related, the domain terms and co-occurrence
relations are weighted. Based on these relations, we can
estimate how closely Web-pages are associated with each
other semantically. To infer the semantics of Web-pages,
we can query the relations, including the relevant pages
and key terms for a given page and the pages for given
terms, thereby achieving semantic-enhanced Web-page
recommendations. This semantic network is referred
to as TermNetWP.
The following are the procedures to automatically
construct TermNetWP:
1) Collect the titles of visited Web pages.
2) Extract term sequences from the Web-page
titles.
3) Build the semantic network – TermNetWP.
4) Implement an automatic construction of
TermNetWP.
So that the domain term network can be reused and
shared by Web-page recommender systems, TermNetWP is
implemented in OWL. The input to this network is a term
sequence collection (TSC), in which each record consists
of:
1) The PageID of a Web-page d ∈ D;
2) A sequence of terms X = t1 t2 . . . tm ∈ TS, m > 0, extracted
from the title of the Web-page;
3) The URL of the Web-page.
3.5 Frequently Viewed Term Pattern (FVTP):
In this paper, we use a Web usage mining
technique, namely LL-Mine, to obtain the frequent Web
access patterns (FWAP). We integrate FWAP with
TermNetWP to obtain a set of frequently
viewed term patterns (FVTP), which is the semantic Web
usage knowledge of a website.
The frequent web access patterns are described as follows:
P = {P1, P2 . . . Pn}: Set of FWAP,
where Pi = di1 di2 . . . dim: pattern showing a sequence of Web-pages,
n is the number of patterns,
m is the number of Web-pages in the pattern.
The frequently viewed term patterns are denoted as
follows:
F = {f1, f2 . . . fn}: Set of FVTP,
where each domain term pattern fi = ti1 ti2 . . . tim is a sequence of
domain terms, in which each domain term tik is a domain term of page
dik in Pi.
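The mapping from FWAP to FVTP can be sketched as follows. The text does not say how a page with several domain terms is handled; this sketch assumes every combination of one term per page is emitted, and the page_terms lookup is a hypothetical stand-in for a query against TermNetWP.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple


def to_term_patterns(fwap: List[Sequence[str]],
                     page_terms: Dict[str, List[str]]) -> List[Tuple[str, ...]]:
    """Replace each page in a pattern by one of its domain terms."""
    patterns: List[Tuple[str, ...]] = []
    for pages in fwap:
        choices = [page_terms[d] for d in pages]
        patterns.extend(product(*choices))  # one term per page, all combinations
    return patterns


fwap = [["d1", "d2"], ["d2", "d3"]]
page_terms = {"d1": ["course"], "d2": ["exam", "timetable"], "d3": ["result"]}
fvtp = to_term_patterns(fwap, page_terms)
```

An alternative design would keep only the most frequent term per page, which yields exactly one term pattern per access pattern.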
3.6 Conceptual Prediction Model (CPM)
The conceptual prediction model (CPM)
automatically generates a weighted semantic network of
frequently viewed terms, in which each weight is the
probability of the transition between two adjacent terms
based on FVTP. The result is semantic Web usage
knowledge that is efficient for semantic-enhanced
Web-page recommendation. This semantic network is referred
to as TermNavNet.
We present two Web-page recommendation
strategies based on the semantic knowledge base of a
given website: the semantic network of Web-pages
(TermNetWP) and the weighted semantic network
of frequently viewed terms of Web-pages within the
website (TermNavNet). These recommendations
are called semantic-enhanced Web-page
recommendations.
4 TermNetWP ALGORITHM:
4.1 Definitions of TermNetWP
The notations used in TermNetWP are
summarized as follows:
TERMauto = {ti: 1 ≤ i ≤ p}: set of domain terms extracted
from Web-page titles;
D = {dj: 1 ≤ j ≤ q}: set of Web-pages;
Xj = t1 t2 t3. . .tn tk : sequence of domain terms, which
may be duplicated, present in each page dj;
ti ẽ dj: denotes that ti is a domain term of dj;
tf (t, D): term frequency of t over D;
TS = {Xj: 1 ≤ j ≤ q}: set of domain term sequences;
ω (ti, tj): for a pair of terms (ti, tj), ti, tj ∈ TERMauto, the
number of times that ti is followed by tj in
TS with no term between them.
The semantic network of Web-pages, namely
TermNetWP, is defined as a 4-tuple:
Netauto := <T, A, D, R>, where
T = {(term, term frequency)}: set of domain terms and
their corresponding occurrences,
A = {(tx, ty, wxy): wxy = ω(tx, ty) > 0}: set of associations
between tx and ty with weight wxy,
D: set of Web-pages, as defined above,
R = {(t, d): t ẽ d}: set of relations in which domain term t
is related to Web-page d by its presence in the page title.
4.2 Schema of TermNetWP:
In the schema of TermNetWP, class Instance
represents a domain term, i.e. t ∈ TERMauto, and has two
datatype properties, Name and iOccur, and one object
property referring to WPage.
The iOccur property holds the number of occurrences of
the term in the set of Web-page titles. Class WPage
represents a Web-page, i.e. d ∈ D, with the properties Title,
PageID, URL and Keywords. The Keywords
property holds the terms in the Web-page title. The two
classes are related through the ‘hasWPage’ relationship,
i.e. (t, d) ∈ R, from Instance to WPage, which shows that a
term instance has one or more Web-pages, and through the
‘belongto-Instance’ relationship, the inverse of
‘hasWPage’, which shows that a Web-page belongs to one or
more term instances. An association class OutLink
specifies the in-out relationship between two
terms. Class OutLink connects one term
instance (tx) to another term instance (ty) and defines the
corresponding connection weight (iWeight = wxy).
Figure 2: schema of TermNetWP
Class OutLink has two object properties: (i)
‘from-Instance’, which defines the previous term instance, and
(ii) ‘to-Instance’, which defines the next term instance.
Class Instance also has two object properties: (i)
‘hasOutLink’, the inverse of the ‘from-Instance’ relation,
and (ii) ‘fromOutLink’, the inverse of the ‘to-Instance’
relation.
4.3 Queries
Based on TermNetWP, we can query: (i) domain
terms for a given Web-page, and (ii) Web-pages mapped
to a given domain term.
4.3.1 Query about terms of a given Web-page:
Querytopic (d) = (t1, t2 . . . ts), where d ∈D; (ti, d) ∈R,
i = [1 . . . s]; tf (ti, D) >tf (tj, D), (i <j & 1 ≤ i, j ≤ s).
Using the query Querytopic (d), given a Web-page d ∈ D,
the term instances associated with the WPage
instance d are retrieved via the ‘belongto-Instance’ object
property. The terms are returned in descending order of
their degree of occurrence in the domain.
The connection weight between a page and a domain
term is defined as:
$$\eta(d_j, t) = \sum_{k=0}^{n} \left[ \omega(t_k, t) + \omega(t, t_k) \right]$$
where n = |{tk: tk ẽ dj}| is the number of domain terms in
the title of page dj.
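A minimal sketch of the connection weight, assuming ω is obtained by counting adjacent term pairs over the term sequences TS, and that the sum runs over the distinct terms in the page title. The example titles are invented for illustration.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def adjacency_counts(term_sequences: Iterable[Sequence[str]]) -> Counter:
    """omega[(ti, tj)]: times ti is immediately followed by tj in TS."""
    omega: Counter = Counter()
    for seq in term_sequences:
        omega.update(zip(seq, seq[1:]))
    return omega


def connection_weight(title_terms: Sequence[str], t: str, omega: Counter) -> int:
    """eta(d, t): sum of omega(tk, t) + omega(t, tk) over title terms tk of d."""
    return sum(omega[(tk, t)] + omega[(t, tk)] for tk in set(title_terms))


ts = [["web", "usage", "mining"], ["web", "mining"], ["usage", "mining", "tools"]]
omega = adjacency_counts(ts)
eta = connection_weight(["web", "usage", "mining"], "mining", omega)
```

For the first page and the term "mining", the pairs (web, mining) and (usage, mining) contribute 1 and 2 respectively, giving a weight of 3.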
4.3.2 Query about pages mapped to a given term:
Querypage (t) = (d1, d2 . . . ds), where (t, di) ∈R,
i = [1 . . . s]; and η (di, t) < η(dj, t), (i <j&1 ≤ i, j ≤ s).
Using this query Querypage (t), given domain term t
∈TERMauto, WPage instances (i.e. web-pages) that are
mapped to the term instance t are retrieved via
‘hasWPage’ object property. The returned pages are
sorted in ascending order of connection weights between
the Web-pages and domain term t to show the degree of
relevance to the term t.
TABLE 3: Algorithm for TermNetWP
Input: TSC (Term Sequence Collection)
Output: G (TermNetWP)
Process:
Let TSC = {PageID, X = t1 t2 . . . tm, URL}
Initialize G;
Let R = root or the start node of G
Let E = the end node of G
For each PageID and each sequence X in TSC {
  Initialize a WPage object identified as PageID
  For each term ti ϵ X {
    If node ti is not found in G, then
      Initialize an Instance object I as a node of G
      Set I.Name = ti
    Else
      Set I = the Instance object named ti in G
    Increase I.iOccur by 1
    If (i==0) then
      Initialize an OutLink R-ti if not found
      Increase R-ti.iWeight by 1
      Set R-ti.fromInstance = R
      Set R-ti.toInstance = I
    If (i>0 & i<m) then
      Get preI = the Instance object with name ti-1
      Initialize an OutLink ti-1-ti if not found
      Increase ti-1-ti.iWeight by 1
      Set ti-1-ti.toInstance = I
      Set ti-1-ti.fromInstance = preI
    If (i==m) then
      Initialize an OutLink ti-E if not found
      Increase ti-E.iWeight by 1
      Set ti-E.toInstance = E
      Set ti-E.fromInstance = I
    Set I.hasWPage = PageID
    Add term ti into PageID.Keywords
  }
}
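The construction above can be approximated with plain dictionaries instead of OWL objects. This is a sketch under stated assumptions: the names ROOT and END stand for the nodes R and E, every title's first term is linked from R and its last term to E, and the example URLs are placeholders.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

ROOT, END = "R", "E"  # start and end nodes of the network
Record = Tuple[str, List[str], str]  # (PageID, term sequence X, URL)


def build_term_net(tsc: List[Record]):
    occur: Counter = Counter()                        # Instance.iOccur per term
    out_links: Counter = Counter()                    # OutLink.iWeight per (from, to)
    has_wpage: Dict[str, Set[str]] = defaultdict(set)  # term -> PageIDs
    keywords: Dict[str, List[str]] = {}               # PageID -> Keywords
    for page_id, terms, _url in tsc:
        keywords[page_id] = list(terms)
        prev = ROOT
        for term in terms:
            occur[term] += 1
            out_links[(prev, term)] += 1  # R-t1 for the first term, else t(i-1)-ti
            has_wpage[term].add(page_id)
            prev = term
        if terms:
            out_links[(prev, END)] += 1   # link from the last term to E
    return occur, out_links, has_wpage, keywords


tsc = [
    ("p1", ["web", "mining"], "http://example.org/p1"),
    ("p2", ["web", "usage", "mining"], "http://example.org/p2"),
]
occur, out_links, has_wpage, keywords = build_term_net(tsc)
```

The counters play the role of the iOccur and iWeight properties; a faithful implementation would write the same values into OWL individuals.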
5. TermNavNet ALGORITHM:
In Section 4, we presented TermNetWP, which
represents the semantics of Web-pages within a website
efficiently but is not sufficient on its own for making
effective Web-page recommendations. To overcome
this issue, TermNetWP is integrated with Web
usage knowledge to obtain the semantic Web usage
knowledge.
The notations used to represent the TermNavNet are
summarized as follows:
∂x: number of occurrences of tx in F;
∂x,y: number of times that tx is followed by ty in F with no
term between them;
∂S,x: number of times the domain term tx is the first item in a
domain term pattern f;
∂x,E: number of times a domain term pattern f terminates at
the domain term tx;
∂x,y,z: number of times that (tx, ty) is followed by tz in F with
no term between them.
The probability of a transition is estimated as the ratio of
the number of times the corresponding sequence of states
(i.e. visited Web-pages) was traversed to the number of
times the anchor state occurred. Our system takes
into account first-order and second-order transition
probabilities.
Given a CPM with states {S, t1 . . . tp, E}, where N is the
number of term patterns in F, the first-order transition
probabilities are estimated according to the following
expressions:
Transition from the starting state S to state tx:
$$\rho_{S,x} = \frac{\partial_{S,x}}{\sum_{y=1}^{n} \partial_{S,y}} \qquad (1)$$
Transition from state tx to ty:
$$\rho_{x,y} = \frac{\partial_{x,y}}{\partial_{x}} \qquad (2)$$
Transition from state tx to the final state E:
$$\rho_{x,E} = \frac{\partial_{x,E}}{\partial_{x}} \qquad (3)$$
The second-order transition probability, which is the
probability of the transition (ty, tz) given that the previous
transition was (tx, ty), is estimated as
follows:
$$\rho_{x,y,z} = \frac{\partial_{x,y,z}}{\partial_{x,y}} \qquad (4)$$
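Expressions (1) to (4) can be checked on a small set of term patterns. The sketch below counts ∂x, ∂x,y and ∂x,y,z directly from F; the sentinel strings standing for the states S and E, and the example patterns, are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

START, END = "<S>", "<E>"  # hypothetical sentinels for the states S and E


def transition_probabilities(fvtp: Iterable[Sequence[str]]):
    occur: Counter = Counter()   # ∂x
    pair: Counter = Counter()    # ∂x,y, plus ∂S,x and ∂x,E via the sentinels
    triple: Counter = Counter()  # ∂x,y,z
    for f in fvtp:
        if not f:
            continue
        occur.update(f)
        pair[(START, f[0])] += 1
        pair[(f[-1], END)] += 1
        pair.update(zip(f, f[1:]))
        triple.update(zip(f, f[1:], f[2:]))
    starts = sum(n for (x, _), n in pair.items() if x == START)
    # Expression (1) for transitions out of S, (2) and (3) otherwise.
    first: Dict[Tuple[str, str], float] = {
        (x, y): n / (starts if x == START else occur[x])
        for (x, y), n in pair.items()
    }
    # Expression (4).
    second: Dict[Tuple[str, str, str], float] = {
        (x, y, z): n / pair[(x, y)] for (x, y, z), n in triple.items()
    }
    return first, second


patterns = [("a", "b", "c"), ("a", "b"), ("b", "c")]
first, second = transition_probabilities(patterns)
```

In this example the term b occurs three times, is followed by c twice and ends a pattern once, so its outgoing first-order probabilities are 2/3 and 1/3 and sum to one.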
The conceptual prediction model is represented as a triple:
Cpm := (N, Φ, M), where
N = {(tx, ∂x)}: set of terms along with the
corresponding occurrence counts,
Φ = {(tx, ty, ∂x,y, ρx,y)}: set of transitions from tx to ty,
along with their transition weights (∂x,y) and first-order
transition probabilities (ρx,y),
M = {(tx, ty, tz, ∂x,y,z, ρx,y,z)}: set of transitions from tx,
ty to tz, along with their transition weights (∂x,y,z) and
second-order transition probabilities (ρx,y,z).
If M is non-empty, the CPM is a second-order conceptual
prediction model; otherwise it is a first-order conceptual
prediction model.
5.1 Schema of CPM
TermNavNet is automatically implemented in
OWL. The schema consists of the class cNode, which
defines the current state node, and the class cOutLink,
which defines the association from the current state node
to a next state node with a transition probability Prob
(e.g. ρx,y), together with the relationship
properties inLink, outLink and LinkTo.
ISBN: 978-81-930654-7-5
www.iirdem.org
Proceedings of ICEEM-2016
©IIRDEM 2016
Fig. 3. Schema of conceptual prediction model.
5.2 Automatic Construction of TermNavNet
using CPM
TermNavNet is constructed by applying the CPM schema to the FVTP using the following algorithm. A 1st-order or 2nd-order TermNavNet is obtained by using the 1st-order or 2nd-order CPM, respectively, to update the transition probability Prob with the first-order or second-order probability formula.
TABLE 4: TermNavNet construction
Algorithm: Building TermNavNet
Input: F (FVTP)
Output: M (TermNavNet)
Process:
  Initialize M
  For each f = t1t2…tm ∈ F
    For each ti ∈ f
      Initialize cNode objects with NodeName = ti, ti-1, ti+1 and Occur = 1 if they are not found in M
      Initialize a cOutLink object with Name = ti_ti+1 and Occur = 1 if it is not found in M
      Increase ti.Occur and ti_ti+1.Occur if they are found in M
      ti_ti+1.linkTo = ti+1
      ti.outLink = ti_ti+1
      ti.inLink = ti-1
    Update all objects into M
  Update transition probabilities in the cOutLink objects
  Return M
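The construction algorithm can be sketched in Python as follows. This is one possible reading of the steps above for the first-order case, not the OWL implementation: the classes `CNode` and `COutLink` mirror cNode and cOutLink, each term occurrence is counted once, and the function name `build_term_nav_net`, the field names and the in-memory dictionary used for M are assumptions of this example.

```python
from dataclasses import dataclass, field


@dataclass
class COutLink:
    name: str          # "ti_ti+1", as in the algorithm
    link_to: str       # name of the next state node
    occur: int = 0     # ∂x,y
    prob: float = 0.0  # Prob, the first-order probability ρx,y


@dataclass
class CNode:
    node_name: str
    occur: int = 0                                 # ∂x
    out_links: dict = field(default_factory=dict)  # next term -> COutLink
    in_links: set = field(default_factory=set)     # predecessor term names


def build_term_nav_net(fvtp):
    """Build a first-order TermNavNet (dict: term -> CNode) from term patterns."""
    net = {}
    for pattern in fvtp:
        for i, term in enumerate(pattern):
            node = net.setdefault(term, CNode(term))
            node.occur += 1
            if i > 0:
                node.in_links.add(pattern[i - 1])
            if i + 1 < len(pattern):
                nxt = pattern[i + 1]
                link = node.out_links.setdefault(
                    nxt, COutLink(f"{term}_{nxt}", nxt)
                )
                link.occur += 1
    # Update transition probabilities in the cOutLink objects: ∂x,y / ∂x.
    for node in net.values():
        for link in node.out_links.values():
            link.prob = link.occur / node.occur
    return net
```

A second-order variant would additionally key the links on the pair (ti-1, ti) and divide by ∂x,y instead of ∂x.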
5.3 Queries
RecTerm (tx, ty) queries the next viewed terms for a given previous viewed term preT and current viewed term curT by applying the second-order transition probability. If the first-order transition probability is used, the next viewed terms for a given current viewed term curT are queried with RecTerm (tx).
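A minimal sketch of the two RecTerm queries is given below, assuming the transition probabilities are available as Python dictionaries keyed by term tuples. The function names, the ranking by descending probability and the `top_k` cut-off are assumptions of this example; the text above does not specify how many next terms a query returns or in which order.

```python
def rec_term_first(rho_first, cur_t, top_k=3):
    """RecTerm(tx): next viewed terms for the current term cur_t.

    `rho_first` maps (tx, ty) to the first-order probability ρx,y.
    """
    candidates = [(y, p) for (x, y), p in rho_first.items() if x == cur_t]
    # Highest probability first; ties are broken alphabetically.
    return sorted(candidates, key=lambda c: (-c[1], c[0]))[:top_k]


def rec_term_second(rho_second, pre_t, cur_t, top_k=3):
    """RecTerm(tx, ty): next viewed terms for the pair (pre_t, cur_t).

    `rho_second` maps (tx, ty, tz) to the second-order probability ρx,y,z.
    """
    candidates = [
        (z, p)
        for (x, y, z), p in rho_second.items()
        if x == pre_t and y == cur_t
    ]
    return sorted(candidates, key=lambda c: (-c[1], c[0]))[:top_k]
```

Both functions return an empty list when the given term or term pair has no outgoing transition in the model.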
6. SEMANTIC-ENHANCED WEB-
PAGE RECOMMENDATION
STRATEGIES
Two Web-page recommendation strategies are proposed, depending on the order of the CPM: recommendations are made either for a given current Web-page or for a combination of the current and previous Web-pages, as follows:
Recommendation strategy-1 uses TermNetWP and the first-
order CPM:
Step 1 builds TermNetWP;
Step 2 generates FWAP using LL-Mine;
Step 3 builds FVTP;
Step 4 builds a 1st-order TermNavNet given FVTP;
Step 5 identifies a set of currently viewed terms
{tk} using query Querytopic (dk) on TermNetWP;
Step 6 infers next viewed terms {tk+1} given each term in {tk} using query RecTerm (tk) on the 1st-order TermNavNet;
Step 7 recommends pages mapped to each term
in {tk+1} using query Querypage (tk+1) on TermNetWP.
Recommendation strategy-2 uses TermNetWP and the second-
order CPM:
Step 1 builds TermNetWP;
Step 2 generates FWAP using LL-Mine;
Step 3 builds FVTP;
Step 4 builds a 2nd-order TermNavNet given FVTP;
Step 5 identifies a set of previously viewed terms
{tk-1}, and a set of currently viewed terms {tk} using query
Querytopic (d), d ∈ {dk-1, dk}, on TermNetWP;
Step 6 infers next viewed terms {tk+1} given each pair (tk-1, tk) using query RecTerm (tk-1, tk) on the 2nd-order TermNavNet;
Step 7 recommends pages mapped to each term
in {tk+1} using query Querypage (tk+1) on TermNetWP.
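Steps 5 to 7 of recommendation strategy-1 can be sketched as below. The two dictionaries `page_terms` and `term_pages` are hypothetical stand-ins for the Querytopic and Querypage queries on TermNetWP, and `rho_first` stands in for the 1st-order TermNavNet. The `top_k` cut-off and the exclusion of the current page from the result are assumptions of this example.

```python
def recommend_pages(current_page, page_terms, term_pages, rho_first, top_k=2):
    """Recommend pages for the current page using first-order transitions."""
    # Step 5: currently viewed terms {tk}, standing in for Querytopic(dk).
    current_terms = page_terms.get(current_page, [])

    # Step 6: next viewed terms {tk+1}, standing in for RecTerm(tk).
    scores = {}
    for t in current_terms:
        for (x, y), p in rho_first.items():
            if x == t:
                scores[y] = max(scores.get(y, 0.0), p)
    next_terms = sorted(scores, key=lambda y: (-scores[y], y))[:top_k]

    # Step 7: pages mapped to each next term, standing in for Querypage(tk+1).
    recommendations = []
    for t in next_terms:
        for page in term_pages.get(t, []):
            if page != current_page and page not in recommendations:
                recommendations.append(page)
    return recommendations
```

Strategy-2 follows the same outline, except that Step 6 looks up the pair of previous and current terms in the second-order probabilities.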
A Web-page recommendation rule, denoted as Rec, is defined as the set of recommended Web-pages generated by a Web-page recommendation strategy. A Web-page recommendation rule can be categorised as follows:
1) A recommendation rule is correct if the next Web-page accessed by the current user is present in Rec.
2) A recommendation rule is satisfied if the user's target page is accessed through any of the Web-pages present in Rec.
3) A recommendation rule is empty if the next Web-page accessed by the user is not present in Rec.
In [16], Zhou stated that the performance of Web-page
recommendation strategies is measured in terms of two
performance metrics: Precision and Satisfaction.
Let Rc be the sub-set of Rec, which consists of all correct
recommendation rules. The Web-page recommendation
precision is defined as:
$$\mathrm{Precision} = \frac{|R_c|}{|\mathit{Rec}|} \qquad (5)$$
Let Rs be the sub-set of Rec, which consists of all satisfied
recommendation rules. The satisfaction for Web-page
recommendation is defined as:
$$\mathrm{Satisfaction} = \frac{|R_s|}{|\mathit{Rec}|} \qquad (6)$$
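The two metrics can be computed as in the following sketch, which rests on two interpretive assumptions: the denominator |Rec| is taken as the number of recommendation rules evaluated, and a rule counts as satisfied when any page the user accesses in the remainder of the session appears among the recommended pages. The function name `evaluate_rules` and the input format are hypothetical.

```python
def evaluate_rules(rules):
    """Return (precision, satisfaction) over a list of recommendation rules.

    Each rule is a pair (recommended_pages, remaining_session), where
    remaining_session lists the pages the user actually accessed next,
    in order.
    """
    if not rules:
        return 0.0, 0.0
    # Correct: the very next accessed page is among the recommendations.
    correct = sum(1 for rec, rest in rules if rest and rest[0] in rec)
    # Satisfied (assumed reading): any later accessed page is recommended.
    satisfied = sum(1 for rec, rest in rules if any(p in rec for p in rest))
    return correct / len(rules), satisfied / len(rules)
```

Under this reading every correct rule is also satisfied, so satisfaction is never lower than precision.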