Chapter 4
An Automated Approach to Extract Domain Ontology for
E-Learning System
E-learning is becoming an active area in both online and offline
education. E-learning deals with the interaction between the teacher and the learner on the
basis of the knowledge possessed by the learner. Aware of the learner's knowledge
level, the teacher can easily deliver the required lessons to the student through an
online medium such as the Internet. Adaptive learning is an educational method
that uses computers interactively and tailors the learning materials
to the learner's knowledge level. In this chapter, an automated approach to
extract domain ontology is designed with the objective of enhancing the
efficiency of adaptive e-learning.
4.1 ONTOLOGY AND E-LEARNING
Ontologies have become a key concept for providing more relevant lessons to
the learner than other means. Ontologies are established for information sharing and
are extensively used as a means for conceptually structuring domains of interest.
Ontologies help us to describe, develop, annotate and relate the educational resources,
which in turn will help in the retrieval of more relevant resources for the learners.
Ontology can be created by a domain expert and embedded into an e-learning system,
or its construction can be automated and the result embedded into the system. Automating
ontology construction reduces human intervention and also the time required for ontology
creation. The chief advantage of the proposed approach is automated ontology
construction through concept map extraction, which is effectively achieved through the
use of association rule mining and sequential pattern mining algorithms. The
constructed domain ontology is applied to the e-learning system, and the real-time
application of the proposed approach is discussed.
Figure 4.1: Sample ontology for E-learning
[Figure: a graph of database-domain topics (Database, DBMS, DBA, DB developers, DB design, DB modelling, DB languages such as QL, DML and DDL, DB systems, data models such as Hierarchical, Relational, Network, Entity-relational and OO, Active DB, Cloud DB, Data warehouse, DB machines, Data structures) connected by "Is a", "Part of", "Has" and "Union of" relations.]
36
Figure 4.1 shows a sample structure of an ontology constructed by domain
experts for the e-learning system. Though the structure is a basic graph-like structure,
we incorporate relations with each node present in the ontology. A node is a topic
related to the domain that is considered for the construction of the e-learning system.
4.2 ONTOLOGY CONSTRUCTION
The main objective of the proposed approach is to construct an ontology for an e-learning
system which fulfills the needs of clients. The client mentioned in the
approach is the student or any person who makes use of the e-learning system.
The ontology lists detailed associations between the nodes, or topics. The
ontology construction undergoes a series of development steps to ensure that the e-learning
system is an effective one. The ontology is constructed from a text corpus,
which contains a number of documents regarding a particular domain, so the ontology
has to be created based on that specified domain. The main steps in the
construction of an ontology are:
Processing the documents
Outlining the domain ontology
Concept processing (extraction of concepts from the domain)
Creating concept maps
The above four steps serve as the main components of the proposed approach.
These processes have the virtue of producing an effective ontology for the learning
system. Based on these steps, an automatic ontology construction method is provided.
The proposed approach derives a specific algorithm to give weightage to all the nodes
and to provide association between the nodes. The nodes are assigned their inter-
relationships through a mutual association function. The different document
processing methods will help to extract the key features from each document. The key
features are then associated together to form the concepts and from the concepts, an
effective concept map is created for the e-learning system. Thus, a query from a user
is used to extract a concept map regarding that query.
Figure 4.2: Ontology construction
Figure 4.2 depicts the block diagram for the construction of an ontology for
the specified e-learning system in our proposed approach. In the succeeding sections,
the proposed approach is discussed in detail.
4.2.1 DOCUMENT PROCESSING
The initial part of the ontology construction is to process the documents to
extract keywords from them. The text corpus is selected and the
documents from the corpus are extracted for processing, which is done by applying
two basic document processing steps. Initially, a stop word removal process removes
all the non-informative words from the documents. Once the stop word removal is
finished, a stemming algorithm (Willet, 2006) is applied to reduce the keywords to
their root form. The keywords from the documents are then stored in an array,
making sure that no word is repeated. The stored keywords are then
transferred to the concept extraction phase.
For example: Consider two statements from the text corpus
“Database is a collection of related information. Data in a database are stored in
the form of tables.”
The stop words are: is, a, of, from, are, in, the.
Keywords extracted: Database, Collection, Related, Information, Data, Database,
Stored, Tables.
Stemming: Collection - Collect
Related - Relate
Tables - Table
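The document-processing step above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the stop-word list is the one from the example, and the suffix-stripping rules are a naive stand-in for the standard stemmer the chapter cites.

```python
# Sketch of the document-processing step: stop-word removal followed by
# stemming. The suffix rules below are illustrative assumptions; the
# chapter itself uses a standard stemming algorithm.

STOP_WORDS = {"is", "a", "of", "from", "are", "in", "the"}

def stem(word):
    """Very naive suffix stripping, standing in for a real stemmer."""
    for suffix in ("ion", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def extract_keywords(text):
    """Return unique stemmed keywords with stop words removed."""
    tokens = [t.strip(".,").lower() for t in text.split()]
    keywords = []
    for t in tokens:
        if t and t not in STOP_WORDS:
            root = stem(t)
            if root not in keywords:      # ensure no word is repeated
                keywords.append(root)
    return keywords

text = ("Database is a collection of related information. "
        "Data in a database are stored in the form of tables.")
print(extract_keywords(text))
```

Note that a toy stemmer like this produces crude roots (e.g. "relat" rather than "relate"); a production system would use a full stemmer such as Porter's.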
4.2.2 OUTLINING DOMAIN ONTOLOGY
The procedure of the ontology construction should be specific and transparent,
since we define the e-learning system as a user-friendly one. In this section, the different
steps that are needed for the efficient construction of the ontology are defined. The
basic structure of the domain ontology can be presented as in Figure 4.3.
Figure 4.3: Outline of domain ontology construction
[Figure: pipeline of stages: concept extraction, redundancy check, dimensionality reduction, deriving associations, creating concept map, ontology construction.]
The outlining of the structure of the ontology should be precise, because the
ontology is domain specific. The main concentration is needed in the concept
extraction phase. A concept should be associated with other concepts, yet
possess an individual existence. So, redundancy among the concepts should be
identified to ease the process of execution. The other major part concerns the
dimension of the concept set. For a high-dimensional concept set, the dimension should
be reduced to make the associations more rigid and precise.
4.2.3 CONCEPT PROCESSING
A concept is defined as a keyword or set of keywords that defines a common
topic as reference. So, the purpose of the concept processing step is to identify such
concepts from the set of keywords that has already been extracted. Let K be the set of n
keywords defined by,
K = {k_1, k_2, ..., k_n}
The set K includes the keywords from all the documents. Now we process
each keyword to find the concept. Each keyword is selected and processed with other
keywords to find the association between them. Initially, a sorting process is applied
to the set of keywords based on their frequency. The most frequent keywords are
selected as top priority keywords and are processed first for concept extraction.
The frequency of each keyword k_i is calculated based on its
presence in the documents of the text corpus.
f(k_i) = Σ_j n(k_i, d_j) / N(d_j)
The frequency is calculated from the number of occurrences n(k_i, d_j) of the keyword in the
document d_j relative to the total number of keywords N(d_j) in d_j, summed over the
documents. Now the set K is reformatted with the most frequent keywords in the
descending order of their frequency values. We adopt a sentence-level windowing
process, in which the window moves in a sliding manner. The text window formed is a
four-term window enclosed in a sentence. As the window slides, the words enclosed in the
window are selected for association calculation. The association is calculated as,
Assoc(k_i, k_j) = P(k_i | k_j) = P(k_i ∩ k_j) / P(k_j)
The association between two keywords is obtained through the probability of
occurrence of the keywords. A conditional probability is adopted for finding the
relation between the keywords. The value of the association between the keywords is
used to extract the concept. If the association value is high, the keyword pair is considered
as a concept. The process is continued up to the last document in the text corpus. A
threshold value is set for making the distinction between the keywords and concepts.
If the association value is higher than the threshold, then the corresponding keywords
constitute a concept. Similarly, all the association values are analysed and a concept
set is formed, which is defined as,
C = {c_1, c_2, ..., c_m}
The set C represents the concepts which are selected after the association
value analysis. A concept is defined as a group of two or more associated keywords.
The keywords selected are based on the frequency of their occurrence in the
considered domain. Thus, according to the association of each keyword with another, a set of
concepts is formed. The concepts are then used as the building blocks of the domain
specific ontology. Though the concepts are generated from the most frequent
keywords, the concept set C may contain redundant concepts. So, in order to make the
concept set more specific, a redundancy analysis and a dimensionality reduction
process are carried out.
Step 1. Select text corpus
Step 2. Apply stop word removal algorithm
Step 3. Apply stemming algorithm
Step 4. Store keywords in set K, K = {k_1, k_2, ..., k_n}
Step 5. Find frequency of every keyword
f(k_i) = Σ_j n(k_i, d_j) / N(d_j)
Step 6. Sort keywords based on frequency
Step 7. Find joint probability between keywords, P(k_i ∩ k_j)
Step 8. Calculate association values between keywords
Assoc(k_i, k_j) = P(k_i | k_j) = P(k_i ∩ k_j) / P(k_j)
Step 9. List association values
Step 10. Generate concept set C, C = {c_1, c_2, ..., c_m}
Step 11. Stop.
Figure 4.4: Concept Extraction algorithm
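The concept extraction of Figure 4.4 can be sketched as follows. The windowed co-occurrence counting and the conditional-probability score follow the description above; the threshold value and the sample sentences are illustrative assumptions.

```python
# Sketch of the concept-extraction step (Figure 4.4): keyword frequencies,
# a four-term sliding window inside each sentence, and a conditional-
# probability association score. The threshold is an assumed tunable value.
from collections import Counter
from itertools import combinations

def extract_concepts(sentences, window=4, threshold=0.5):
    freq = Counter()          # occurrences of each keyword
    pair = Counter()          # co-occurrences inside a window
    for sent in sentences:
        freq.update(sent)
        for start in range(max(1, len(sent) - window + 1)):
            for a, b in combinations(sent[start:start + window], 2):
                pair[frozenset((a, b))] += 1
    concepts = []
    for ab, joint in pair.items():
        a, b = sorted(ab)
        # Rough estimate of Assoc(a, b) = P(a | b) from the counts
        assoc = joint / freq[b]
        if assoc >= threshold:
            concepts.append((a, b, assoc))
    return concepts

sents = [["database", "store", "table", "schema"],
         ["relational", "database", "table", "row"]]
for a, b, s in extract_concepts(sents):
    print(a, b, round(s, 2))
```

Keyword pairs whose association clears the threshold become candidate concepts, exactly as the threshold rule in the text prescribes.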
4.2.3.1 REDUNDANCY ANALYSIS
The extraction of concepts produces a number of dominant and unwanted
concepts, and these make the dataset redundant. So, in order to reduce the redundancy,
a redundancy analysis is carried out on the extracted concepts to ensure that the
concepts which are selected are not redundant. We use the information gain and
entropy technique (Leow et al. 2008), which detects how redundant a term is
in a set of documents. Let us consider the concept set C for the redundancy check.
Each concept in the concept set C is associated with one or more keywords, so
the redundancy analysis concentrates on those words. Suppose c1 is a concept
present in the set C and possesses 3 terms. The redundancy analysis checks the
presence of each keyword in the text corpus through the information gain and entropy
method. Table 4.1 shows the probabilities of the different terms in the concept c1.
Table 4.1: Probability of various terms in Concept c1
P(k1) = 2/3    P(k2) = 1/3    P(k3) = 1/3
The probability is calculated in order to find the occurrence of each term in the
concept c1. The number of bits needed to encode a keyword should be calculated in
order to find the entropy function. The number of bits needed to encode k1
with probability 2/3 can be calculated using the following formula,
b(k_i) = -log2 P(k_i)
where b(k_i) is the number of bits needed to encode keyword k_i. The entropy value
for the concept c1 becomes,
H(c1) = Σ_i P(k_i) b(k_i)
Here, the function H is the entropy value of the concept. From this, we apply
a threshold for pruning the terms which have a bit value below the applied
threshold. Similarly, we prune the redundant concepts from the domain on the basis of
their entropy values. Generally, the entropy function is defined as,
H(c) = -Σ_i P(k_i) log2 P(k_i)
The concepts with high entropy values are retained in the domain and those
with low entropy values are pruned. By applying this method, we can avoid the
unwanted concepts in the domain; the information gain and entropy method is a
highly reliable one.
Step 1. Select concept set C
Step 2. Encode all the keywords using bit encoding
b(k_i) = -log2 P(k_i), where P(k_i) is the probability of keyword k_i
Step 3. Find entropy value for all the concepts
H(c) = Σ_i P(k_i) b(k_i)
Step 4. Filter the concepts based on the entropy values
Step 5. Store the filtered concepts in concept set C
Step 6. End
Figure 4.5: Redundancy analysis algorithm
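The redundancy analysis of Figure 4.5 can be sketched as follows, reusing the Table 4.1 probabilities for c1. The second concept and the pruning threshold are hypothetical values added only to show the filtering step.

```python
# Sketch of the redundancy-analysis step (Figure 4.5): bit-encode each
# keyword as b(k) = -log2 P(k), take the entropy of a concept as
# H(c) = sum_k P(k) * b(k), and keep only high-entropy concepts.
import math

def bits(p):
    """Number of bits needed to encode a keyword with probability p."""
    return -math.log2(p)

def entropy(probabilities):
    """Entropy of a concept from its keyword probabilities."""
    return sum(p * bits(p) for p in probabilities)

# Term probabilities of concept c1, as in Table 4.1.
c1 = [2/3, 1/3, 1/3]
print(round(entropy(c1), 3))

concepts = {"c1": c1, "c2": [0.9, 0.95]}   # "c2" is a hypothetical concept
threshold = 0.8                            # assumed pruning threshold
kept = [name for name, ps in concepts.items() if entropy(ps) > threshold]
print(kept)
```

With these numbers, c1 has an entropy of about 1.447 bits and survives the filter, while the low-entropy c2 is pruned.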
4.2.3.2 REDUCING THE DIMENSION OF THE CONCEPT SET
A major difficulty in the ontology construction process is the high
dimension of the concept set. So, a dimensionality reduction method called Principal
Component Analysis (PCA) (Jonathon Shlens, 2005) is used for reducing the
dimension of the concept set. The concept set is considered as an M x N matrix D for
applying the PCA algorithm. A main feature to find is the
covariance matrix, derived from the empirical mean values, for efficient dimensional
reduction. The empirical mean of each column can be computed as,
u[j] = (1/M) Σ_{i=1..M} D[i, j]
where D is the matrix of size M x N. Then, the deviations from the means are
calculated and stored in a matrix B,
B = D - h u^T
where h is an M x 1 column vector of ones. The deviation values are then used for the
calculation of the covariance matrix,
Cov = (1/(M-1)) B^T B
The eigenvectors of the covariance matrix are calculated and stored in a separate
matrix E. After finding the eigenvalues of the matrix Cov, we calculate the diagonal
matrix of its eigenvalues.
From the above calculated data, a matrix W of eigenvectors is constructed for reducing
the dimension of the data under consideration, where W is an N x L matrix whose L
columns are the eigenvectors of Cov with the largest eigenvalues. The next step of
the dimensionality reduction process is to calculate the empirical standard deviation,
which contains the square root of each element along the main diagonal of the
covariance matrix,
s[j] = sqrt(Cov[j, j])
and this value is then used to calculate the z-score matrix Z = B / (h s). Thus, we get the
reduced dimension matrix R, which is the dot product of the z-score
matrix Z and the matrix W,
R = Z W
and which has dimension M x L. This
matrix reduces the concept space of our concept map and hence increases the speed of
our search. The dimensionally reduced matrix contains the most relevant concepts
regarding the domain under consideration. Thus, after the concept processing step, the
proposed approach provides a set of data which contains the most prime concepts
regarding the domain. These multilevel concept processing steps are used for
constructing an effective ontology for the e-learning system.
Step 1. Select concept set C
Step 2. Define M x N matrix D
Step 3. Calculate empirical mean
u[j] = (1/M) Σ_i D[i, j]
Step 4. Find the covariance matrix
Cov = (1/(M-1)) B^T B, where B = D - h u^T
Step 5. Evaluate the diagonal matrix of eigenvalues of Cov
Step 6. Find the z-score matrix Z
Step 7. Derive the dimensionally reduced matrix R = Z W
Step 8. The elements in matrix R constitute the concept set C.
Step 9. End
Figure 4.6: Dimensionality Reduction algorithm
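The covariance-method PCA of Figure 4.6 can be sketched with NumPy as follows. The matrix sizes and the choice of L are illustrative assumptions; in the chapter the M x N matrix holds the concept set.

```python
# Sketch of the dimensionality-reduction step (Figure 4.6) using the
# covariance method of PCA: mean-centre, form the covariance matrix,
# take the top-L eigenvectors, z-score, and project.
import numpy as np

def reduce_dimension(D, L):
    """Project the rows of the M x N matrix D onto its top-L principal axes."""
    u = D.mean(axis=0)                      # empirical mean of each column
    B = D - u                               # deviations from the mean
    cov = np.cov(B, rowvar=False)           # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]       # sort by decreasing eigenvalue
    W = eigvecs[:, order[:L]]               # N x L projection matrix
    s = np.sqrt(np.diag(cov))               # empirical standard deviations
    Z = B / s                               # z-score matrix
    return Z @ W                            # M x L reduced matrix R

rng = np.random.default_rng(0)
D = rng.normal(size=(6, 4))                 # 6 "concepts", 4 features
print(reduce_dimension(D, 2).shape)
```

The resulting R has one row per concept and L columns, the reduced concept space the text refers to.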
4.2.4 CREATING THE CONCEPT MAPS
The concept map preparation is the main step in the automatic ontology
construction defined by the proposed approach. A concept map is a set of concepts that
have many inter-relationships. When a query is submitted to the ontology, the concept
referred to by the query has to be extracted for the client. So, a concept map strategy
has been adopted for the proposed approach, i.e., a set of connected concepts is joined
together to form a solution to a specific set of queries. As defined by the proposed
approach, the ontology gives the most suitable results to the clients according to their
queries. The concept map preparation is the step taken prior to the ontology
construction. We can consider the ontology as a collection of concept maps. As
discussed above, the building blocks of a concept map are the concepts obtained after
the dimensionality reduction process. The creation of a concept map is similar to the
concept formation step, i.e., the association between concepts is extracted by the
mutual association values.
Assoc(c_i, c_j) = P(c_i ∩ c_j) / (P(c_i) P(c_j))
The above expression is used for finding the inter-relationship between
concepts. The major difference in this association value calculation is that here the
joint probability of the concepts is related to both individual probabilities of the
concepts. The joint probability of two concepts is calculated by summing the joint
probabilities of the keywords corresponding to the concepts, i.e.
P(c_i ∩ c_j) = P(k_1 ∩ c_j) + P(k_2 ∩ c_j) + ... + P(k_n ∩ c_j)
where k_1, ..., k_n are the keywords of concept c_i.
The summation continues up to the last keyword selected for the joint probability calculation.
Once all the concepts are processed by the joint probability calculation, the joint
probabilities are stored in a set to evaluate the relationships. The concepts which
possess high association values are grouped to form the concept map. The concept maps
are drawn using the Protégé tool and the associations are marked as relations in the
OWL.
Step 1. Select concept set C, C = [c_1, c_2, ..., c_m]
Step 2. Find probability of each concept, P(c_i)
Step 3. Calculate joint probability between concepts.
P(c_i ∩ c_j) = P(k_1 ∩ c_j) + P(k_2 ∩ c_j) + ... + P(k_n ∩ c_j)
Step 4. Calculate the association values.
Assoc(c_i, c_j) = P(c_i ∩ c_j) / (P(c_i) P(c_j))
Step 5. Group the concepts based on their association values.
Step 6. Define concept map for the grouped concepts.
Step 7. End.
Figure 4.7: Concept map extraction algorithm
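The mutual association step of Figure 4.7 can be sketched as follows. The document sets, the concept names and the grouping threshold are hypothetical; the score relates the joint probability of a concept pair to both individual probabilities, which is one plausible reading of the mutual association function.

```python
# Sketch of the concept-map step (Figure 4.7): score each concept pair by
# its joint probability relative to both individual probabilities, then
# group strongly associated pairs into a concept map.
from itertools import combinations

# doc_sets[i] = set of concepts appearing in document i (hypothetical data)
doc_sets = [{"db_design", "schema", "normalization"},
            {"db_design", "schema"},
            {"sql", "query"}]

def prob(concept):
    return sum(concept in d for d in doc_sets) / len(doc_sets)

def joint(a, b):
    return sum(a in d and b in d for d in doc_sets) / len(doc_sets)

def association(a, b):
    # Assoc(a, b) = P(a ∩ b) / (P(a) * P(b))
    return joint(a, b) / (prob(a) * prob(b))

concepts = sorted(set().union(*doc_sets))
threshold = 1.4   # assumed grouping threshold
concept_map = [(a, b) for a, b in combinations(concepts, 2)
               if joint(a, b) > 0 and association(a, b) >= threshold]
print(concept_map)
```

Pairs scoring above the threshold are linked, so "sql" and "query" end up in the same concept map even though each is rare on its own.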
4.3 MINING ASSOCIATION RULES FOR ONTOLOGY
CONSTRUCTION
This section discusses mining association rules from the extracted concepts.
The two algorithms used for generating association rules are the Frequent Pattern (FP)
growth algorithm and a sequential pattern mining algorithm called the PrefixSpan algorithm.
Each algorithm is discussed in detail in the following sections.
4.3.1. ONTOLOGY CONSTRUCTION USING FP-GROWTH
In this step, association rules are generated using the Frequent Pattern growth
algorithm. The rules generated hold the relations between the concepts and provide a
strong domain relation between the concepts in the concept set C. The FP-growth
algorithm generates a binary matrix in association with the frequency of concepts; the
binary matrix contains the values 0 and 1. The values in the binary database are
generated according to the utilization of a concept, and if the concept is associated
with more than one concept, then a corresponding value is generated in the binary
matrix. The concept map is used as the deciding factor in this process: the utilization
of a concept can be easily extracted from the concept map with which the concept is
associated.
The created binary matrix is processed with the FP-growth algorithm for
mining the frequent terms used by the concepts. The FP-growth method extracts the
relations between concepts. It creates a database in the main memory in the form of a
tree by reading the database twice. In the first phase, the database is analysed and the frequency
of each item is determined in order to remove less frequent concepts. Then, the remaining
frequent items are ordered in the descending order of their support values. During the
second phase, the database is scanned again to read the terms, and the frequent terms
read are inserted into a so-called FP-tree structure (Grahne, G. and Zhu, J. 2005). Once
the tree is constructed, subsequent pattern mining can be performed. The application
of the FP-growth algorithm provides a set of rules from the concepts selected by the
algorithm. The associations between the concepts are extracted precisely by the FP-growth
algorithm.
Initially, the FP-growth algorithm selects a concept map from the set of concept maps.
Let CM = {cm_1, cm_2, ..., cm_p} be the set of concept maps, where each concept map
cm_i = [c_1, c_2, ..., c_q] is a list of concepts.
The selected concept map is considered as a transaction in which each concept is
fixed as a node and given to the FP-growth algorithm. Now, an FP-tree is constructed
over the concepts in the concept map through the fixed order approach. In the fixed order
approach, when a concept overlaps with another, or if similar concepts are obtained in
the comparison process, the count of the similar concepts is incremented.
Suppose there are two transactions that share some concepts; the counts of those
shared concepts are incremented. Similarly, all the
concepts are compared with the other concepts in the selected concept map. Linked
lists are created for the nodes that share similar concepts. The process creates
paths between the concepts, resulting in the construction of an FP-tree, which is
the core data structure of the FP-growth algorithm.
A set of frequent patterns is then extracted from the FP-tree. If two
concepts are similar, then the counts of the frequent patterns are incremented to define the
ontology. Once all the concepts are processed, a frequent pattern set
FP = [fp_1, fp_2, ..., fp_r]
is defined for the next phase of the FP-growth algorithm.
The frequent item set is then processed using a bottom-up approach to explore the
maximum association between each of the concepts. Each concept is recursively
analysed to find its associations. Thus, when a query is put forth by the user, the
proposed approach will give a more appropriate answer because of the strong
association between the concepts.
Step 1. Select the concept map set.
CM = [cm_1, cm_2, ..., cm_p], where cm_i = [c_1, c_2, ..., c_q]
Step 2. Construct the FP-tree.
Step 3. Extract the frequent patterns.
Step 4. Define the frequent item set,
FP = [fp_1, fp_2, ..., fp_r]
Step 5. Select concept maps.
Step 6. Construct the ontology based on the concept association values.
Step 7. End
Figure 4.8: Ontology construction algorithm
Figure 4.8 shows the different steps involved in the construction of the domain
specific ontology. The proposed approach uses a multilevel association feature, which
gives a strong base to the relationships between the concepts in the concept map.
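The output of this step, frequent concept sets above a minimum support, can be illustrated with a brute-force miner. This is a deliberately simple stand-in: real FP-growth builds an FP-tree in two database scans rather than enumerating combinations, but the result is the same on small data. The transactions and min_support are assumptions.

```python
# Illustrative stand-in for the FP-growth step: a brute-force frequent-
# itemset miner over concept-map "transactions". Real FP-growth reaches
# the same frequent sets via an FP-tree without enumerating combinations.
from itertools import combinations

# Each row: the set of concepts used by one concept map (a transaction).
transactions = [{"data", "database", "schema"},
                {"data", "database", "relational"},
                {"data", "schema"}]

def frequent_itemsets(transactions, min_support=2, max_size=3):
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            # support = number of transactions containing the whole combo
            support = sum(set(combo) <= t for t in transactions)
            if support >= min_support:
                frequent[combo] = support
    return frequent

for itemset, support in frequent_itemsets(transactions).items():
    print(itemset, support)
```

With these transactions, {data}, {database}, {schema}, {data, database} and {data, schema} survive the minimum support of 2, and rules between concepts are then read off from such sets.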
4.3.2. ONTOLOGY CONSTRUCTION USING PREFIX SPAN ALGORITHM
The proposed approach uses an alternative method to generate the association
rules used for the construction of the ontology. The features of a sequential pattern
mining algorithm are used in the proposed approach as an alternative to the FP-growth
algorithm. The concept set is selected and all the concepts are processed based
on the generalized sequential pattern mining algorithm. The major steps in the
sequential pattern mining algorithm for extracting associations between the
concepts are listed below.
Step 1. Every concept from the concept map is selected and assigned as a
candidate of length-1.
Step 2. The length-1 patterns are calculated to find the count of each concept
in the concept set.
Step 3. The concept set considered here is the primitive concept set, i.e.
concept set without filtering.
Step 4. After length-1 calculation, for each concept of length-1, find k-length
patterns.
Step 5. Continue the steps until no concepts are left to process.
Figure 4.9: Sequential pattern mining algorithm
Once all the concepts are processed and the sequences of patterns are
calculated, the taxonomy is generated. The concepts are mapped into the taxonomy of
the ontology according to their sequence, since the sequences of concepts are considered
as the associations between the concepts. Thus, based on the sequential patterns, the
ontology is constructed. The sequential pattern mining algorithm is adopted for
finding associations because of the quick nature of the algorithm. Since the proposed
system deals with a large number of documents, the FP-growth algorithm may not be
efficient all the time. So, when a large database is selected for ontology construction,
the effective way to generate associations between the concepts is through the
sequential pattern mining algorithm.
Figure 4.10: Sequential pattern mining
[Figure: flow from concept set to length-1 patterns, k-length patterns and taxonomy generation.]
Figure 4.10 shows the generalised process of the sequential pattern mining algorithm,
which is used as an alternative way to find the associations between the concepts in the
concept map. More precisely, the proposed approach uses the PrefixSpan
algorithm as the sequential pattern mining algorithm. The PrefixSpan algorithm is one
of the most commonly used and efficient sequential pattern mining techniques. The concepts in
the concept set have to be arranged in terms of the concept map structure in order to be
processed using the PrefixSpan algorithm. The concepts are selected and stored in a list.
The list contains the total number of concepts that are to be present in the taxonomy
of the ontology. The PrefixSpan algorithm explores the presence of concepts in the
documents in a sequential manner.
The PrefixSpan algorithm is initialized with the concept set defined by the
proposed approach. The PrefixSpan algorithm works by making the minimum support
value of the sequential patterns the key element. The minimum frequency with
which a concept is in sequence with other concepts is considered as the minimum
support. The PrefixSpan algorithm proceeds in a length-wise manner. It first
scans the database for patterns containing only a single concept. These concepts
are considered as the length-1 concepts, or length-1 sequences.
Example: For a sequence of data containing concepts a and b, if a is a length-1
concept, then the associated concept b is explored as a sequential association <a, b>.
Then the minimum support of the pattern <a, b> is calculated. Similar to the above
process, the PrefixSpan algorithm explores n-length sequential patterns.
According to the minimum support values of the sequential patterns, the associated
concepts can be extracted from the patterns. The patterns are then used for the
construction of the ontology-based association of the concepts.
Figure 4.11: PrefixSpan algorithm
[Figure: flow: initialize the concept set, find the length-1 sequences, calculate min_support, find length-n sequences, extract associations from the sequential patterns.]
Let us consider three documents from the text corpus. Each document is pre-processed,
and the keywords are extracted and maintained as a sequence of concepts.
[DATABASE SYSTEMS, Data, Information, Management, System, DBMS,
Relational]
[MODEL, Data, Database, Design, Schema, steps]
[RELATIONAL MODEL, Data, Database, Relational, SQL, Schema]
Scan the database once and count the support of the length-1 candidates (concepts).
Support is defined as the total number of occurrences of a concept in the
database. Discard the redundant concepts from the set of concepts. Assume the
minimum support threshold is 2 and eliminate the concepts with support less than 2.
The concepts identified are shown in Table 4.2 together with their support counts.
Table 4.2: Length-1 Sequence Patterns

Concepts      Sup. count
Data          3
Information   1
Management    1
Database      2
Design        1
Schema        2
Relational    2
Steps         1
SQL           1
DBMS          1

Now generate the length-2 patterns and eliminate the concept sets having values less
than the minimum support. So the concept sets (Database, Schema), (Database,
Relational) and (Schema, Relational) are eliminated.
Table 4.3: Length-2 Sequence Patterns

Concepts     Data   Database   Schema   Relational
Data                2          2        2
Database                       1        1
Schema                                  1
Relational
The next step is to generate the length-3 patterns, as shown in Table 4.4, and eliminate the
concept sets below the minimum support. Here the concept sets (Data, Database, Relational)
and (Data, Relational, Schema) are eliminated.
Table 4.4: Length-3 Sequence Patterns

Concepts   (Database, Schema)   (Relational, Schema)   (Database, Relational)
Data       2                    1                      1
No more patterns can be generated from the sequence (Data, Database, Schema). A
pattern is treated as a concept map. When there is more than one pattern of sequence,
there will be a group of concept maps. A group of concept maps together constitutes a
domain ontology.
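The length-1 pass of this worked example can be sketched as follows. Support here is counted per document containing the concept, which reproduces the counts of Table 4.2.

```python
# Sketch of the PrefixSpan length-1 pass on the three example documents:
# count the support of each single concept and discard those below the
# minimum support threshold of 2, as in Table 4.2.
from collections import Counter

docs = [["DATABASE SYSTEMS", "Data", "Information", "Management",
         "System", "DBMS", "Relational"],
        ["MODEL", "Data", "Database", "Design", "Schema", "steps"],
        ["RELATIONAL MODEL", "Data", "Database", "Relational",
         "SQL", "Schema"]]

MIN_SUPPORT = 2

# Support of a concept = number of documents (sequences) containing it.
support = Counter()
for doc in docs:
    for concept in set(doc):
        support[concept] += 1

length1 = {c: s for c, s in support.items() if s >= MIN_SUPPORT}
print(sorted(length1.items()))
```

Only Data (3), Database (2), Schema (2) and Relational (2) survive, matching the elimination described above; longer patterns are then grown from these survivors.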
4.4 RESULTS AND DISCUSSION
The following sections include the experimental evaluations of the proposed
approach under different testing conditions. The experiments were conducted on a
system with an Intel Core i5 processor, 4 GB RAM and a 500 GB hard disk. The
programs were written and tested in Java under JDK 1.7.0. The detailed
experimentation and analysis are discussed in the following sections.
4.4.1 DATASET DESCRIPTION
The proposed approach is an adaptive e-learning system working with the help
of an automated ontology. The dataset used for evaluating the performance
of the proposed approach is extracted from the field of Database Management
Systems (DBMS). A set of 1000 documents was extracted from different sources to
create the dataset for the proposed approach. The DBMS dataset was then given as
input to the proposed approach to create the automatic ontology, and the same dataset
was used for creating a manual ontology for the comparative analysis.
4.4.2 EVALUATION METRICS
This section discusses the different parameters which were used for
evaluating the proposed approach based on its efficiency and effectiveness. The
evaluation process considered two different ontologies, viz., the proposed
automatically generated ontology and an ontology generated by domain experts. We
compared the concept map obtained from our proposed approach with the concept map
developed by human experts. The parameters defined for the evaluation of the concept
maps from the two different ontologies were recall, precision and F-measure. Each
concept from the ontologies was compared and treated based on recall, precision and
F-measure through its concept-wise relations and associations.
concept_recall = |C_E ∩ C_S| / |C_E|
assoc_recall = |A_E ∩ A_S| / |A_E|
The above expressions are used for the calculation of the recall of the concepts, based
on the concept-to-concept recall and the association-based recall. The concept-to-concept
recall is defined by the expression concept_recall, while assoc_recall defines the
recall based on the association between the concepts. The values C_E and C_S
are the sets of nodes in the concept maps generated by human experts and by our proposed
approach respectively. Similarly, A_E and A_S are respectively the sets
of associations between concepts in the domain experts' concept map and the system
generated concept map.
The other parameter used in the approach is precision, which is defined as the
ratio of the intersection between the concepts from the two ontologies to the concepts
of the system generated ontology. In the proposed approach, a precision by association is also
defined for a more effective evaluation. So, the concept precision and association
precision are calculated using the following expressions.
concept_prec = |C_E ∩ C_S| / |C_S|
assoc_prec = |A_E ∩ A_S| / |A_S|
Here, the precision values of the concept-to-concept comparison are represented by the
expression concept_prec, and the precision based on the association between concepts is
defined by the expression assoc_prec. Once all the nodes in both ontologies are
processed, the precision and recall are calculated for the ontology as a whole. The
expressions given below provide the method to find the recall and precision values of
the ontology. The ontology recall can be defined as the weighted sum of the concept
recall and the association recall with recall measure α. Similarly, the precision is
defined using the precision measure β (α = β = 0.5, which acts as a balancing
factor).
ont_recall = α · concept_recall + (1 - α) · assoc_recall
ont_precision = β · concept_prec + (1 - β) · assoc_prec
The ont_recall expression defines the recall value of the ontology and ont_precision defines
its precision value. Then, we define the F-measure for the ontologies
by combining the recall and precision values. The F-measure is twice
the ratio of the product of the ontology precision and ontology recall to the sum of the
ontology precision and recall (their harmonic mean), which can be represented as,
F-measure = 2 · ont_precision · ont_recall / (ont_precision + ont_recall)
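The metric computation can be sketched as follows. The concept and association sets are hypothetical values chosen only to exercise the formulas; α = β = 0.5 as in the text.

```python
# Sketch of the evaluation metrics: concept/association recall and
# precision against an expert ontology, combined into ontology-level
# scores and an F-measure. The sets below are hypothetical examples.
def overlap_ratio(expert, system, denom):
    return len(expert & system) / len(denom)

C_E = {"db", "schema", "sql", "index"}        # expert concepts
C_S = {"db", "schema", "sql", "view"}         # system concepts
A_E = {("db", "schema"), ("db", "sql")}       # expert associations
A_S = {("db", "schema"), ("schema", "sql")}   # system associations

concept_recall = overlap_ratio(C_E, C_S, C_E)   # |C_E ∩ C_S| / |C_E|
concept_prec = overlap_ratio(C_E, C_S, C_S)     # |C_E ∩ C_S| / |C_S|
assoc_recall = overlap_ratio(A_E, A_S, A_E)     # |A_E ∩ A_S| / |A_E|
assoc_prec = overlap_ratio(A_E, A_S, A_S)       # |A_E ∩ A_S| / |A_S|

alpha = beta = 0.5                               # balancing factors
ont_recall = alpha * concept_recall + (1 - alpha) * assoc_recall
ont_prec = beta * concept_prec + (1 - beta) * assoc_prec
f_measure = 2 * ont_prec * ont_recall / (ont_prec + ont_recall)
print(round(ont_recall, 3), round(ont_prec, 3), round(f_measure, 3))
```

With these sets, concept recall and precision are both 0.75, association recall and precision are both 0.5, and the ontology-level recall, precision and F-measure all come out to 0.625.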
The evaluation parameters given above are used for evaluating the
performance of the proposed automatically generated ontology. The evaluations are
done by applying the proposed approach to a specific domain and determining the
performance of the proposed approach on that specific domain.
4.4.3 DISCUSSION ABOUT ONTOLOGIES
The main objective of the proposed approach is to construct an adaptive
ontology automatically from a given set of data. Since, it is associated with the
e-learning system, there should be a wide range of documents relating to a particular
domain. In the proposed approach, we consider a set of 1000 documents extracted
from the “Database Management System” domain. As defined by the proposed
approach, the document set is processed with stop word removal and stemmer
algorithms. The processed documents are then subjected to keyword selection and
concept extraction phases. Once the concepts are extracted from the documents, the
concept map is generated according to the proposed approach.
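As a concrete illustration of this preprocessing step, the following Python sketch combines stop word removal with a tiny suffix stemmer. The stop word list and the suffix rules are simplified placeholders, not the actual algorithms used in the system.

```python
import re

# Illustrative stop word list; a real system would use a much fuller list.
STOP_WORDS = {"the", "a", "an", "of", "is", "are", "and", "in", "to", "for"}

def simple_stem(word: str) -> str:
    """Very small suffix stripper, standing in for a Porter-style stemmer."""
    for suffix in ("ization", "ations", "ation", "ings", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word

def preprocess(document: str) -> list:
    """Tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return [simple_stem(t) for t in tokens if t not in STOP_WORDS]

doc = "Normalization of the relations is a step in database design"
print(preprocess(doc))  # → ['normal', 'rel', 'step', 'database', 'design']
```

The stemmed tokens then feed the keyword selection and concept extraction phases.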
After the concept extraction, the relationships between the concepts are extracted using the text association rule mining algorithm. The relations are obtained from the rules generated for each pair of associated concepts, and the taxonomy is built from these rules. Redundant and unwanted concepts are pruned based on a predefined support threshold in the FP-growth algorithm; the remaining concepts are retained and the taxonomy is generated according to the text association rule mining algorithm.
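The rule mining step can be sketched as follows. This Python example uses brute-force pair counting with support pruning as a simplified stand-in for FP-growth; the thresholds and concept sets are illustrative.

```python
from itertools import combinations
from collections import Counter

def mine_pair_rules(docs, min_support=2, min_confidence=0.6):
    """Mine A -> B rules from per-document concept sets.

    A simplified, brute-force stand-in for FP-growth: count single concepts
    and concept pairs, prune pairs below the support threshold, and keep
    rules whose confidence meets the minimum.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for concepts in docs:
        item_counts.update(concepts)
        pair_counts.update(combinations(sorted(concepts), 2))
    rules = []
    for (a, b), support in pair_counts.items():
        if support < min_support:
            continue  # pruned, like low-support branches in the FP-tree
        for head, tail in ((a, b), (b, a)):
            confidence = support / item_counts[head]
            if confidence >= min_confidence:
                rules.append((head, tail, confidence))
    return rules

# Each document is reduced to the set of concepts it mentions.
docs = [
    {"database", "schema", "sql"},
    {"database", "schema"},
    {"database", "indexing"},
]
rules = mine_pair_rules(docs)
# Only the (database, schema) pair survives the support threshold,
# yielding the rules database -> schema and schema -> database.
```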
The taxonomy generation from the concepts is the main feature of the
proposed approach. Its construction is based on the association rules obtained from
the algorithms. The proposed approach uses a tool called Protege to develop the
ontology. The concepts and their association values are embedded into the OWL
language with the help of Protege tool. Once the mappings of all concepts with their
relations are done, the data from the automatically generated concept maps are given
to the tool for depicting the ontology. The super nodes and the sub nodes are also depicted clearly using the tool. The prime reason for selecting Protege as the tool is that it generates clear, readable OWL for the proposed ontology. The concepts are represented by rectangles and the relationships by straight lines with arrows marked in the middle.
Figure 4.12: Protege tool: Automated domain ontology
Figure 4.12 shows the ontology representation of the proposed automated domain ontology constructed with the help of the Protege tool. The tool defines the concepts as super nodes and sub nodes according to the details given by the user. A
small segment from the above ontology shown in Figure 4.13 explains the concepts,
super node and sub nodes clearly. Here Database_design, Schema_refinement and
physical_database are called the concepts. They are enclosed in rectangles and the
relationships between them are depicted clearly. Database_design is the super node
and Schema_refinement and physical_database are the sub nodes of this segment.
58
Figure 4.13: Segment of automated OWL ontology
The Protege tool is used for generating both the automated domain ontology and the domain expert's ontology; the details for each ontology are provided to the tool separately.
Figure 4.14: A segment of OWL statements for the automated domain ontology
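OWL statements of the kind shown in Figure 4.14 can be emitted programmatically. The sketch below writes minimal OWL/XML for a concept map given as child-to-parent links; the base IRI and the helper name are illustrative (in the proposed approach the Protege tool, not custom code, produces the OWL).

```python
def concept_map_to_owl(sub_class_of: dict, base: str = "http://example.org/dbms#") -> str:
    """Emit minimal OWL/XML: one owl:Class per concept, with an
    rdfs:subClassOf link from each sub node to its super node.
    The base IRI is an illustrative placeholder."""
    lines = [
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"',
        '         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"',
        '         xmlns:owl="http://www.w3.org/2002/07/owl#">',
    ]
    for child, parent in sorted(sub_class_of.items()):
        lines.append(f'  <owl:Class rdf:about="{base}{child}">')
        if parent is not None:
            lines.append(f'    <rdfs:subClassOf rdf:resource="{base}{parent}"/>')
        lines.append('  </owl:Class>')
    lines.append('</rdf:RDF>')
    return "\n".join(lines)

# The segment from Figure 4.13: Database_design is the super node of
# Schema_refinement and physical_database.
segment = {
    "Database_design": None,
    "Schema_refinement": "Database_design",
    "physical_database": "Database_design",
}
owl_text = concept_map_to_owl(segment)
```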
Once the automated domain ontology is processed by the Protege tool, the
concepts for the domain expert’s ontology are selected. In the domain expert’s
ontology, unlike the automated domain ontology, the concepts are extracted manually
from the document. The domain expert ontology construction is a tedious process
when compared to the automated domain ontology construction. The concepts
extracted manually are then subjected to the Protege tool, which will generate the
ontology according to the given information. A procedure similar to that used for the
automated domain ontology is used for constructing the domain expert’s ontology.
The major difference between the automated domain ontology and the domain expert's ontology lies in the association values: since the domain expert's ontology contains only closely related concepts, association values are not integrated into it. Let us consider
the ontology diagram shown in Figure 4.15 for the domain expert’s ontology
constructed by the Protege tool.
Figure 4.15: Protege tool: Domain expert’s ontology
Figure 4.15 shows a segment of the domain expert’s ontology constructed by the
Protege tool. The Protege tool took full advantage of the manually extracted concepts
and clearly differentiated the super nodes and sub nodes in the domain expert’s
ontology. The proposed approach considers more than one ontology: one is the automated ontology extracted from a set of e-learning documents using the proposed data mining algorithms, and the other is an ontology generated automatically by the existing BMI approach. The existing ontology is used for comparison with the automated domain ontology. The main difference between the proposed automated domain ontology and the existing ontology is the way the concept maps are created. All three ontologies are applied to the database management domain and their performance is evaluated based on recall, precision and F-measure.
Figure 4.16: A segment of OWL statements for the domain expert’s ontology
4.4.4 PERFORMANCE EVALUATION
This section presents the performance evaluation of the proposed approach using the parameters listed in the above sections. Here, a comparison has been made between the concept map obtained from the automated domain ontology and the concept map developed by the existing system. The concept maps are analysed and compared to determine the effectiveness of the proposed approach over the existing system.
The experiments are conducted on the DBMS dataset as specified earlier. The DBMS dataset is given to each of the ontologies as three different sets of documents: first a set of 300 documents; then a set of 600 documents (the first 300 documents together with a new set of 300 documents); and finally a set of 1000 documents (400 new documents added to the previously available 600). The documents in the dataset contain information regarding database schemas, users, architecture, indexing, SQL, query optimization, transaction management and recovery techniques. The topics included in the DBMS domain are discussed in Appendix D. The analyses are shown as graphs in the following sections, and it can be seen that the proposed approach is more data sensitive and efficient in finding the concept maps. A series of analyses has been carried out to evaluate the proposed ontology based on the association rule mining (FP-growth) and sequential pattern mining (PrefixSpan) algorithms.
Table 4.5: Recall measures using different ontologies

No. of Docs | Existing ontology | Proposed ontology (association rule) | Proposed ontology (sequential pattern)
300         | 0.38              | 0.44                                 | 0.51
600         | 0.54              | 0.76                                 | 0.76
1000        | 0.68              | 0.82                                 | 0.89
Figure 4.17: Plotting of concept recall measure of different ontologies
Figure 4.17 shows the evaluation of the ontologies based on the concept recall parameter. The concept recall values are obtained by comparing the existing ontology and the automated domain ontology with the domain expert's ontology. Based on the number of concepts in the domain expert's ontology and in the evaluated ontologies, the recall values are calculated for the three sets of data. The graph shows that the automated domain ontology achieves higher recall values than the existing ontology. Table 4.5 shows that the peak value obtained for the proposed automated domain ontology is 0.89 for the set of 1000 documents.
Table 4.6: Precision measures using different ontologies

No. of Docs | Existing ontology | Proposed ontology (association rule) | Proposed ontology (sequential pattern)
300         | 0.29              | 0.31                                 | 0.38
600         | 0.38              | 0.49                                 | 0.54
1000        | 0.49              | 0.52                                 | 0.58
Figure 4.18: Concept precision mappings of different ontologies
Figure 4.18 shows the evaluation of the different ontologies on the basis of the precision parameter. The precision is calculated by comparing the number of concepts in the existing ontology and the automated domain ontology with the domain expert's ontology. The graph shows that the proposed approach has a higher precision rate than the existing ontology. Table 4.6 shows that the peak precision rate obtained for the automated domain ontology is 0.58 for the dataset of 1000 documents.
Table 4.7: Association recall measures using different ontologies

No. of Docs | Existing ontology | Proposed ontology (association rule) | Proposed ontology (sequential pattern)
300         | 0.36              | 0.42                                 | 0.55
600         | 0.52              | 0.74                                 | 0.78
1000        | 0.64              | 0.84                                 | 0.88
Figure 4.19: Association recall mappings of different ontologies
Figure 4.19 shows the behaviour of the association recall values. The analysis shows that the association recall follows the same trend as the concept recall parameter. Table 4.7 shows that the peak association recall rate obtained for the automated domain ontology is 0.88 for the dataset of 1000 documents.
Table 4.8: Association precision measures using different ontologies

No. of Docs | Existing ontology | Proposed ontology (association rule) | Proposed ontology (sequential pattern)
300         | 0.27              | 0.34                                 | 0.36
600         | 0.36              | 0.49                                 | 0.55
1000        | 0.45              | 0.54                                 | 0.58
Figure 4.20: Association precision mappings of different ontologies
Figure 4.20 shows the behaviour of the association precision parameter for the three different ontologies. The graph shows that, among the three ontologies, the ontology developed using the sequential pattern mining method has the highest precision rate. Table 4.8 shows that the peak association precision rate obtained for the automated domain ontology is 0.58 for the dataset of 1000 documents.
Table 4.9: Ontology recall measures using different ontologies

No. of Docs | Existing ontology | Proposed ontology (association rule) | Proposed ontology (sequential pattern)
300         | 0.37              | 0.43                                 | 0.5
600         | 0.53              | 0.71                                 | 0.75
1000        | 0.66              | 0.83                                 | 0.85
Figure 4.21: Ontology recall mappings of different ontologies
Figure 4.21 shows the ontology recall mappings for the different ontologies. The ontology recall values are generated by combining the concept recall values and the association recall values. Since the ontology recall values are derived from those other recall values, their trend differs little from them, as the graph clearly shows. Table 4.9 shows that the peak ontology recall rate obtained for the automated domain ontology is 0.85 for the dataset of 1000 documents.
Table 4.10: Ontology precision measures using different ontologies

No. of Docs | Existing ontology | Proposed ontology (association rule) | Proposed ontology (sequential pattern)
300         | 0.28              | 0.298                                | 0.37
600         | 0.38              | 0.41                                 | 0.47
1000        | 0.40              | 0.48                                 | 0.52
Figure 4.22: Ontology precision mappings of different ontologies
In Figure 4.22, the precision of the different ontologies is plotted according to their performance on the DBMS dataset. As with the ontology recall, the ontology precision combines the concept precision and the association precision. The graph indicates that the sequential pattern powered ontology has a higher precision rate than the other two ontologies. Table 4.10 shows that the peak ontology precision rate obtained for the automated domain ontology is 0.52 for the dataset of 1000 documents.
Table 4.11: Ontology F-measures using different ontologies

No. of Docs | Existing ontology | Proposed ontology (association rule) | Proposed ontology (sequential pattern)
300         | 0.32              | 0.35                                 | 0.43
600         | 0.44              | 0.52                                 | 0.58
1000        | 0.50              | 0.60                                 | 0.65
Figure 4.23: Ontology F-measure mappings of different ontologies
Figure 4.23 depicts the ontology F-measure values for the three different ontologies processed with the DBMS dataset. The F-measure values are calculated from the ontology recall and ontology precision values. The graph shows that the F-measure is highest for the ontology based on sequential pattern mining. Both proposed ontologies are ahead of the existing ontology in the performance evaluation. Table 4.11 shows that the peak ontology F-measure obtained for the automated domain ontology is 0.65 for the dataset of 1000 documents.
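As a quick consistency check, the reported peak F-measure can be reproduced from the peak ontology precision and recall in Tables 4.9 and 4.10 using the F-measure expression:

```python
def f_measure(precision: float, recall: float) -> float:
    """F-measure as twice the product of precision and recall over their sum."""
    return 2 * precision * recall / (precision + recall)

# Sequential pattern ontology at 1000 documents: precision 0.52, recall 0.85
# (Tables 4.10 and 4.9) gives F ≈ 0.65, matching Table 4.11.
print(round(f_measure(0.52, 0.85), 2))  # → 0.65

# Existing ontology at 1000 documents: precision 0.40, recall 0.66
# gives F ≈ 0.50, again matching Table 4.11.
print(round(f_measure(0.40, 0.66), 2))  # → 0.5
```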
4.5 CONCLUSIONS
Techniques for constructing domain ontology automatically, using association rule mining and sequential pattern mining algorithms, have been implemented. A set of 1000 documents has been processed and the concepts were extracted. Based on these concepts, a concept map was created for the DBMS
domain. The most attractive feature of the proposed method is the student interaction GUI (Graphical User Interface), which evaluates the student's capability in learning a specific topic. From the initial assessment, the system generated e-learning content adaptively for the student with the aid of the automated domain ontology. The performance of the proposed system in automated ontology construction was evaluated with the help of precision, recall and F-measure. The peak rates obtained for the different evaluation parameters are 0.85 for ontology recall, 0.52 for ontology precision and 0.65 for ontology F-measure. The application of the automated domain ontology was discussed through a case study.