Chapter 4
An Automated Approach to Extract Domain Ontology for
E-Learning System
E-learning is becoming an active area in both online and offline
education. E-learning deals with the interaction between the teacher and the learner on the
basis of the knowledge possessed by the learner. Aware of the learner's knowledge
level, the teacher can easily deliver the required lessons to the student through an
online medium such as the Internet. Adaptive learning is an educational method
that uses computers interactively and tailors the learning materials
to the learner's knowledge level. In this chapter, an automated approach to
extract domain ontology is designed with the objective of enhancing the
efficiency of adaptive e-learning.
4.1 ONTOLOGY AND E-LEARNING
Ontologies have become a key concept for providing more relevant lessons to
the learner than other means. Ontologies are established for information sharing and
are extensively used as a means for conceptually structuring domains of interest.
Ontologies help us to describe, develop, annotate and relate the educational resources,
which in turn will help in the retrieval of more relevant resources for the learners.
Ontology can be created by a domain expert and embedded into an e-learning system,
or its construction can be automated and the result embedded into the system. Automating
ontology construction reduces human intervention and also the time required for ontology
creation. The chief advantage of the proposed approach is automated ontology
construction through concept map extraction, which is effectively achieved through the
use of association rule mining and sequential pattern mining algorithms. The
constructed domain ontology is applied to the e-learning system, and the real-time
application of the proposed approach is discussed.
Figure 4.1: Sample ontology for E-learning
[Figure: a graph of database-domain topics (Database, DBMS, DBA, DB developers, DB design, DB modelling, DB languages such as QL, DML and DDL, DB systems, data models such as Hierarchical, Relational, Network, Entity-relational and OO, Active DB, Cloud DB, Data warehouse, DB machines, Data structures) connected by "Is a", "Part of", "Has" and "Union of" relations.]
36
Figure 4.1 shows a sample structure of an ontology constructed by domain
experts for the e-learning system. Though the structure is a basic graph-like structure,
we incorporate relations with each node present in the ontology. A node is a topic
related to the domain that is considered for the construction of the e-learning system.
4.2 ONTOLOGY CONSTRUCTION
The main objective of the proposed approach is to construct an ontology for an e-learning
system which fulfills the needs of clients. The client mentioned in the
approach is the student or any person who makes use of the e-learning system.
The ontology lists detailed associations between the nodes, or topics. The
ontology construction undergoes a series of development steps to ensure that the e-learning
system is an effective one. The ontology is constructed from a text corpus,
which contains a number of documents regarding a particular domain, so the ontology
has to be created based on that specified domain. The main steps in the
construction of an ontology are:
Processing the documents
Outlining the domain ontology
Concept processing (extraction of concepts from the domain)
Creating concept maps
The above four steps serve as the main components of the proposed approach.
These processes have the virtue of producing an effective ontology for the learning
system. Based on these steps, an automatic ontology construction method is provided.
The proposed approach derives a specific algorithm to give weightage to all the nodes
and to provide association between the nodes. The nodes are assigned their inter-
relationships through a mutual association function. The different document
processing methods will help to extract the key features from each document. The key
features are then associated together to form the concepts and from the concepts, an
effective concept map is created for the e-learning system. Thus, a query from a user
is used to extract a concept map regarding that query.
Figure 4.2: Ontology construction
Figure 4.2 depicts the block diagram for the construction of an ontology for
the specified e-learning system in our proposed approach. In the succeeding sections,
the proposed approach is discussed in detail.
4.2.1 DOCUMENT PROCESSING
The initial part of the ontology construction is to process the documents to
extract keywords from them. The text corpus is selected and the
documents from the corpus are extracted for processing, which is done by applying
two basic document processing steps. Initially, a stop word removal process removes
all the non-informative words from the documents. Once the stop word removal is
finished, a stemming algorithm (Willet, 2006) is applied to reduce the keywords to
their root form. The keywords from the documents are then stored in an array,
making sure that no word is repeated. The stored keywords are then
transferred to the concept extraction phase.
For example: Consider two statements from the text corpus
“Database is a collection of related information. Data in a database are stored in
the form of tables.”
The stop words are: is, a, of, from, are, in, the.
Keywords extracted: Database, Collection, Related, Information, Data, Database,
Stored, Tables.
Stemming: Collection - Collect
Related - Relate
Tables - Table
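The document-processing step above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the stop-word list is the one from the example, and the suffix-stripping rules are a naive stand-in for the standard stemmer the chapter cites.

```python
# Sketch of the document-processing step: stop-word removal followed by
# stemming. The suffix rules below are illustrative assumptions; the
# chapter itself uses a standard stemming algorithm.

STOP_WORDS = {"is", "a", "of", "from", "are", "in", "the"}

def stem(word):
    """Very naive suffix stripping, standing in for a real stemmer."""
    for suffix in ("ion", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def extract_keywords(text):
    """Return unique stemmed keywords with stop words removed."""
    tokens = [t.strip(".,").lower() for t in text.split()]
    keywords = []
    for t in tokens:
        if t and t not in STOP_WORDS:
            root = stem(t)
            if root not in keywords:      # ensure no word is repeated
                keywords.append(root)
    return keywords

text = ("Database is a collection of related information. "
        "Data in a database are stored in the form of tables.")
print(extract_keywords(text))
```

Note that a toy stemmer like this produces crude roots (e.g. "relat" rather than "relate"); a production system would use a full stemmer such as Porter's.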
4.2.2 OUTLINING DOMAIN ONTOLOGY
The procedure of the ontology construction should be specific and transparent,
since we define the e-learning system as a user-friendly one. In this section, the different
steps that are needed for the efficient construction of the ontology are defined. The
basic structure of the domain ontology can be presented as in Figure 4.3.
Figure 4.3: Outline of domain ontology construction
[Figure: pipeline of stages: concept extraction, redundancy check, dimensionality reduction, deriving associations, creating concept map, ontology construction.]
The outlining of the structure of the ontology should be precise, because the
ontology is domain specific. The main concentration is needed in the concept
extraction phase. A concept should be associated with other concepts, yet
possess an individual existence. So, redundancy among the concepts should be
identified to ease the process of execution. The other major part concerns the
dimension of the concept set. For a high-dimensional concept set, the dimension should
be reduced to make the associations more rigid and precise.
4.2.3 CONCEPT PROCESSING
A concept is defined as a keyword or set of keywords that defines a common
topic as reference. So, the purpose of the concept processing step is to identify such
concepts from the set of keywords that has already been extracted. Let K be the set of n
keywords defined by,
K = {k_1, k_2, ..., k_n}
The set K includes the keywords from all the documents. Now we process
each keyword to find the concept. Each keyword is selected and processed with other
keywords to find the association between them. Initially, a sorting process is applied
to the set of keywords based on their frequency. The most frequent keywords are
selected as top priority keywords and are processed first for concept extraction.
The frequency of each keyword k_i is calculated based on its
presence in the documents of the text corpus.
f(k_i) = Σ_j n(k_i, d_j) / N(d_j)
The frequency is calculated from the number of occurrences n(k_i, d_j) of the keyword in the
document d_j relative to the total number of keywords N(d_j) in d_j, summed over the
documents. Now the set K is reformatted with the most frequent keywords in the
descending order of their frequency values. We adopt a sentence-level windowing
process, in which the window moves in a sliding manner. The text window formed is a
four-term window enclosed in a sentence. As the window slides, the words enclosed in the
window are selected for association calculation. The association is calculated as,
Assoc(k_i, k_j) = P(k_i | k_j) = P(k_i ∩ k_j) / P(k_j)
The association between two keywords is obtained through the probability of
occurrence of the keywords. A conditional probability is adopted for finding the
relation between the keywords. The value of the association between the keywords is
used to extract the concept. If the association value is high, the keyword pair is considered
as a concept. The process is continued up to the last document in the text corpus. A
threshold value is set for making the distinction between the keywords and concepts.
If the association value is higher than the threshold, then the corresponding keywords
constitute a concept. Similarly, all the association values are analysed and a concept
set is formed, which is defined as,
C = {c_1, c_2, ..., c_m}
The set C represents the concepts which are selected after the association
value analysis. A concept is defined as a group of two or more associated keywords.
The keywords selected are based on the frequency of their occurrence in the
considered domain. Thus, according to the association of each keyword with another, a set of
concepts is formed. The concepts are then used as the building blocks of the domain
specific ontology. Though the concepts are generated from the most frequent
keywords, the concept set C may contain redundant concepts. So, in order to make the
concept set more specific, a redundancy analysis and a dimensionality reduction
process are carried out.
Step 1. Select text corpus
Step 2. Apply stop word removal algorithm
Step 3. Apply stemming algorithm
Step 4. Store keywords in set K, K = {k_1, k_2, ..., k_n}
Step 5. Find frequency of every keyword
f(k_i) = Σ_j n(k_i, d_j) / N(d_j)
Step 6. Sort keywords based on frequency
Step 7. Find joint probability between keywords, P(k_i ∩ k_j)
Step 8. Calculate association values between keywords
Assoc(k_i, k_j) = P(k_i | k_j) = P(k_i ∩ k_j) / P(k_j)
Step 9. List association values
Step 10. Generate concept set C, C = {c_1, c_2, ..., c_m}
Step 11. Stop.
Figure 4.4: Concept Extraction algorithm
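The concept extraction of Figure 4.4 can be sketched as follows. The windowed co-occurrence counting and the conditional-probability score follow the description above; the threshold value and the sample sentences are illustrative assumptions.

```python
# Sketch of the concept-extraction step (Figure 4.4): keyword frequencies,
# a four-term sliding window inside each sentence, and a conditional-
# probability association score. The threshold is an assumed tunable value.
from collections import Counter
from itertools import combinations

def extract_concepts(sentences, window=4, threshold=0.5):
    freq = Counter()          # occurrences of each keyword
    pair = Counter()          # co-occurrences inside a window
    for sent in sentences:
        freq.update(sent)
        for start in range(max(1, len(sent) - window + 1)):
            for a, b in combinations(sent[start:start + window], 2):
                pair[frozenset((a, b))] += 1
    concepts = []
    for ab, joint in pair.items():
        a, b = sorted(ab)
        # Rough estimate of Assoc(a, b) = P(a | b) from the counts
        assoc = joint / freq[b]
        if assoc >= threshold:
            concepts.append((a, b, assoc))
    return concepts

sents = [["database", "store", "table", "schema"],
         ["relational", "database", "table", "row"]]
for a, b, s in extract_concepts(sents):
    print(a, b, round(s, 2))
```

Keyword pairs whose association clears the threshold become candidate concepts, exactly as the threshold rule in the text prescribes.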
4.2.3.1 REDUNDANCY ANALYSIS
The extraction of concepts produces a number of dominant and unwanted
concepts, and these make the dataset redundant. So, in order to reduce the redundancy,
a redundancy analysis is carried out on the extracted concepts to ensure that the
concepts which are selected are not redundant. We use the information gain and
entropy technique (Leow et al. 2008), which detects how redundant a term is
in a set of documents. Let us consider the concept set C for the redundancy check.
Each concept in the concept set C is associated with one or more keywords, so
the redundancy analysis concentrates on those words. Suppose c1 is a concept
present in the set C and possesses 3 terms. The redundancy analysis checks the
presence of each keyword in the text corpus through the information gain and entropy
method. Table 4.1 shows the probabilities of the different terms in the concept c1.
Table 4.1: Probability of various terms in Concept c1
P(k1) = 2/3    P(k2) = 1/3    P(k3) = 1/3
The probability is calculated in order to find the occurrence of each term in the
concept c1. The number of bits needed to encode a keyword should be calculated in
order to find the entropy function. The number of bits needed to encode k1
with probability 2/3 can be calculated using the following formula,
b(k_i) = -log2 P(k_i)
where b(k_i) is the number of bits needed to encode keyword k_i. The entropy value
for the concept c1 becomes,
H(c1) = Σ_i P(k_i) b(k_i)
Here, the function H is the entropy value of the concept. From this, we apply
a threshold for pruning the terms which have a bit value below the applied
threshold. Similarly, we prune the redundant concepts from the domain on the basis of
their entropy values. Generally, the entropy function is defined as,
H(c) = -Σ_i P(k_i) log2 P(k_i)
The concepts with high entropy values are retained in the domain and those
with low entropy values are pruned. By applying this method, we can avoid the
unwanted concepts in the domain; the information gain and entropy method is a
highly reliable one.
Step 1. Select concept set C
Step 2. Encode all the keywords using bit encoding
b(k_i) = -log2 P(k_i), where P(k_i) is the probability of keyword k_i
Step 3. Find entropy value for all the concepts
H(c) = Σ_i P(k_i) b(k_i)
Step 4. Filter the concepts based on the entropy values
Step 5. Store the filtered concepts in concept set C
Step 6. End
Figure 4.5: Redundancy analysis algorithm
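The redundancy analysis of Figure 4.5 can be sketched as follows, reusing the Table 4.1 probabilities for c1. The second concept and the pruning threshold are hypothetical values added only to show the filtering step.

```python
# Sketch of the redundancy-analysis step (Figure 4.5): bit-encode each
# keyword as b(k) = -log2 P(k), take the entropy of a concept as
# H(c) = sum_k P(k) * b(k), and keep only high-entropy concepts.
import math

def bits(p):
    """Number of bits needed to encode a keyword with probability p."""
    return -math.log2(p)

def entropy(probabilities):
    """Entropy of a concept from its keyword probabilities."""
    return sum(p * bits(p) for p in probabilities)

# Term probabilities of concept c1, as in Table 4.1.
c1 = [2/3, 1/3, 1/3]
print(round(entropy(c1), 3))

concepts = {"c1": c1, "c2": [0.9, 0.95]}   # "c2" is a hypothetical concept
threshold = 0.8                            # assumed pruning threshold
kept = [name for name, ps in concepts.items() if entropy(ps) > threshold]
print(kept)
```

With these numbers, c1 has an entropy of about 1.447 bits and survives the filter, while the low-entropy c2 is pruned.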
4.2.3.2 REDUCING THE DIMENSION OF THE CONCEPT SET
A major difficulty in the ontology construction process is the high
dimension of the concept set. So, a dimensionality reduction method called Principal
Component Analysis (PCA) (Jonathon Shlens, 2005) is used for reducing the
dimension of the concept set. The concept set is considered as an M x N matrix D for
applying the PCA algorithm. A main feature to find is the
covariance matrix, derived from the empirical mean values, for efficient dimensional
reduction. The empirical mean of each column can be computed as,
u[j] = (1/M) Σ_{i=1..M} D[i, j]
where D is the matrix of size M x N. Then, the deviations from the means are
calculated and stored in a matrix B,
B = D - h u^T
where h is an M x 1 column vector of ones. The deviation values are then used for the
calculation of the covariance matrix,
Cov = (1/(M-1)) B^T B
The eigenvectors of the covariance matrix are calculated and stored in a separate
matrix E. After finding the eigenvalues of the matrix Cov, we calculate the diagonal
matrix of its eigenvalues.
From the above calculated data, a matrix W of eigenvectors is constructed for reducing
the dimension of the data under consideration, where W is an N x L matrix whose L
columns are the eigenvectors of Cov with the largest eigenvalues. The next step of
the dimensionality reduction process is to calculate the empirical standard deviation,
which contains the square root of each element along the main diagonal of the
covariance matrix,
s[j] = sqrt(Cov[j, j])
and this value is then used to calculate the z-score matrix Z = B / (h s). Thus, we get the
reduced dimension matrix R, which is the dot product of the z-score
matrix Z and the matrix W,
R = Z W
and which has dimension M x L. This
matrix reduces the concept space of our concept map and hence increases the speed of
our search. The dimensionally reduced matrix contains the most relevant concepts
regarding the domain under consideration. Thus, after the concept processing step, the
proposed approach provides a set of data which contains the most prime concepts
regarding the domain. These multilevel concept processing steps are used for
constructing an effective ontology for the e-learning system.
Step 1. Select concept set C
Step 2. Define M x N matrix D
Step 3. Calculate empirical mean
u[j] = (1/M) Σ_i D[i, j]
Step 4. Find the covariance matrix
Cov = (1/(M-1)) B^T B, where B = D - h u^T
Step 5. Evaluate the diagonal matrix of eigenvalues of Cov
Step 6. Find the z-score matrix Z
Step 7. Derive the dimensionally reduced matrix R = Z W
Step 8. The elements in matrix R constitute the concept set C.
Step 9. End
Figure 4.6: Dimensionality Reduction algorithm
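The covariance-method PCA of Figure 4.6 can be sketched with NumPy as follows. The matrix sizes and the choice of L are illustrative assumptions; in the chapter the M x N matrix holds the concept set.

```python
# Sketch of the dimensionality-reduction step (Figure 4.6) using the
# covariance method of PCA: mean-centre, form the covariance matrix,
# take the top-L eigenvectors, z-score, and project.
import numpy as np

def reduce_dimension(D, L):
    """Project the rows of the M x N matrix D onto its top-L principal axes."""
    u = D.mean(axis=0)                      # empirical mean of each column
    B = D - u                               # deviations from the mean
    cov = np.cov(B, rowvar=False)           # N x N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]       # sort by decreasing eigenvalue
    W = eigvecs[:, order[:L]]               # N x L projection matrix
    s = np.sqrt(np.diag(cov))               # empirical standard deviations
    Z = B / s                               # z-score matrix
    return Z @ W                            # M x L reduced matrix R

rng = np.random.default_rng(0)
D = rng.normal(size=(6, 4))                 # 6 "concepts", 4 features
print(reduce_dimension(D, 2).shape)
```

The resulting R has one row per concept and L columns, the reduced concept space the text refers to.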
4.2.4 CREATING THE CONCEPT MAPS
The concept map preparation is the main step in the automatic ontology
construction defined by the proposed approach. A concept map is a set of concepts that
have many inter-relationships. When a query is submitted to the ontology, the concept
referred to by the query has to be extracted for the client. So, a concept map strategy
has been adopted for the proposed approach, i.e., a set of connected concepts is joined
together to form a solution to a specific set of queries. As defined by the proposed
approach, the ontology gives the most suitable results to the clients according to their
queries. The concept map preparation is the step taken prior to the ontology
construction. We can consider the ontology as a collection of concept maps. As
discussed above, the building blocks of a concept map are the concepts obtained after
the dimensionality reduction process. The creation of a concept map is similar to the
concept formation step, i.e., the association between concepts is extracted by the
mutual association values.
Assoc(c_i, c_j) = P(c_i ∩ c_j) / (P(c_i) P(c_j))
The above expression is used for finding the inter-relationship between
concepts. The major difference in this association value calculation is that here the
joint probability of the concepts is related to both individual probabilities of the
concepts. The joint probability of two concepts is calculated by summing the joint
probabilities of the keywords corresponding to the concepts, i.e.
P(c_i ∩ c_j) = P(k_1 ∩ c_j) + P(k_2 ∩ c_j) + ... + P(k_n ∩ c_j)
where k_1, ..., k_n are the keywords of concept c_i.
The summation continues up to the last keyword selected for the joint probability calculation.
Once all the concepts are processed by the joint probability calculation, the joint
probabilities are stored in a set to evaluate the relationships. The concepts which
possess high association values are grouped to form the concept map. The concept maps
are drawn using the Protégé tool and the associations are marked as relations in the
OWL.
Step 1. Select concept set C, C = [c_1, c_2, ..., c_m]
Step 2. Find probability of each concept, P(c_i)
Step 3. Calculate joint probability between concepts.
P(c_i ∩ c_j) = P(k_1 ∩ c_j) + P(k_2 ∩ c_j) + ... + P(k_n ∩ c_j)
Step 4. Calculate the association values.
Assoc(c_i, c_j) = P(c_i ∩ c_j) / (P(c_i) P(c_j))
Step 5. Group the concepts based on their association values.
Step 6. Define concept map for the grouped concepts.
Step 7. End.
Figure 4.7: Concept map extraction algorithm
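The mutual association step of Figure 4.7 can be sketched as follows. The document sets, the concept names and the grouping threshold are hypothetical; the score relates the joint probability of a concept pair to both individual probabilities, which is one plausible reading of the mutual association function.

```python
# Sketch of the concept-map step (Figure 4.7): score each concept pair by
# its joint probability relative to both individual probabilities, then
# group strongly associated pairs into a concept map.
from itertools import combinations

# doc_sets[i] = set of concepts appearing in document i (hypothetical data)
doc_sets = [{"db_design", "schema", "normalization"},
            {"db_design", "schema"},
            {"sql", "query"}]

def prob(concept):
    return sum(concept in d for d in doc_sets) / len(doc_sets)

def joint(a, b):
    return sum(a in d and b in d for d in doc_sets) / len(doc_sets)

def association(a, b):
    # Assoc(a, b) = P(a ∩ b) / (P(a) * P(b))
    return joint(a, b) / (prob(a) * prob(b))

concepts = sorted(set().union(*doc_sets))
threshold = 1.4   # assumed grouping threshold
concept_map = [(a, b) for a, b in combinations(concepts, 2)
               if joint(a, b) > 0 and association(a, b) >= threshold]
print(concept_map)
```

Pairs scoring above the threshold are linked, so "sql" and "query" end up in the same concept map even though each is rare on its own.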
4.3 MINING ASSOCIATION RULES FOR ONTOLOGY
CONSTRUCTION
This section discusses mining association rules from the extracted concepts.
The two algorithms used for generating association rules are the Frequent Pattern (FP)
growth algorithm and a sequential pattern mining algorithm called the PrefixSpan algorithm.
Each algorithm is discussed in detail in the following sections.
4.3.1. ONTOLOGY CONSTRUCTION USING FP-GROWTH
In this step, association rules are generated using the Frequent Pattern growth
algorithm. The rules generated hold the relations between the concepts and provide a
strong domain relation between the concepts in the concept set C. The FP-growth
algorithm generates a binary matrix in association with the frequency of concepts; the
binary matrix contains the values 0 and 1. The values in the binary database are
generated according to the utilization of a concept, and if the concept is associated
with more than one concept, then a corresponding value is generated in the binary
matrix. The concept map is used as the deciding factor in this process: the utilization
of a concept can be easily extracted from the concept map with which the concept is
associated.
The created binary matrix is processed with the FP-growth algorithm for
mining the frequent terms used by the concepts. The FP-growth method extracts the
relations between concepts. It creates a database in the main memory in the form of a
tree by reading the database twice. In the first phase, the database is analysed and the frequency
of each item is determined in order to remove less frequent concepts. Then, the remaining
frequent items are ordered in the descending order of their support values. During the
second phase, the database is scanned again to read the terms, and the frequent terms
read are inserted into a so-called FP-tree structure (Grahne, G. and Zhu, J. 2005). Once
the tree is constructed, subsequent pattern mining can be performed. The application
of the FP-growth algorithm provides a set of rules from the concepts selected by the
algorithm. The associations between the concepts are extracted precisely by the FP-growth
algorithm.
Initially, the FP-growth algorithm selects a concept map from the set of concept maps.
Let CM = {cm_1, cm_2, ..., cm_p} be the set of concept maps, where each concept map
cm_i = [c_1, c_2, ..., c_q] is a list of concepts.
The selected concept map is considered as a transaction in which each concept is
fixed as a node and given to the FP-growth algorithm. Now, an FP-tree is constructed
over the concepts in the concept map through the fixed order approach. In the fixed order
approach, when a concept overlaps with another, or if similar concepts are obtained in
the comparison process, the count of the similar concepts is incremented.
Suppose there are two transactions that share some concepts; the counts of those
shared concepts are incremented. Similarly, all the
concepts are compared with the other concepts in the selected concept map. Linked
lists are created for the nodes that share similar concepts. The process creates
paths between the concepts, resulting in the construction of an FP-tree, which is
the core data structure of the FP-growth algorithm.
A set of frequent patterns is then extracted from the FP-tree. If two
concepts are similar, then the counts of the frequent patterns are incremented to define the
ontology. Once all the concepts are processed, a frequent pattern set
FP = [fp_1, fp_2, ..., fp_r]
is defined for the next phase of the FP-growth algorithm.
The frequent item set is then processed using a bottom-up approach to explore the
maximum association between each of the concepts. Each concept is recursively
analysed to find its associations. Thus, when a query is put forth by the user, the
proposed approach will give a more appropriate answer because of the strong
association between the concepts.
Step 1. Select the concept map set.
CM = [cm_1, cm_2, ..., cm_p], where cm_i = [c_1, c_2, ..., c_q]
Step 2. Construct the FP-tree.
Step 3. Extract the frequent patterns.
Step 4. Define the frequent item set,
FP = [fp_1, fp_2, ..., fp_r]
Step 5. Select concept maps.
Step 6. Construct the ontology based on the concept association values.
Step 7. End
Figure 4.8: Ontology construction algorithm
Figure 4.8 shows the different steps involved in the construction of the domain
specific ontology. The proposed approach uses a multilevel association feature, which
gives a strong base to the relationships between the concepts in the concept map.
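The output of this step, frequent concept sets above a minimum support, can be illustrated with a brute-force miner. This is a deliberately simple stand-in: real FP-growth builds an FP-tree in two database scans rather than enumerating combinations, but the result is the same on small data. The transactions and min_support are assumptions.

```python
# Illustrative stand-in for the FP-growth step: a brute-force frequent-
# itemset miner over concept-map "transactions". Real FP-growth reaches
# the same frequent sets via an FP-tree without enumerating combinations.
from itertools import combinations

# Each row: the set of concepts used by one concept map (a transaction).
transactions = [{"data", "database", "schema"},
                {"data", "database", "relational"},
                {"data", "schema"}]

def frequent_itemsets(transactions, min_support=2, max_size=3):
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            # support = number of transactions containing the whole combo
            support = sum(set(combo) <= t for t in transactions)
            if support >= min_support:
                frequent[combo] = support
    return frequent

for itemset, support in frequent_itemsets(transactions).items():
    print(itemset, support)
```

With these transactions, {data}, {database}, {schema}, {data, database} and {data, schema} survive the minimum support of 2, and rules between concepts are then read off from such sets.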
4.3.2. ONTOLOGY CONSTRUCTION USING PREFIX SPAN ALGORITHM
The proposed approach uses an alternative method to generate the association
rules used for the construction of the ontology. The features of a sequential pattern
mining algorithm are used in the proposed approach as an alternative to the FP-growth
algorithm. The concept set is selected and all the concepts are processed based
on the generalized sequential pattern mining algorithm. The major steps in the
sequential pattern mining algorithm for extracting associations between the
concepts are listed below.
Step 1. Every concept from the concept map is selected and assigned as a
candidate of length-1.
Step 2. The length-1 patterns are calculated to find the count of each concept
in the concept set.
Step 3. The concept set considered here is the primitive concept set, i.e.
concept set without filtering.
Step 4. After length-1 calculation, for each concept of length-1, find k-length
patterns.
Step 5. Continue the steps until no concepts are left to process.
Figure 4.9: Sequential pattern mining algorithm
Once all the concepts are processed and the sequences of patterns are
calculated, the taxonomy is generated. The concepts are mapped into the taxonomy of
the ontology according to their sequence, since the sequences of concepts are considered
as the associations between the concepts. Thus, based on the sequential patterns, the
ontology is constructed. The sequential pattern mining algorithm is adopted for
finding associations because of the quick nature of the algorithm. Since the proposed
system deals with a large number of documents, the FP-growth algorithm may not be
efficient all the time. So, when a large database is selected for ontology construction,
the effective way to generate associations between the concepts is through the
sequential pattern mining algorithm.
Figure 4.10: Sequential pattern mining
[Figure: flow from concept set to length-1 patterns, k-length patterns and taxonomy generation.]
Figure 4.10 shows the generalised process of the sequential pattern mining algorithm,
which is used as an alternative way to find the associations between the concepts in the
concept map. More precisely, the proposed approach uses the PrefixSpan
algorithm as the sequential pattern mining algorithm. The PrefixSpan algorithm is one
of the most commonly used and efficient sequential pattern mining techniques. The concepts in
the concept set have to be arranged in terms of the concept map structure in order to be
processed using the PrefixSpan algorithm. The concepts are selected and stored in a list.
The list contains the total number of concepts that are to be present in the taxonomy
of the ontology. The PrefixSpan algorithm explores the presence of concepts in the
documents in a sequential manner.
The PrefixSpan algorithm is initialized with the concept set defined by the
proposed approach. The PrefixSpan algorithm works by making the minimum support
value of the sequential patterns the key element. The minimum frequency with
which a concept is in sequence with other concepts is considered as the minimum
support. The PrefixSpan algorithm proceeds in a length-wise manner. It first
scans the database for patterns containing only a single concept. These concepts
are considered as the length-1 concepts, or length-1 sequences.
Example: For a sequence of data containing concepts a and b, if a is a length-1
concept, then the associated concept b is explored as a sequential association <a, b>.
Then the minimum support of the pattern <a, b> is calculated. Similar to the above
process, the PrefixSpan algorithm explores n-length sequential patterns.
According to the minimum support values of the sequential patterns, the associated
concepts can be extracted from the patterns. The patterns are then used for the
construction of the ontology-based association of the concepts.
Figure 4.11: PrefixSpan algorithm
[Figure: flow: initialize the concept set, find the length-1 sequences, calculate min_support, find length-n sequences, extract associations from the sequential patterns.]
Let us consider three documents from the text corpus. Each document is pre-processed,
and the keywords are extracted and maintained as a sequence of concepts.
[DATABASE SYSTEMS, Data, Information, Management, System, DBMS,
Relational]
[MODEL, Data, Database, Design, Schema, steps]
[RELATIONAL MODEL, Data, Database, Relational, SQL, Schema]
Scan the database once and count the support of the length-1 candidates (concepts).
Support is defined as the total number of occurrences of a concept in the
database. Discard the redundant concepts from the set of concepts. Assume the
minimum support threshold is 2 and eliminate the concepts with support less than 2.
The concepts identified are shown in Table 4.2 together with their support counts.
Table 4.2: Length-1 Sequence Patterns

Concepts      Sup. count
Data          3
Information   1
Management    1
Database      2
Design        1
Schema        2
Relational    2
Steps         1
SQL           1
DBMS          1

Now generate the length-2 patterns and eliminate the concept sets having values less
than the minimum support. So the concept sets (Database, Schema), (Database,
Relational) and (Schema, Relational) are eliminated.
Table 4.3: Length-2 Sequence Patterns

Concepts     Data   Database   Schema   Relational
Data                2          2        2
Database                       1        1
Schema                                  1
Relational
The next step is to generate the length-3 patterns, as shown in Table 4.4, and eliminate the
concept sets below the minimum support. Here the concept sets (Data, Database, Relational)
and (Data, Relational, Schema) are eliminated.
Table 4.4: Length-3 Sequence Patterns

Concepts   (Database, Schema)   (Relational, Schema)   (Database, Relational)
Data       2                    1                      1
No more patterns can be generated from the sequence (Data, Database, Schema). A
pattern is treated as a concept map. When there is more than one pattern of sequence,
there will be a group of concept maps. A group of concept maps together constitutes a
domain ontology.
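The length-1 pass of this worked example can be sketched as follows. Support here is counted per document containing the concept, which reproduces the counts of Table 4.2.

```python
# Sketch of the PrefixSpan length-1 pass on the three example documents:
# count the support of each single concept and discard those below the
# minimum support threshold of 2, as in Table 4.2.
from collections import Counter

docs = [["DATABASE SYSTEMS", "Data", "Information", "Management",
         "System", "DBMS", "Relational"],
        ["MODEL", "Data", "Database", "Design", "Schema", "steps"],
        ["RELATIONAL MODEL", "Data", "Database", "Relational",
         "SQL", "Schema"]]

MIN_SUPPORT = 2

# Support of a concept = number of documents (sequences) containing it.
support = Counter()
for doc in docs:
    for concept in set(doc):
        support[concept] += 1

length1 = {c: s for c, s in support.items() if s >= MIN_SUPPORT}
print(sorted(length1.items()))
```

Only Data (3), Database (2), Schema (2) and Relational (2) survive, matching the elimination described above; longer patterns are then grown from these survivors.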
4.4 RESULTS AND DISCUSSION
The following sections include the experimental evaluations of the proposed
approach under different testing conditions. The experiments were conducted on a
system with an Intel Core i5 processor, 4 GB RAM and a 500 GB hard disk. The
programs were written and tested in Java under JDK 1.7.0. The detailed
experimentation and analysis are discussed in the following sections.
4.4.1 DATASET DESCRIPTION
The proposed approach is an adaptive e-learning system working with the help
of an automated ontology. The dataset used for evaluating the performance
of the proposed approach is extracted from the field of Database Management
Systems (DBMS). A set of 1000 documents was extracted from different sources to
create the dataset for the proposed approach. The DBMS dataset was then given as
input to the proposed approach to create the automatic ontology, and the same dataset
was used for creating a manual ontology for the comparative analysis.
4.4.2 EVALUATION METRICS
This section discusses the different parameters which were used for
evaluating the proposed approach based on its efficiency and effectiveness. The
evaluation process considered two different ontologies, viz., the proposed
automatically generated ontology and an ontology generated by domain experts. We
compared the concept map obtained from our proposed approach with the concept map
developed by human experts. The parameters defined for the evaluation of the concept
maps from the two different ontologies were recall, precision and F-measure. Each
concept from the ontologies was compared and treated based on recall, precision and
F-measure through its concept-wise relations and associations.
concept_recall = |C_E ∩ C_S| / |C_E|
assoc_recall = |A_E ∩ A_S| / |A_E|
The above expressions are used for the calculation of the recall of the concepts, based
on the concept-to-concept recall and the association-based recall. The concept-to-concept
recall is defined by the expression concept_recall, while assoc_recall defines the
recall based on the association between the concepts. The values C_E and C_S
are the sets of nodes in the concept maps generated by human experts and by our proposed
approach respectively. Similarly, A_E and A_S are respectively the sets
of associations between concepts in the domain experts' concept map and the system
generated concept map.
The other parameter used in the approach is precision, which is defined as the
ratio of the intersection between the concepts from the two ontologies to the concepts
of the system generated ontology. In the proposed approach, a precision by association is also
defined for a more effective evaluation. So, the concept precision and association
precision are calculated using the following expressions.
concept_prec = |C_E ∩ C_S| / |C_S|
assoc_prec = |A_E ∩ A_S| / |A_S|
Here, the precision values of the concept-to-concept comparison are represented by the
expression concept_prec, and the precision based on the association between concepts is
defined by the expression assoc_prec. Once all the nodes in both ontologies are
processed, the precision and recall are calculated for the ontology as a whole. The
expressions given below provide the method to find the recall and precision values of
the ontology. The ontology recall can be defined as the weighted sum of the concept
recall and the association recall with recall measure α. Similarly, the precision is
defined using the precision measure β (α = β = 0.5, which acts as a balancing
factor).
ont_recall = α · concept_recall + (1 - α) · assoc_recall
ont_precision = β · concept_prec + (1 - β) · assoc_prec
The ont_recall expression defines the recall value of the ontology and ont_precision defines
its precision value. Then, we define the F-measure for the ontologies
by combining the recall and precision values. The F-measure is twice
the ratio of the product of the ontology precision and ontology recall to the sum of the
ontology precision and recall (their harmonic mean), which can be represented as,
F-measure = 2 · ont_precision · ont_recall / (ont_precision + ont_recall)
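The metric computation can be sketched as follows. The concept and association sets are hypothetical values chosen only to exercise the formulas; α = β = 0.5 as in the text.

```python
# Sketch of the evaluation metrics: concept/association recall and
# precision against an expert ontology, combined into ontology-level
# scores and an F-measure. The sets below are hypothetical examples.
def overlap_ratio(expert, system, denom):
    return len(expert & system) / len(denom)

C_E = {"db", "schema", "sql", "index"}        # expert concepts
C_S = {"db", "schema", "sql", "view"}         # system concepts
A_E = {("db", "schema"), ("db", "sql")}       # expert associations
A_S = {("db", "schema"), ("schema", "sql")}   # system associations

concept_recall = overlap_ratio(C_E, C_S, C_E)   # |C_E ∩ C_S| / |C_E|
concept_prec = overlap_ratio(C_E, C_S, C_S)     # |C_E ∩ C_S| / |C_S|
assoc_recall = overlap_ratio(A_E, A_S, A_E)     # |A_E ∩ A_S| / |A_E|
assoc_prec = overlap_ratio(A_E, A_S, A_S)       # |A_E ∩ A_S| / |A_S|

alpha = beta = 0.5                               # balancing factors
ont_recall = alpha * concept_recall + (1 - alpha) * assoc_recall
ont_prec = beta * concept_prec + (1 - beta) * assoc_prec
f_measure = 2 * ont_prec * ont_recall / (ont_prec + ont_recall)
print(round(ont_recall, 3), round(ont_prec, 3), round(f_measure, 3))
```

With these sets, concept recall and precision are both 0.75, association recall and precision are both 0.5, and the ontology-level recall, precision and F-measure all come out to 0.625.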
The evaluation parameters given above are used for evaluating the
performance of the proposed automatically generated ontology. The evaluations are
done by applying the proposed approach to a specific domain and determining the
performance of the proposed approach on that specific domain.
4.4.3 DISCUSSION ABOUT ONTOLOGIES
The main objective of the proposed approach is to construct an adaptive
ontology automatically from a given set of data. Since, it is associated with the
e-learning system, there should be a wide range of documents relating to a particular
domain. In the proposed approach, we consider a set of 1000 documents extracted
from the “Database Management System” domain. As defined by the proposed
approach, the document set is processed with stop word removal and stemmer
algorithms. The processed documents are then subjected to keyword selection and
concept extraction phases. Once the concepts are extracted from the documents, the
concept map is generated according to the proposed approach.
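As a concrete illustration of this preprocessing step, the following Python sketch combines stop word removal with a tiny suffix stemmer. The stop word list and the suffix rules are simplified placeholders, not the actual algorithms used in the system.

```python
import re

# Illustrative stop word list; a real system would use a much fuller list.
STOP_WORDS = {"the", "a", "an", "of", "is", "are", "and", "in", "to", "for"}

def simple_stem(word: str) -> str:
    """Very small suffix stripper, standing in for a Porter-style stemmer."""
    for suffix in ("ization", "ations", "ation", "ings", "ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word

def preprocess(document: str) -> list:
    """Tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return [simple_stem(t) for t in tokens if t not in STOP_WORDS]

doc = "Normalization of the relations is a step in database design"
print(preprocess(doc))  # → ['normal', 'rel', 'step', 'database', 'design']
```

The stemmed tokens then feed the keyword selection and concept extraction phases.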
After the concept extraction, the relationships between the concepts are extracted using the text association rule mining algorithm. The relations are obtained from the rules generated for each pair of associated concepts, and the taxonomy is built from these rules. Redundant and unwanted concepts are pruned based on a predefined support threshold in the FP-growth algorithm; the remaining concepts are retained and the taxonomy is generated according to the text association rule mining algorithm.
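The rule mining step can be sketched as follows. This Python example uses brute-force pair counting with support pruning as a simplified stand-in for FP-growth; the thresholds and concept sets are illustrative.

```python
from itertools import combinations
from collections import Counter

def mine_pair_rules(docs, min_support=2, min_confidence=0.6):
    """Mine A -> B rules from per-document concept sets.

    A simplified, brute-force stand-in for FP-growth: count single concepts
    and concept pairs, prune pairs below the support threshold, and keep
    rules whose confidence meets the minimum.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for concepts in docs:
        item_counts.update(concepts)
        pair_counts.update(combinations(sorted(concepts), 2))
    rules = []
    for (a, b), support in pair_counts.items():
        if support < min_support:
            continue  # pruned, like low-support branches in the FP-tree
        for head, tail in ((a, b), (b, a)):
            confidence = support / item_counts[head]
            if confidence >= min_confidence:
                rules.append((head, tail, confidence))
    return rules

# Each document is reduced to the set of concepts it mentions.
docs = [
    {"database", "schema", "sql"},
    {"database", "schema"},
    {"database", "indexing"},
]
rules = mine_pair_rules(docs)
# Only the (database, schema) pair survives the support threshold,
# yielding the rules database -> schema and schema -> database.
```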
The taxonomy generation from the concepts is the main feature of the
proposed approach. Its construction is based on the association rules obtained from
the algorithms. The proposed approach uses a tool called Protege to develop the
ontology. The concepts and their association values are embedded into the OWL
language with the help of Protege tool. Once the mappings of all concepts with their
relations are done, the data from the automatically generated concept maps are given
to the tool for depicting the ontology. The super nodes and the sub nodes are also depicted clearly using the tool. The prime reason for selecting Protege as the tool is that it generates clear, readable OWL for the proposed ontology. The concepts are represented by rectangles and the relationships by straight lines with arrows marked in the middle.
Figure 4.12: Protege tool: Automated domain ontology
Figure 4.12 shows the ontology representation of the proposed automated domain ontology constructed with the help of the Protege tool. The tool defines the concepts as super nodes and sub nodes according to the details given by the user. A
small segment from the above ontology shown in Figure 4.13 explains the concepts,
super node and sub nodes clearly. Here Database_design, Schema_refinement and
physical_database are called the concepts. They are enclosed in rectangles and the
relationships between them are depicted clearly. Database_design is the super node
and Schema_refinement and physical_database are the sub nodes of this segment.
58
Figure 4.13: Segment of automated OWL ontology
The Protege tool is used for generating both the automated domain ontology and the domain expert's ontology; the details for each ontology are provided to the tool separately.
Figure 4.14: A segment of OWL statements for the automated domain ontology
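OWL statements of the kind shown in Figure 4.14 can be emitted programmatically. The sketch below writes minimal OWL/XML for a concept map given as child-to-parent links; the base IRI and the helper name are illustrative (in the proposed approach the Protege tool, not custom code, produces the OWL).

```python
def concept_map_to_owl(sub_class_of: dict, base: str = "http://example.org/dbms#") -> str:
    """Emit minimal OWL/XML: one owl:Class per concept, with an
    rdfs:subClassOf link from each sub node to its super node.
    The base IRI is an illustrative placeholder."""
    lines = [
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"',
        '         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"',
        '         xmlns:owl="http://www.w3.org/2002/07/owl#">',
    ]
    for child, parent in sorted(sub_class_of.items()):
        lines.append(f'  <owl:Class rdf:about="{base}{child}">')
        if parent is not None:
            lines.append(f'    <rdfs:subClassOf rdf:resource="{base}{parent}"/>')
        lines.append('  </owl:Class>')
    lines.append('</rdf:RDF>')
    return "\n".join(lines)

# The segment from Figure 4.13: Database_design is the super node of
# Schema_refinement and physical_database.
segment = {
    "Database_design": None,
    "Schema_refinement": "Database_design",
    "physical_database": "Database_design",
}
owl_text = concept_map_to_owl(segment)
```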
Once the automated domain ontology is processed by the Protege tool, the
concepts for the domain expert’s ontology are selected. In the domain expert’s
ontology, unlike the automated domain ontology, the concepts are extracted manually
from the document. The domain expert ontology construction is a tedious process
when compared to the automated domain ontology construction. The concepts
extracted manually are then subjected to the Protege tool, which will generate the
ontology according to the given information. A procedure similar to that used for the
automated domain ontology is used for constructing the domain expert’s ontology.
The major difference between the automated domain ontology and the domain expert's ontology lies in the association values: since the domain expert's ontology contains only closely related concepts, association values are not integrated into it. Let us consider
the ontology diagram shown in Figure 4.15 for the domain expert’s ontology
constructed by the Protege tool.
Figure 4.15: Protege tool: Domain expert’s ontology
Figure 4.15 shows a segment of the domain expert’s ontology constructed by the
Protege tool. The Protege tool took full advantage of the manually extracted concepts
and clearly differentiated the super nodes and sub nodes in the domain expert’s
ontology. The proposed approach considers more than one ontology: one is the automated ontology extracted from a set of e-learning documents using the proposed data mining algorithms, and the other is an ontology generated automatically by the existing BMI approach. The existing ontology is used for comparison with the automated domain ontology. The main difference between the proposed automated domain ontology and the existing ontology is the way the concept maps are created. All three ontologies are applied to the database management domain and their performance is evaluated based on recall, precision and F-measure.
Figure 4.16: A segment of OWL statements for the domain expert’s ontology
4.4.4 PERFORMANCE EVALUATION
This section presents the performance evaluation of the proposed approach using the parameters listed in the above sections. Here, a comparison has been made between the concept map obtained from the automated domain ontology and the concept map developed by the existing system. The concept maps are analysed and compared to determine the effectiveness of the proposed approach over the existing system.
The experiments are conducted on the DBMS dataset as specified earlier. The DBMS dataset is given to each of the ontologies as three different sets of documents: first a set of 300 documents; then a set of 600 documents (the first 300 documents together with a new set of 300 documents); and finally a set of 1000 documents (400 new documents added to the previously available 600). The documents in the dataset contain information regarding database schemas, users, architecture, indexing, SQL, query optimization, transaction management and recovery techniques. The topics included in the DBMS domain are discussed in Appendix D. The analyses are shown as graphs in the following sections, and it can be seen that the proposed approach is more data sensitive and efficient in finding the concept maps. A series of analyses has been carried out to evaluate the proposed ontology based on the association rule mining (FP-growth) and sequential pattern mining (PrefixSpan) algorithms.
Table 4.5: Recall measures using different ontologies

No. of Docs | Existing ontology | Proposed ontology (association rule) | Proposed ontology (sequential pattern)
300         | 0.38              | 0.44                                 | 0.51
600         | 0.54              | 0.76                                 | 0.76
1000        | 0.68              | 0.82                                 | 0.89
Figure 4.17: Plotting of concept recall measure of different ontologies
Figure 4.17 shows the evaluation of the ontologies based on the concept recall parameter. The concept recall values are obtained by comparing the existing ontology and the automated domain ontology with the domain expert's ontology. Based on the number of concepts in the domain expert's ontology and in the evaluated ontologies, the recall values are calculated for the three sets of data. The graph shows that the automated domain ontology achieves higher recall values than the existing ontology. Table 4.5 shows that the peak value obtained for the proposed automated domain ontology is 0.89 for the set of 1000 documents.
Table 4.6: Precision measures using different ontologies

No. of Docs | Existing ontology | Proposed ontology (association rule) | Proposed ontology (sequential pattern)
300         | 0.29              | 0.31                                 | 0.38
600         | 0.38              | 0.49                                 | 0.54
1000        | 0.49              | 0.52                                 | 0.58
Figure 4.18: Concept precision mappings of different ontologies
Figure 4.18 shows the evaluation of the different ontologies on the basis of the precision parameter. The precision is calculated by comparing the number of concepts in the existing ontology and the automated domain ontology with the domain expert's ontology. The graph shows that the proposed approach has a higher precision rate than the existing ontology. Table 4.6 shows that the peak precision rate obtained for the automated domain ontology is 0.58 for the dataset of 1000 documents.
Table 4.7: Association recall measures using different ontologies

No. of Docs | Existing ontology | Proposed ontology (association rule) | Proposed ontology (sequential pattern)
300         | 0.36              | 0.42                                 | 0.55
600         | 0.52              | 0.74                                 | 0.78
1000        | 0.64              | 0.84                                 | 0.88
Figure 4.19: Association recall mappings of different ontologies
Figure 4.19 shows the behaviour of the association recall values. The analysis shows that the association recall follows the same trend as the concept recall parameter. Table 4.7 shows that the peak association recall rate obtained for the automated domain ontology is 0.88 for the dataset of 1000 documents.
Table 4.8: Association precision measures using different ontologies

No. of Docs | Existing ontology | Proposed ontology (association rule) | Proposed ontology (sequential pattern)
300         | 0.27              | 0.34                                 | 0.36
600         | 0.36              | 0.49                                 | 0.55
1000        | 0.45              | 0.54                                 | 0.58
Figure 4.20: Association precision mappings of different ontologies
Figure 4.20 shows the behaviour of the association precision parameter for the three different ontologies. The graph shows that, among the three ontologies, the ontology developed using the sequential pattern mining method has the highest precision rate. Table 4.8 shows that the peak association precision rate obtained for the automated domain ontology is 0.58 for the dataset of 1000 documents.
Table 4.9: Ontology recall measures using different ontologies

No. of Docs | Existing ontology | Proposed ontology (association rule) | Proposed ontology (sequential pattern)
300         | 0.37              | 0.43                                 | 0.5
600         | 0.53              | 0.71                                 | 0.75
1000        | 0.66              | 0.83                                 | 0.85
Figure 4.21: Ontology recall mappings of different ontologies
Figure 4.21 shows the ontology recall mappings for the different ontologies. The ontology recall values are generated by combining the concept recall values and the association recall values. Since the ontology recall values are derived from those other recall values, their trend differs little from them, as the graph clearly shows. Table 4.9 shows that the peak ontology recall rate obtained for the automated domain ontology is 0.85 for the dataset of 1000 documents.
Table 4.10: Ontology precision measures using different ontologies

No. of Docs | Existing ontology | Proposed ontology (association rule) | Proposed ontology (sequential pattern)
300         | 0.28              | 0.298                                | 0.37
600         | 0.38              | 0.41                                 | 0.47
1000        | 0.40              | 0.48                                 | 0.52
Figure 4.22: Ontology precision mappings of different ontologies
In Figure 4.22, the precision of the different ontologies is plotted according to their performance on the DBMS dataset. As with the ontology recall, the ontology precision combines the concept precision and the association precision. The graph indicates that the sequential pattern powered ontology has a higher precision rate than the other two ontologies. Table 4.10 shows that the peak ontology precision rate obtained for the automated domain ontology is 0.52 for the dataset of 1000 documents.
Table 4.11: Ontology F-measures using different ontologies

No. of Docs | Existing ontology | Proposed ontology (association rule) | Proposed ontology (sequential pattern)
300         | 0.32              | 0.35                                 | 0.43
600         | 0.44              | 0.52                                 | 0.58
1000        | 0.50              | 0.60                                 | 0.65
Figure 4.23: Ontology F-measure mappings of different ontologies
Figure 4.23 depicts the ontology F-measure values for the three different ontologies processed with the DBMS dataset. The F-measure values are calculated from the ontology recall and ontology precision values. The graph shows that the F-measure is highest for the ontology based on sequential pattern mining. Both proposed ontologies are ahead of the existing ontology in the performance evaluation. Table 4.11 shows that the peak ontology F-measure obtained for the automated domain ontology is 0.65 for the dataset of 1000 documents.
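As a quick consistency check, the reported peak F-measure can be reproduced from the peak ontology precision and recall in Tables 4.9 and 4.10 using the F-measure expression:

```python
def f_measure(precision: float, recall: float) -> float:
    """F-measure as twice the product of precision and recall over their sum."""
    return 2 * precision * recall / (precision + recall)

# Sequential pattern ontology at 1000 documents: precision 0.52, recall 0.85
# (Tables 4.10 and 4.9) gives F ≈ 0.65, matching Table 4.11.
print(round(f_measure(0.52, 0.85), 2))  # → 0.65

# Existing ontology at 1000 documents: precision 0.40, recall 0.66
# gives F ≈ 0.50, again matching Table 4.11.
print(round(f_measure(0.40, 0.66), 2))  # → 0.5
```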
4.5 CONCLUSIONS
Techniques for constructing domain ontology automatically, using association rule mining and sequential pattern mining algorithms, have been implemented. A set of 1000 documents has been processed and the concepts were extracted. Based on these concepts, a concept map was created for the DBMS
domain. The most attractive feature of the proposed method is the student interaction GUI (Graphical User Interface), which evaluates the student's capability in learning a specific topic. From the initial assessment, the system generated e-learning content adaptively for the student with the aid of the automated domain ontology. The performance of the proposed system in automated ontology construction was evaluated with the help of precision, recall and F-measure. The peak rates obtained for the different evaluation parameters are 0.85 for ontology recall, 0.52 for ontology precision and 0.65 for ontology F-measure. The application of the automated domain ontology was discussed through a case study.