Ontology Development using Hozo and Semantic Analysis for Information Retrieval in Semantic Web

Gagandeep Singh¹, Vishal Jain² and Dr. Mayank Singh³

¹B.Tech, Guru Tegh Bahadur Institute of Technology, GGS Indraprastha University, Delhi

²Research Scholar, Computer Science and Engineering Department, Lingaya's University, Faridabad

³Associate Professor, Computer Science and Engineering Department, Krishna Engineering College, Ghaziabad, U.P.


Abstract- We are living in the world of computers. This modern era deals with a wide network of information present on the web. The huge number of documents present on the web has increased the need for support in the exchange of information and knowledge. Users should be provided with relevant information about a given domain. Traditional Information Extraction techniques like Knowledge Management Solutions were not advanced enough to extract precise information from text documents. This led to the concept of the Semantic Web, which depends on the creation and integration of Semantic data. Semantic data in turn depends on the building of Ontology. Ontology is considered the backbone of a software system. It improves understanding between the concepts used in the Semantic Web. So, there is a need to build ontology using a well-defined methodology, and the process of developing ontology is called Ontology Development.

Keywords - Information Retrieval (IR), Ontology, Semantic Web (SW), Software Development Life Cycle (SDLC), Hozo

I. INTRODUCTION

Information Retrieval (IR) technology is a major factor responsible for handling annotations in the Semantic Web (SW) [1]. Traditional text Search Engines are not optimal for finding relevant documents; relevant results are instead produced by various approaches based on ontologies and Semantic data. Purely text-based Search Engines fail for the following reasons:

Improper style of natural languages: - The syntax of the language used in a document may not be appropriate.

High level unclear concepts: - Some concepts are used in a document, but present Search Engines cannot find them.

Timely Scenario: - Keyword matching cannot be used to find documents specified by time.

The ability to translate knowledge from different languages is considered a major factor in building powerful Artificial Intelligence (AI) systems, and it is studied by various AI research communities such as Natural Language Processing (NLP). Ontology has changed the present web, making it more expressive and richer in knowledge representation documents. This paper is divided into five sections:

Section 2 defines IR technology. It describes the IR process and its architecture, and the types of documents present on the web. Section 3 gives a brief overview of the Semantic Web (SW), including its challenges, its technologies and its comparison with the World Wide Web (www). It also proposes a methodology for building ontology with the help of Ontology Editors that make use of knowledge representation languages like OWL, RDF, DAML+OIL etc. In Section 4, we describe one of the Ontology Editors, named Hozo, and develop an ontology on “Computer Appreciation” using it. Section 5 gives information about Semantic Analysis in ontology-based information retrieval and search systems. We have also used a semantic search engine named SenseBot in this paper.

II. INFORMATION RETRIEVAL

Definition: - Information Retrieval (IR) is defined as the process of identifying and retrieving unstructured documents that contain specific information. IR mainly focuses on the retrieval of natural language text.
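As a concrete illustration of retrieving natural language text, the sketch below ranks documents for a keyword query with TF-IDF weighting. TF-IDF is a standard IR scoring scheme chosen here as an assumption; the paper does not prescribe a ranking formula, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_documents(query: str, docs: list[str]) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs sorted by TF-IDF score, best first."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term in tf:
                idf = math.log(n / df[term])  # rarer terms weigh more
                score += (tf[term] / len(toks)) * idf
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [
    "Ontology editors such as Hozo build ontologies.",
    "Search engines match keywords in text.",
    "The Semantic Web relies on ontology and RDF.",
]
print([i for i, _ in rank_documents("ontology hozo", docs)])  # [0, 2, 1]
```

Note that pure term matching treats "ontology" and "ontologies" as different words, which is one of the weaknesses of text-only search noted in the introduction.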

A. Types of Documents

Documents may be Structured, Unstructured, Semi-structured or a combination of them.

(a) Structured documents: A document is said to be structured if it is written in a well-defined syntax and has distinct components. A structured database is a table in which each user's record has multiple attributes, as shown below:

TABLE1. STRUCTURED DATABASE

| S.No | Name | Address | Id |
|---|---|---|---|
| 1. | Gagan | Canada | 129 |
| 2. | Vishal | USA | 128 |

IR engines can easily find the components of a structured document because its components are unique.

(b) Unstructured documents: These documents are written in natural languages. They do not have well-defined syntaxes and positions where IR engines could find records satisfying user queries. Unstructured documents are freely written documents on any topic.

Proceedings of the 2013 IEEE Second International Conference on Image Information Processing (ICIIP-2013)

978-1-4673-6101-9/13/$31.00 ©2013 IEEE

(c) Semi-structured documents: These documents share a common structure and meaning across a collection of textual documents. They differ from structured documents in that they do not have the same columns for each row in a table.
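The difference between the three document types shows up in how a field is looked up. The sketch below reuses the rows of TABLE1 as structured records; the semi-structured records and the unstructured sentence are hypothetical re-statements of the same data, and the helper names are illustrative only.

```python
import re

# Structured: every record has the same columns (the rows of TABLE1).
structured = [
    {"S.No": 1, "Name": "Gagan", "Address": "Canada", "Id": 129},
    {"S.No": 2, "Name": "Vishal", "Address": "USA", "Id": 128},
]

# Semi-structured (hypothetical): a shared structure, but records need not have the same fields.
semi_structured = [
    {"Name": "Gagan", "Address": "Canada"},
    {"Name": "Vishal", "Id": 128},
]

# Unstructured (hypothetical): the same facts written as free natural-language text.
unstructured = "Gagan lives in Canada, while Vishal lives in USA."


def address_structured(records, name):
    """Every record is known to have an 'Address' column, so index it directly."""
    return next(r["Address"] for r in records if r["Name"] == name)


def field_semi_structured(records, name, field):
    """Fields may be absent, so use .get() and allow a None result."""
    for r in records:
        if r.get("Name") == name:
            return r.get(field)
    return None


def address_unstructured(text, name):
    """No fixed positions: fall back to a pattern over the prose (fragile by design)."""
    m = re.search(rf"{re.escape(name)} lives in (\w+)", text)
    return m.group(1) if m else None


print(address_structured(structured, "Vishal"))  # USA
```

The regular expression only works for one phrasing of the sentence, which is why IR engines find records in structured documents far more easily than in unstructured ones.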

B. IR Process and Architecture

The procedure of retrieving information is as follows. Background knowledge is stored in the form of an ontology that can be used at any step. Text documents are indexed so that they are represented in a searchable form, and searching them produces a ranked list of documents. These ranked results are given to the admin. The admin resolves the user query, which can lead to a transformation of the user query.

Figure1: Information Retrieval Process [2].

The architecture of the Information Retrieval Engine is as follows. It is based on an ONTOLOGY BASED MODEL, which represents the content of a resource using a given ontology. It has the following parts:

OMC (Ontology Manager Component):- It is used by the Indexer, the Search Engine and the GUI.

INDEXER: - It indexes documents and creates metadata.

SEARCH ENGINE: - It searches the indexed documents for the user query.

GUI: - It supports the user in query formation.

Figure2: IR Architecture
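The process and architecture described above can be sketched in a few classes. This is a minimal illustration, not the authors' system: the names OntologyManager, Indexer, SearchEngine and form_query, and the assumption that the OMC expands query terms with related terms from the ontology, are hypothetical choices. The admin step is not modelled.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> set[str]:
    """Lower-cased alphabetic terms of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def form_query(text: str) -> set[str]:
    """GUI role (hypothetical): turn the user's typed query into a set of terms."""
    return tokenize(text)


class OntologyManager:
    """OMC role (hypothetical): concept -> related terms, shared by all other components."""

    def __init__(self, related: dict[str, set[str]]):
        self.related = related

    def expand(self, terms: set[str]) -> set[str]:
        """Add the related terms of every concept mentioned in the query."""
        expanded = set(terms)
        for term in terms:
            expanded |= self.related.get(term, set())
        return expanded

    def concepts_in(self, terms: set[str]) -> set[str]:
        """Concepts named directly, or through a related term, in `terms`."""
        return {c for c, rel in self.related.items() if c in terms or rel & terms}


class Indexer:
    """Indexer role: builds an inverted index and per-document metadata."""

    def __init__(self, omc: OntologyManager):
        self.omc = omc
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.metadata: dict[int, dict] = {}

    def add(self, doc_id: int, text: str) -> None:
        terms = tokenize(text)
        for term in terms:
            self.postings[term].add(doc_id)
        self.metadata[doc_id] = {"terms": len(terms), "concepts": self.omc.concepts_in(terms)}


class SearchEngine:
    """Search engine role: expands the query through the OMC, ranks by matched terms."""

    def __init__(self, indexer: Indexer, omc: OntologyManager):
        self.indexer = indexer
        self.omc = omc

    def search(self, query_terms: set[str]) -> list[int]:
        hits: Counter = Counter()
        for term in self.omc.expand(query_terms):
            for doc_id in self.indexer.postings.get(term, ()):
                hits[doc_id] += 1
        # Most matched terms first; ties broken by document id for a stable order.
        return sorted(hits, key=lambda d: (-hits[d], d))


omc = OntologyManager({"computer": {"pc", "laptop"}, "printer": {"output"}})
indexer = Indexer(omc)
for i, text in enumerate(["A PC with a keyboard", "Laser printer output quality",
                          "Laptop and computer buying guide"]):
    indexer.add(i, text)
engine = SearchEngine(indexer, omc)
print(engine.search(form_query("computer")))  # [2, 0]
```

Document 0 is found only because the ontology relates "pc" to "computer"; a plain keyword engine would miss it.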

III. CONCEPT OF SEMANTIC WEB AND ONTOLOGY

The idea of the Semantic Web (SW) [3], as envisioned by Tim Berners-Lee, came into existence in 1996 with the aim of translating given information into machine-understandable form.

The Semantic Web (SW) is an extension of the current www in which documents are filled with annotations in a machine-understandable markup language. The SW uses Semantic Web documents (SWDs) that are written in SW languages like OWL and DAML+OIL.

A. Challenges and Aspects of Semantic Web (SW)

In spite of various efforts by researchers, the SW has remained a future concept or technology for the following reasons:

Complete SW parts have not yet been developed, and the developed parts are so poor that they cannot be used in the real world.

No optimal software or hardware is provided.

The following are aspects of the Semantic Web (SW):

The Semantic Web (SW) leads to an environment where information and services can be interpreted semantically and processed in machine-understandable form.

The SW relies on ontology as a tool for modeling an abstract view of the real world and for the Semantic analysis of documents.

The SW is an XML (Extensible Markup Language) application.

B. Semantic Web (SW) vs. World Wide Web (www)

The Semantic Web (SW) and the World Wide Web (www) differ from each other in various aspects, which are described in the table below:

TABLE2. SEMANTIC WEB VS WORLD WIDE WEB

| No. | Semantic Web (SW) | World Wide Web (www) |
|---|---|---|
| 1 | It is an extension of the www that will manipulate the contents of information automatically, without human involvement. | It is a human-focused web. |
| 2 | It discovers documents for gathering relevant information. | It discovers documents for people. |
| 3 | It deals with resources like pages, images, photos and people. | It only deals with media resources like web pages, photos and images. |
| 4 | The SW holds different kinds of relations showing associations among different kinds of resources. | The www only holds hyperlinks between resources. |
| 5 | The SW makes use of ontology, which allows users to organize information into a science of concepts. | It does not use the concept of ontologies. |
| 6 | The SW has formal semantics of context, i.e. it uses web ontology languages for generating data. | It does not have formal semantics of context. The contents are machine readable but not machine understandable. |
| 7 | Complete information is accessible to Semantic Search Engines like Hakia. | Only a few pages of information are accessible to traditional Search Engines like Google. |

C. Semantic Web (SW) Technology

SW technologies are listed below:

XML: - XML is an extensible language that allows users to create their own tags for documents. It provides the syntax for content structure within documents.

XML Schema: - It is a language for defining the structure of XML documents. An XML document is a tree.

RDF: - It stands for Resource Description Framework. It is a simple language for expressing data models that refer to objects and their relationships. These models are called RDF Models.
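The XML entries in the list above (user-defined tags, and an XML document being a tree) can be seen with Python's standard xml.etree.ElementTree module. The tag names below are hypothetical, loosely based on the "Computer Appreciation" concepts used later in the paper; validating against an XML Schema would need a separate library and is not shown.

```python
import xml.etree.ElementTree as ET

# User-defined tags: XML does not fix the vocabulary, the document author chooses it.
document = """
<computer>
  <hardware>
    <inputDevice>Keyboard</inputDevice>
    <outputDevice>Monitor</outputDevice>
  </hardware>
  <software/>
</computer>
"""

root = ET.fromstring(document.strip())


def walk(element, depth=0):
    """Yield (depth, tag, text) for each element, parent before children (pre-order)."""
    text = (element.text or "").strip()
    yield depth, element.tag, text
    for child in element:
        yield from walk(child, depth + 1)


for depth, tag, text in walk(root):
    print("  " * depth + tag + (f": {text}" if text else ""))
```

The indented output mirrors the nesting of the tags, i.e. the tree that XML Schema constrains.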

An RDF model consists of Resources, Properties and Statements. A Resource may be a web page or an individual element of an XML document. A Property is a named characteristic or relation used to describe a Resource. A Statement is the combination of a Resource and a Property along with the Property's value.

E.g. in “Vishal plays Guitar”, Vishal is the Resource, plays is the Property and Guitar is the Object (value).

Figure3: RDF Model
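The "Vishal plays Guitar" example can be written as a single RDF-style statement. The sketch below is a plain-Python illustration, not an RDF library; the http://example.org/ URIs and the helper functions are hypothetical.

```python
from typing import NamedTuple

EX = "http://example.org/"  # hypothetical namespace for illustration


class Statement(NamedTuple):
    """An RDF statement: a resource, one of its properties, and that property's value."""
    resource: str
    prop: str
    value: str


# "Vishal plays Guitar" as one subject-predicate-object triple.
graph = {Statement(EX + "Vishal", EX + "plays", EX + "Guitar")}


def values_of(graph: set[Statement], resource: str, prop: str) -> set[str]:
    """All values the resource has for the given property."""
    return {s.value for s in graph if s.resource == resource and s.prop == prop}


def resources_with(graph: set[Statement], prop: str, value: str) -> set[str]:
    """All resources that have the given value for the property."""
    return {s.resource for s in graph if s.prop == prop and s.value == value}


print(values_of(graph, EX + "Vishal", EX + "plays"))  # {'http://example.org/Guitar'}
```

Giving every resource and property a URI is what lets statements from different documents be merged into one graph.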

D. Ontology

The term Ontology [4] can be defined in different ways:

Ontology is summarized by the acronym FESC (Formal, Explicit Specification of a Shared Conceptualization), whose parts are defined as:

Formal: - The ontology should be machine understandable.

Explicit: - It defines the types of constraints used in the model.

Shared: - The ontology is shared by a group; it is not restricted to individuals.

Conceptualization: - It refers to a model of some phenomenon that identifies the relevant concepts of that phenomenon.

Ontology is also defined as a set of concepts and relationships arranged in a hierarchical fashion.
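The hierarchical view of an ontology can be sketched with two link tables: subclass (is-a) links between concepts and instance links from individuals to concepts. The data is taken from the case study in Section 4 (CLASSIFICATION under COMPUTER, with instances Home computer and PC from TABLE3); representing it with Python dictionaries is an assumption for illustration, and the walk assumes the hierarchy has no cycles.

```python
# Subclass (is-a) links: concept -> direct super-concept.
SUBCLASS_OF = {"Classification": "Computer"}  # Section 4.1: CLASSIFICATION under COMPUTER

# Instance links from TABLE3: individual -> the concept it belongs to.
INSTANCE_OF = {"Home computer": "Classification", "PC": "Classification"}


def superclasses(concept: str) -> list[str]:
    """Walk up the hierarchy, nearest super-concept first (assumes no cycles)."""
    chain = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        chain.append(concept)
    return chain


def types_of(instance: str) -> list[str]:
    """An instance belongs to its own concept and, by inheritance, to every super-concept."""
    direct = INSTANCE_OF[instance]
    return [direct] + superclasses(direct)


print(types_of("PC"))  # ['Classification', 'Computer']
```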

E. Ontology Development

Ontology development [5] needs a well-defined methodology that follows certain guidelines:

The ontology being developed should follow Software Engineering standards.

The ontology development strategy should be simple and practical.

The phases used in developing an ontology also satisfy Software Engineering principles and are thus called Software Development Life Cycle (SDLC) phases. They are described below:

(a) Specification Phase: - This phase has the following activities.

Domain Vocabulary definition: - It defines common names and attributes for domain concepts.

Identifying Resources: - A Resource is anything that has a URI. So, if some concepts have a number of instances, they can be grouped into a class.

Identifying Axioms: - Axioms are structures that represent the behavior of concepts.

Identifying relationships: - Relations are defined between resources.

Identifying data characteristics: - It defines the features of types of resources and their relationships.

Applying constraints: - Constraints represent named relationships between a domain class and a range class.

Verification: - After the preliminary web ontology model is designed, it should be tested for correctness.

(b) Design Phase: - This phase is the backbone of the Semantic Web. The physical structure of the designed ontology is based on the RDF model, which is expressed as triples of Subject, Predicate and Object.

Predicate: - All characteristics of resources and relationships are taken as Predicates.

E.g. each student is assigned a unique RollNo through the predicate 'HasRollNo'.

Subject: - All domain classes of the characteristics and relationships of resources are taken as Subjects.

E.g. there are various average students, each having a unique URI, so they are grouped in 'AvgStudentsGroup'.

Objects: - They refer to the range classes of relationships.

E.g. HasRollNo has the range class 'NUMBER', which is a literal.
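The design-phase example (subjects grouped in AvgStudentsGroup, the predicate HasRollNo, and the range class NUMBER) can be checked mechanically once it is written as triples. In the sketch below the student URIs and roll numbers are hypothetical, and modelling the NUMBER literal as a Python int is an assumption.

```python
# Design-phase schema from the text: predicate -> domain class and range type.
DOMAIN_OF = {"HasRollNo": "AvgStudentsGroup"}
RANGE_OF = {"HasRollNo": int}  # NUMBER literal modelled as a Python int (assumption)

# Hypothetical students, each with a unique URI, grouped in AvgStudentsGroup.
CLASS_OF = {
    "http://example.org/students/s1": "AvgStudentsGroup",
    "http://example.org/students/s2": "AvgStudentsGroup",
}

triples = [
    ("http://example.org/students/s1", "HasRollNo", 17),
    ("http://example.org/students/s2", "HasRollNo", 18),
]


def well_typed(triple) -> bool:
    """Subject must be in the predicate's domain class; object must be a literal of its range."""
    subject, predicate, obj = triple
    expected = RANGE_OF[predicate]
    return (CLASS_OF.get(subject) == DOMAIN_OF[predicate]
            and isinstance(obj, expected) and not isinstance(obj, bool))


def roll_numbers_unique(triples) -> bool:
    """Each student is assigned a unique RollNo, so no HasRollNo value may repeat."""
    values = [o for _, p, o in triples if p == "HasRollNo"]
    return len(values) == len(set(values))


print(all(well_typed(t) for t in triples), roll_numbers_unique(triples))  # True True
```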

(c) Formalization Phase: - In this phase, the ontology obtained in the design phase is expressed formally with the help of some tools.

Figure4: Ontology development phases [6]

IV. HOZO - AN ONTOLOGY EDITOR

Version used: - 5.2.36 beta

Developed at Mizoguchi Lab, ENEGATE Co. Ltd.

Hozo differs from other ontology editors in the following aspects:

Its user-friendly environment lets users work on it easily.

Hozo has an API named HozoAPI ver 1.15 that accesses existing ontologies.

A slot definition option is available.

Inheritance information is clear and easily accessible through two options: one is from Super Classes through the is-a link, and the other is from the Class constraint.

Hozo provides the facility of correcting errors at the time of validating the ontology.

Figure5: Hozo Components (Ontology Editor, Ontology Manager with dependencies, Ontology Server, Developer) [7]

Figure6: Ontology editing screen

4.1 Case Study

We present a case study that implements all the phases involved in the ontology development methodology. This case study illustrates building an ontology on 'Computer Appreciation' with the help of the ontology editor HOZO.

(a) Specification Phase

Domain definition and resources identification: - There are a number of computers classified on the basis of speed. Each has its unique URI, so they are grouped in a subclass called 'CLASSIFICATION' under the superclass 'COMPUTER'.

TABLE3. DEFINING CLASSES AND INSTANCES

| Concepts (Nodes) | Instances | Features of Resources | Predicate |
|---|---|---|---|
| Classification | Home computer, PC etc. | Purpose, Name, examples | Hasname etc. |
| Generation | First, Second etc. | Purpose, Name, examples | Hasname etc. |
| Components | H/w system, s/w system | Types, name, examples, purpose | Hastype, Hasname etc. |
| Input devices | Scanner, CPU, keyboard | Types, purpose | Hastype and purpose |
| Output devices | Monitor, Printer, speakers | Types, examples, purpose | Hastype and examples |

Defining Axioms about Resources: - A computer has a Hardware system and a Software system. The Hardware system is categorized into Input devices and Output devices.

Relationship identifying and naming: - The relation between the 'H/w system' class and the 'Input devices' class is named Haskeyboard.

Identifying data characteristics: -

TABLE4. DATA CHARACTERISTICS

| Name | Domain class | Range class |
|---|---|---|
| HasName | Computer, Generation, Components, Input and Output devices | String |
| Hastype | Generation, Components | String |
| Hasyear | Generation | Number |

Applying Constraints: -

TABLE5. RESOURCES RELATIONS ALONG WITH CONSTRAINTS

| Name | Domain class | Range class |
|---|---|---|
| Has H/w system | Computer | Components |
| Has S/w system | Computer | Components |
| Haskeyboard | H/w system | Input devices |
| HasPrinter | H/w system | Output devices |
| HasRing | Computer | Network System |
| HasBus | Computer | Network System |

Validating: - Hozo has a feature named Ontology Consistency Check that uses Hozo's inference structure to verify whether the ontology has been developed properly.
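The domain and range constraints of TABLE4 and TABLE5 can be encoded as data and used to flag inconsistent facts. The sketch below is only an illustration of the idea of a constraint check; it is not Hozo's Ontology Consistency Check, it ignores subclass reasoning, and the individuals (hw1, kb1, gen1) are hypothetical. The row grouping follows the reading of TABLE4 given above, and "Number" is modelled as a Python int by assumption.

```python
# Object properties, as read from TABLE5: name -> (domain class, range class).
RELATIONS = {
    "Has H/w system": ("Computer", "Components"),
    "Has S/w system": ("Computer", "Components"),
    "Haskeyboard": ("H/w system", "Input devices"),
    "HasPrinter": ("H/w system", "Output devices"),
    "HasRing": ("Computer", "Network System"),
    "HasBus": ("Computer", "Network System"),
}

# Data properties, as read from TABLE4: name -> (allowed domain classes, literal type).
DATA_PROPERTIES = {
    "HasName": ({"Computer", "Generation", "Components", "Input devices", "Output devices"}, str),
    "Hastype": ({"Generation", "Components"}, str),
    "Hasyear": ({"Generation"}, int),  # "Number" modelled as int (assumption)
}


def violations(facts, class_of):
    """Check (subject, property, object) facts; class_of maps each individual to its class."""
    problems = []
    for subj, prop, obj in facts:
        if prop in RELATIONS:
            domain, rng = RELATIONS[prop]
            if class_of.get(subj) != domain:
                problems.append(f"{subj}: not in domain {domain} of {prop}")
            if class_of.get(obj) != rng:
                problems.append(f"{obj}: not in range {rng} of {prop}")
        elif prop in DATA_PROPERTIES:
            domains, literal_type = DATA_PROPERTIES[prop]
            if class_of.get(subj) not in domains:
                problems.append(f"{subj}: not in a domain class of {prop}")
            if not isinstance(obj, literal_type):
                problems.append(f"{obj!r}: wrong literal type for {prop}")
        else:
            problems.append(f"{prop}: property not declared in the ontology")
    return problems


class_of = {"hw1": "H/w system", "kb1": "Input devices", "gen1": "Generation"}
print(violations([("hw1", "Haskeyboard", "kb1"), ("gen1", "HasName", "First")], class_of))  # []
```

Reversing the Haskeyboard fact, e.g. ("kb1", "Haskeyboard", "hw1"), violates both the domain and the range constraint.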

(b) Design Phase

In the context of Hozo, the output of the specification phase is an ontology file, which is considered the output of the developed ontology. It is available in different formats like:

Text/HTML

XML

RDF

OWL

Figure7: Sample slice of ontology using RDF

Figure8: Sample slice of ontology using OWL
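To give a feel for what an RDF export contains, the sketch below writes two TABLE5 relations as N-Triples (a W3C line-based RDF syntax), declaring each as an owl:ObjectProperty with rdfs:domain and rdfs:range. This is not Hozo's own export and does not reproduce Figures 7 and 8; the namespace http://example.org/computer-appreciation# and the helper names are hypothetical.

```python
from urllib.parse import quote

BASE = "http://example.org/computer-appreciation#"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
OWL_OBJECT_PROPERTY = "http://www.w3.org/2002/07/owl#ObjectProperty"


def local(name: str) -> str:
    """Percent-encode a local name (spaces, '/') and wrap it as an N-Triples IRI."""
    return f"<{BASE}{quote(name, safe='')}>"


def relation_to_ntriples(name: str, domain: str, rng: str) -> list[str]:
    """Declare one relation as an object property with its domain and range class."""
    p = local(name)
    return [
        f"{p} <{RDF_TYPE}> <{OWL_OBJECT_PROPERTY}> .",
        f"{p} <{RDFS_DOMAIN}> {local(domain)} .",
        f"{p} <{RDFS_RANGE}> {local(rng)} .",
    ]


relations = {"Haskeyboard": ("H/w system", "Input devices"),
             "HasPrinter": ("H/w system", "Output devices")}
lines = [line for name, (d, r) in relations.items() for line in relation_to_ntriples(name, d, r)]
print("\n".join(lines))
```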

(c) Formalization Phase

This phase describes the developed ontology pictorially; its source code was developed using RDF syntax. The Hozo user interface shows the ontology hierarchy.

Figure9.1: Hozo user-interface

Figure9.2: Hozo user-interface

Another interesting feature of Hozo is that it produces a map layout view of the developed ontology using the 'Generate Map' function.

Figure10: Map Generation using Hozo

V. SEMANTIC ANALYSIS

The term Semantic Association Analysis means discovering complex and meaningful relationships between objects; these relationships are called Semantic Associations. The following are aspects of Semantic analysis:

It leads to the generation of knowledge-driven information from available data resources.

It uses a semantic query framework for analyzing relationships with semantic query languages like SPARQL, RQL and SeRQL.

There are semantic search engines that analyze relationships and create associations between resources. Examples include Swoogle and Weet-IT.
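Semantic query languages such as SPARQL answer queries by matching triple patterns containing variables against a graph and joining the matches on shared variables. The sketch below shows that core idea in plain Python; it is not a SPARQL engine, and the instance data (myComputer, hw1, keyboard1, printer1) is hypothetical, shaped by the TABLE5 relations.

```python
def is_variable(term: str) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern, triple, bindings):
    """Extend `bindings` so that `pattern` matches `triple`; return None if impossible."""
    extended = dict(bindings)
    for pat, value in zip(pattern, triple):
        if is_variable(pat):
            if extended.get(pat, value) != value:
                return None  # variable already bound to a different value
            extended[pat] = value
        elif pat != value:
            return None  # constant term does not match
    return extended


def query(graph, patterns):
    """Return every set of bindings satisfying all triple patterns (joined on shared variables)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in graph:
                result = match(pattern, triple, bindings)
                if result is not None:
                    next_solutions.append(result)
        solutions = next_solutions
    return solutions


# Hypothetical instance data shaped by the TABLE5 relations.
graph = {
    ("myComputer", "Has H/w system", "hw1"),
    ("hw1", "Haskeyboard", "keyboard1"),
    ("hw1", "HasPrinter", "printer1"),
}

# Two patterns joined on ?h, similar in spirit to a SPARQL basic graph pattern.
print(query(graph, [("?c", "Has H/w system", "?h"), ("?h", "Haskeyboard", "?k")]))
```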

A. Components of Semantic Analysis

(a) Ontology development: - The process of developing ontology has become easier with the help of free, open source editors like Hozo that use ontology languages like DAML+OIL, RDF etc.

E.g. if we want to create an ontology on the travelling process, we can import concepts from an existing ontology; we need not develop it from the root node.

(b) Dataset Construction: - A dataset is also called a Test Bed or Knowledge Base. It is a collection of instances for creating the ontology.

(c) Semantic Association Discovery: - It uses a Graph Traversal algorithm to determine semantic associations, which requires searching all possible paths between any two nodes in the semantic graph.

Finding associations among all possible paths in a graph is made possible by a path association algorithm. The steps are as follows:

Find the possible paths between two classes at the schema level.

Compare each path with the other paths.

Check whether there is an intersection between two paths, i.e. whether the two paths meet at the same node in the schema.

Use the result to perform a search at the data level that determines associations between nodes.
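The schema-level steps above can be sketched as a depth-first enumeration of cycle-free paths followed by a pairwise comparison of paths to find where they meet. The schema graph below is built from the TABLE5 relations plus the TABLE3 link between Components and H/w system, treated as undirected; that choice, and the function names, are assumptions for illustration. The final data-level search is not shown.

```python
from itertools import combinations

# Hypothetical schema graph from TABLE5 (plus Components - H/w system from TABLE3).
SCHEMA_EDGES = [
    ("Computer", "Components"),
    ("Components", "H/w system"),
    ("H/w system", "Input devices"),
    ("H/w system", "Output devices"),
    ("Computer", "Network System"),
]


def build_graph(edges):
    """Undirected adjacency sets."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def all_simple_paths(graph, start, goal, path=None):
    """Depth-first traversal returning every cycle-free path from start to goal."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    paths = []
    for nxt in sorted(graph.get(start, ())):
        if nxt not in path:
            paths.extend(all_simple_paths(graph, nxt, goal, path))
    return paths


def meeting_node(p, q):
    """First node on path p that also lies on path q (None if the paths never meet)."""
    return next((node for node in p if node in q), None)


def associations(graph, sources, target):
    """Compare paths from different sources to the target and report where they intersect."""
    paths = [(s, p) for s in sources for p in all_simple_paths(graph, s, target)]
    found = []
    for (s1, p1), (s2, p2) in combinations(paths, 2):
        if s1 != s2 and (node := meeting_node(p1, p2)) is not None:
            found.append((s1, s2, node))
    return found


graph = build_graph(SCHEMA_EDGES)
print(associations(graph, ["Input devices", "Output devices"], "Computer"))
# [('Input devices', 'Output devices', 'H/w system')]
```

Enumerating all simple paths grows exponentially with graph size, so practical systems bound the path length.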

(d) Displaying Results: - It refers to how semantic associations are displayed.

Figure11: Components of Semantic enhanced ontology based search engine (Ontology, Data sources, Data conversion, Data Sets)

B. Semantic Search Engine Structure

The system uses a search engine called SenseBot, which is designed to produce a summary in response to the keywords searched by the user.

About SenseBot: - It understands the meaning of the search query and uses relevant results to generate a summary of valid results.
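The idea of answering a keyword query with a summary can be illustrated by a simple extractive summarizer that keeps the sentences mentioning the keywords most often. SenseBot's actual method is not described in the paper; this keyword-frequency scoring is a swapped-in, deliberately simple technique, and the function name and sample text are hypothetical. Keywords are assumed to be lower-case.

```python
import re
from collections import Counter


def summarize(text: str, keywords: set[str], max_sentences: int = 2) -> str:
    """Keep the sentences that mention the (lower-case) keywords most, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def score(sentence: str) -> int:
        words = Counter(re.findall(r"[a-z]+", sentence.lower()))
        return sum(words[k] for k in keywords)

    # Highest score first; ties keep the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    chosen = sorted(i for i in ranked[:max_sentences] if score(sentences[i]) > 0)
    return " ".join(sentences[i] for i in chosen)


text = ("Hozo is an ontology editor. It was developed at Mizoguchi Lab. "
        "Ontology editors help build an ontology for the Semantic Web. Search engines rank pages.")
print(summarize(text, {"ontology", "editor"}))
```

A meaning-aware engine would also count "editors" as a match for "editor"; exact word matching is the simplification made here.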

The figure below illustrates the results of a query from Semantic association analysis.

Figure12: Results of Query

CONCLUSION

This paper highlights the common problem users face in retrieving relevant information for their queries. It emphasizes the concept of Information Retrieval (IR) and various IR approaches for extracting knowledge-driven documents from the cluster of interlinked web documents.

It also introduces the concept of ontology and its role in the Semantic Web. Since ontology is considered the backbone of a software system, it should be well designed without creating any ambiguities.

This paper also presents a proposed methodology for ontology development using SDLC phases. The concept of the Semantic Web has revolutionized emerging technology by extracting information from various web documents and integrating it in machine-understandable form. We have developed an ontology on Computer Appreciation using the ontology editor Hozo.

This paper has also described research issues in Semantic analysis, which plays a vital role in the Semantic Web. Semantic analysis enables meaningful relations between sets of entities and finds all possible paths using a Graph traversal algorithm. The paper also describes the architecture of a Semantic enhanced ontology based search engine.

REFERENCES

[1]. Urvi Shah, James Mayfield, “Information Retrieval on the Semantic Web”, ACM CIKM International Conference on Information Management, Nov 2002.

[2]. Gagandeep Singh, Vishal Jain, “Information Retrieval (IR) through Semantic Web (SW): An Overview”, In Proceedings of CONFLUENCE-The Next Generation Information Technology Summit, 27-28 September 2012, pp 23-27.

[3]. T. Berners Lee, “The Semantic Web”, Scientific American, May 2007.

[4]. Berners Lee, J. Lassila, “Ontologies in Semantic Web”, Scientific American, May 2001, pp 34-43.

[5]. Helena Sofia Pinto, Joao P. Martins, “Ontologies: How can they be built?”, Knowledge and Information Systems, pp 441-464, 2004.

[6]. Amjad Farooq and Abad Shah, “Ontology Development Methodology for Semantic Web System”, Pakistan Journal of Life Social Sciences, Vol.6 No.1, May 2008, pp 50-58.

[7]. Kozaki K, “Hozo: An Environment for Building Ontologies”, In Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management (EKAW), pp 213-218, October 2002.

[8]. Deligiannidis L, Sheth A, “Semantic Analytics Visualization”, Intelligence and Security Informatics, Proc. ISI-2006, pp 48-59, 2006.

[9]. J. Mayfield, “Ontologies and text retrieval”, Knowledge Engineering Review, 2007.

[10]. Cristani, R. Cuel, “A Survey on Ontology Creation Methodologies”, International Journal on Semantic Web and Information Systems, Vol.1 No.2, 2005.

[11]. Uschold, M. and King, “Towards A Methodology for Building Ontologies”, IJCAI-95 Workshop on Basic Ontological Issues in Knowledge Sharing, Montreal, Canada, 2006.

[12]. Uschold, M. and Grüninger, “Ontologies: Principles, Methods and Applications”, Knowledge Engineering Review, Vol.11 No.2, pp 93-137.

[13]. Updegrove A, “The Semantic Web: An interview with Tim Berners-Lee”, 2005.

[14]. S. Staab, R. Studer and Y. Sure, “Knowledge Processes and Ontologies”, IEEE Intelligent Systems, Vol.16 No.1, pp 2-9, 2001.

[15]. Kozaki K, R. Mizoguchi, “An Environment for Distributed Ontology Development Based on Dependency Management”, In Proceedings of the 2nd International Semantic Web Conference (ISWC), pp 453-468, 2003.

[16]. L. Stojanovic, “Migrating data intensive web sites into the Semantic web”, In Proceedings of the 17th ACM Symposium on Applied Computing (SAC), ACM Press, pp 1100-1107, 2002.

[17]. Aleman-Meza B, Arpinar I.B, “A Context aware Semantic association Ranking”, Technical Report, LSDIS Lab, Computer Science, Univ of Georgia, pp 03-010, 2003.

[18]. Dayal U, Kuno H, “Making the Semantic Web Real”, IEEE Data Engineering Bulletin, Vol.26 No.4, pp 4-7, 2003.

[19]. Kaushal Giri, “Role of Ontology in Semantic Web”, DESIDOC Journal of Library & Information Technology, Vol.31 No.2, March 2011, pp 116-120.

[20]. Urvi Shah, Tim Finin and Anupam Joshi, “Information Retrieval on the Semantic Web”, Scientific American, pp 35-45.

About the Authors

Gagandeep Singh Narula has completed his B.Tech from Guru Tegh Bahadur Institute of Technology (GTBIT), affiliated to Guru Gobind Singh Indraprastha University (GGSIPU), New Delhi. His research areas include Web Technology, Semantic Web and Information Retrieval.

Vishal Jain has completed his M.Tech (CSE) from USIT, Guru Gobind Singh Indraprastha University, Delhi and is pursuing a PhD in the Computer Science and Engineering Department, Lingaya's University, Faridabad. Presently, he is working as an Assistant Professor at Bharati Vidyapeeth's Institute of Computer Applications and Management (BVICAM), New Delhi. His research areas include Web Technology, Semantic Web and Information Retrieval. He is also associated with CSI and ISTE.

Dr. Mayank Singh has completed his M.E. in software engineering from Thapar University and his PhD from Uttarakhand Technical University. His research areas are Software Engineering, Software Testing, Wireless Sensor Networks and Data Mining. Presently, he is working as an Associate Professor at Krishna Engineering College, Ghaziabad. He is associated with CSI, IE (I), IEEE Computer Society India and ACM.