Ontology Development using Hozo and Semantic Analysis for Information Retrieval in Semantic Web
Gagandeep Singh1, Vishal Jain2 and Dr. Mayank Singh3
1B.Tech, Guru Tegh Bahadur Institute of Technology, GGS Indraprastha University, Delhi
2Research Scholar, Computer Science and Engineering Department, Lingaya's University, Faridabad
3Associate Professor, Computer Science and Engineering Department, Krishna Engineering College, Ghaziabad, U.P
Abstract- We are living in a world of computers. The modern era deals with a
vast network of information present on the web. The huge number of documents
on the web has increased the need for support in the exchange of information
and knowledge. Users should be provided with relevant information about a
given domain. Traditional Information Extraction techniques, such as Knowledge
Management Solutions, were not advanced enough to extract precise information
from text documents. This led to the concept of the Semantic Web, which
depends on the creation and integration of semantic data. Semantic data in
turn depends on the building of ontologies. An ontology is considered the
backbone of a software system; it improves understanding between the concepts
used in the Semantic Web. There is therefore a need to build ontologies using
a well defined methodology. The process of developing an ontology is called
Ontology Development.
Keywords - Information Retrieval (IR), Ontology,
Semantic Web (SW), Software Development Life Cycle (SDLC),
Hozo
I. INTRODUCTION
Information Retrieval (IR) technology is a major factor in handling
annotations in the Semantic Web (SW) [1]. Traditional text Search Engines are
not optimal for finding relevant documents; relevant documents are instead
produced by approaches based on ontologies and semantic data. Purely text
based Search Engines fail for the following reasons:
• Improper style of natural languages: The syntax of the language used may
not be appropriate.
• High level unclear concepts: Some concepts are used in a document, but
present Search Engines cannot find the corresponding words.
• Timely scenario: Keyword matching is not suited to finding documents
specified by time.
The ability to translate knowledge between different languages is considered
a major factor in building powerful Artificial Intelligence (AI) systems, and
it concerns various AI research communities such as Natural Language
Processing (NLP). Ontology has changed the present web, making it more
expressive and rich in knowledge representation documents. This paper is
divided into five sections:
Section 2 defines IR technology. It describes the IR process and architecture
and the types of documents present on the web. Section 3 gives a brief
overview of the Semantic Web (SW), including its challenges, its technologies
and its comparison with the World Wide Web (www). It also proposes a
methodology for building ontologies with the help of Ontology Editors that
make use of knowledge representation languages like OWL, RDF, DAML+OIL etc.
Section 4 describes one such Ontology Editor, Hozo. We have developed an
ontology on “Computer Appreciation” using Hozo. Section 5 discusses Semantic
Analysis in ontology based information retrieval and search systems. We have
also applied one of the semantic search engines, SenseBot, in this paper.
II. INFORMATION RETRIEVAL
Definition: - Information Retrieval (IR) is defined as the process of
identifying and retrieving unstructured documents that contain specific
information. IR mainly focuses on the retrieval of natural language text.
A. Types of Documents
Documents may be Structured, Unstructured, Semi-structured or a combination
of these.
(a) Structured documents: A document is said to be structured if it is
written in a well defined syntax and has identifiable components. A
structured database is a table with multiple attributes for each user's
record, as shown below:
TABLE1. STRUCTURED DATABASE
S.No | Name | Address | Id
1. | Gagan | Canada | 129
2. | Vishal | USA | 128
IR engines can easily find the components of a structured document because
each component is unique.
(b) Unstructured documents: These documents are written in natural
languages. They do not have a well defined syntax or fixed positions where IR
engines could find records satisfying user queries. Unstructured documents
are arbitrarily generated documents on any topic.
Proceedings of the 2013 IEEE Second International Conference on Image Information Processing (ICIIP-2013)
978-1-4673-6101-9/13/$31.00 ©2013 IEEE 113
(c) Semi-structured documents: These documents share a common structure and
meaning across a collection of textual documents. They differ from structured
documents in that they do not have the same columns for each row of a table.
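The difference between the three document types can be sketched in a few lines of Python. This is only an illustration: the structured rows are those of Table 1, while the semi-structured rows, the free text, the "Phone" field and the helper names `lookup` and `keyword_hits` are assumptions introduced here.

```python
# Structured: every record has the same fields (rows taken from Table 1).
structured = [
    {"S.No": 1, "Name": "Gagan", "Address": "Canada", "Id": 129},
    {"S.No": 2, "Name": "Vishal", "Address": "USA", "Id": 128},
]

# Semi-structured: records share some fields but not all.
# The "Phone" field and its value are invented for illustration.
semi_structured = [
    {"Name": "Gagan", "Address": "Canada"},
    {"Name": "Vishal", "Phone": "000-0000"},
]

# Unstructured: free natural-language text (invented for illustration).
unstructured = "Gagan lives in Canada. Vishal moved to the USA last year."


def lookup(records, field, value):
    """Return the records whose `field` equals `value`; missing fields never match."""
    return [r for r in records if r.get(field) == value]


def keyword_hits(text, keyword):
    """Return the sentences of free text that mention `keyword` (case-insensitive)."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [s for s in sentences if keyword.lower() in s.lower()]
```

A field lookup works directly on structured records, may silently miss semi-structured records that lack the field, and is not available at all for unstructured text, where only keyword matching over sentences remains.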
B. IR Process and Architecture
The procedure for retrieving information is as follows. Background knowledge
is stored in the form of an ontology that can be used at any step. The text
documents are indexed to produce a document representation. The indexed
documents produce a ranked list of results, which is given to the admin. The
admin resolves the user query, which leads to a transformation of the user
query.
Figure1: Information Retrieval Process [2].
The architecture of the Information Retrieval Engine is as follows. It is
based on an ontology based model, which represents the content of a resource
in terms of a given ontology. It has the following parts:
• OMC (Ontology Manager Component): It is used by the Indexer, the Search
Engine and the GUI.
• Indexer: It indexes documents and creates metadata.
• Search Engine
• GUI: It supports the user in query formation.
Figure2: IR Architecture
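A minimal sketch of how these parts might cooperate is given below. It is not the engine described in [2]: the class names, the inverted index and the query expansion through narrower concepts are all assumptions made for illustration, and in this simplified version only the search engine consults the ontology manager.

```python
from collections import defaultdict


class OntologyManager:
    """Stand-in for the OMC: maps a concept to its narrower concepts."""

    def __init__(self, narrower):
        self.narrower = narrower

    def expand(self, term):
        """Return the term followed by its direct narrower concepts."""
        return [term] + self.narrower.get(term, [])


class Indexer:
    """Builds an inverted index: token -> set of document ids."""

    def __init__(self):
        self.index = defaultdict(set)

    def add(self, doc_id, text):
        for token in text.lower().split():
            self.index[token.strip(".,")].add(doc_id)


class SearchEngine:
    """Ranks documents by how many expanded query terms they contain."""

    def __init__(self, indexer, omc):
        self.indexer = indexer
        self.omc = omc

    def search(self, term):
        hits = defaultdict(int)
        for expanded in self.omc.expand(term.lower()):
            for doc_id in self.indexer.index.get(expanded, ()):
                hits[doc_id] += 1
        # Highest score first; ties broken by document id.
        return sorted(hits, key=lambda d: (-hits[d], d))


omc = OntologyManager({"printer": ["inkjet", "laser"]})
indexer = Indexer()
indexer.add("d1", "A laser device.")
indexer.add("d2", "The printer is an inkjet printer.")
indexer.add("d3", "Keyboard input")
engine = SearchEngine(indexer, omc)
```

With this toy data, a query for "printer" also retrieves the document that only mentions "laser", which a purely keyword based engine would miss.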
III. CONCEPT OF SEMANTIC WEB AND
ONTOLOGY
The idea of the Semantic Web (SW) [3], as envisioned by Tim Berners-Lee, came
into existence in 1996 with the aim of translating given information into
machine understandable form.
The Semantic Web (SW) is an extension of the current www in which documents
are annotated in a machine understandable markup language. The Semantic Web
(SW) uses Semantic Web documents (SWDs) that are written in SW languages like
OWL and DAML+OIL.
A. Challenges and Aspects of Semantic Web (SW)
In spite of various efforts by researchers, SW has remained a future concept
or technology for the following reasons:
• The complete set of SW components has not yet been developed, and the
parts that have been developed are too immature to be used in the real world.
• No optimal software or hardware is provided.
Following are aspects of the Semantic Web (SW):
• The Semantic Web (SW) leads to an environment where information and
services can be interpreted semantically and processed in machine
understandable form.
• SW relies on ontology as a tool for modeling an abstract view of the real
world and for the semantic analysis of documents.
• SW is an XML (Extensible Markup Language) application.
B. Semantic Web (SW) vs. World Wide Web (www)
Semantic Web (SW) and World Wide Web (www) differ from each other in various
aspects, which are described in the following table:
TABLE2. SEMANTIC WEB VS WORLD WIDE WEB
No. | Semantic Web (SW) | World Wide Web (www)
1. | It is an extension of www that will manipulate contents of information automatically without human involvement. | It is a human focused web.
2. | It discovers documents for gathering relevant information. | It discovers documents for people.
3. | It deals with resources like pages, images, photos and people. | It only deals with media resources like web pages, photos, images.
4. | SW holds different kinds of relations showing association among different kinds of resources. | www only holds hyperlinks between resources.
5. | SW makes use of ontology that allows users to organize information into science of concepts. | It does not use the concept of ontologies.
6. | SW has formal semantics of context i.e. it uses web ontology languages for generating data. | It does not have formal semantics of context. The contents are machine readable but not machine understandable.
7. | Complete information is accessible to Semantic Search Engines like Hakia. | Only few pages of information are accessible to traditional Search Engines like Google.
C. Semantic Web (SW) Technology
SW technologies are listed below:
• XML: XML is an extensible language that allows users to add their own tags
to documents. It provides syntax for content structure within documents.
• XML Schema: It is a language for defining the structure of XML documents.
An XML document is a tree.
• RDF: It stands for Resource Description Framework. It is a simple language
for expressing data models that refer to objects and their relationships.
These models are called RDF Models.
An RDF model consists of Resource, Property and Statement. A Resource may be
a web page or an individual element of an XML document. A Property is a named
characteristic of a Resource. A Statement is the combination of a Resource
and a Property along with the Property's value.
E.g. in the statement “Vishal plays Guitar”, Vishal is the Resource, plays is
the Property and Guitar is the Object (the value of the Property).
Figure3: RDF Model
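The statement in the example can be written down as a three-part record. The sketch below uses a plain Python `NamedTuple` rather than an RDF library; the type name `Statement`, its field names and the helper `describe` are assumptions chosen for illustration.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF-style statement: a resource, one of its properties, and the value."""
    resource: str  # the thing being described
    prop: str      # the named characteristic or relation
    value: str     # the value of the property (the object)


stmt = Statement(resource="Vishal", prop="plays", value="Guitar")


def describe(s: Statement) -> str:
    """Render a statement as a labelled arrow from resource to value."""
    return f"{s.resource} --{s.prop}--> {s.value}"
```

A collection of such statements forms a graph in which resources and values are nodes and properties are labelled edges.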
D. Ontology
The term Ontology [4] can be defined in different ways. Ontology is commonly
defined as a Formal, Explicit Specification of a Shared Conceptualization
(abbreviated here as FESC), where:
• Formal: The ontology should be machine understandable.
• Explicit: The types of constraints used in the model are defined
explicitly.
• Shared: The ontology is shared by a group and is not restricted to
individuals.
• Conceptualization: It refers to a model of some phenomenon that identifies
the relevant concepts of that phenomenon.
An ontology is also defined as a set of concepts and relationships arranged
in a hierarchical fashion.
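The hierarchical view of an ontology can be illustrated with a small is-a table. The concept names below are hypothetical examples and are not taken from the ontology built later in the paper; the helpers `ancestors` and `is_a` are likewise assumptions.

```python
# child concept -> parent concept (is-a link); hypothetical example concepts.
IS_A = {
    "Keyboard": "Input device",
    "Scanner": "Input device",
    "Printer": "Output device",
    "Input device": "Hardware",
    "Output device": "Hardware",
}


def ancestors(concept):
    """Return all super concepts of `concept`, nearest first."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def is_a(concept, candidate):
    """True if `candidate` is a (direct or indirect) super concept of `concept`."""
    return candidate in ancestors(concept)
```

Walking up the is-a links is what lets a property defined on a super concept be inherited by every concept below it.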
E. Ontology Development
Ontology development [5] needs a well defined methodology that follows
certain guidelines:
• The ontology being developed should follow Software Engineering standards.
• The ontology development strategy should be simple and practical.
The phases used in developing an ontology also satisfy Software Engineering
principles and are therefore called Software Development Life Cycle (SDLC)
phases. They are described below:
(a) Specification Phase: - This phase consists of the following activities.
• Domain vocabulary definition: It defines common names and attributes for
domain concepts.
• Identifying resources: A Resource is anything that has a URI. If some
concepts have a number of instances, they can be grouped into a class.
• Identifying axioms: Axioms are structures that represent the behavior of
concepts.
• Identifying relationships: Relations are defined between resources.
• Identifying data characteristics: It defines the features of the types of
resources and their relationships.
• Applying constraints: Constraints represent named relationships between a
domain class and a range class.
• Verification: After designing the preliminary web ontology model, it
should be tested for correctness.
(b) Design Phase: - This phase is the backbone of the Semantic Web. The
physical structure of the designed ontology is based on the RDF model, in
which each statement is a triple of Subject, Predicate and Object.
• Predicate: All characteristics of resources and relationships are taken as
Predicates. E.g. each student is assigned a unique roll number, so the
Predicate is called ‘HasRollNo’.
• Subject: All domain classes of the characteristics and relationships of
resources are taken as Subjects. E.g. there are various average students,
each having a unique URI, so they are grouped in ‘AvgStudentsGroup’.
• Object: It refers to the range class of a relationship. E.g. HasRollNo has
the range class ‘NUMBER’, which is a literal.
(c) Formalization Phase: - This phase produces the output of the ontology
obtained in the design phase with the help of some tools.
Figure4: Ontology development phases [6]
IV. HOZO- AN ONTOLOGY EDITOR
Version used: - 5.2.36 beta
Developed at Mizoguchi Lab, ENEGATE Co. Ltd.
Hozo differs from other ontology editors in the following aspects:
• Its user friendly environment lets users work on it easily.
• Hozo has an API, HozoAPI ver 1.15, for accessing existing ontologies.
• A slot definition option is available.
• Inheritance information is clear and easily accessible in two ways: one is
from Super Classes through the is-a link, and the other is from the Class
constraint.
• Hozo provides a facility for correcting errors at the time of validating
an ontology.
Figure5: Hozo Components [7] (figure labels: Ontology Editor, Ontology Manager (Dependencies), Ontology Server, Developer)
Figure6: Ontology editing screen
A. Case Study
We present a case study that implements all the phases of the ontology
development methodology. The case study illustrates building an ontology on
‘Computer Appreciation’ with the help of the ontology editor Hozo.
(a) Specification Phase
• Domain definition and resource identification: There are a number of
computers classified on the basis of speed. Each has a unique URI, so they
are grouped in the sub class ‘CLASSIFICATION’ under the super class
‘COMPUTER’.
TABLE3. DEFINING CLASSES AND INSTANCES
Concepts (Nodes) | Instances | Features of Resources | Predicate
Classification | Home computer, PC etc. | Purpose, Name, examples | Hasname etc.
Generation | First, Second etc. | Purpose, Name, examples | Hasname etc.
Components | H/w system, s/w system | Types, name, examples, purpose | Hastype, Hasname etc.
Input devices | Scanner, CPU, keyboard | Types, purpose | Hastype and purpose
Output devices | Monitor, Printer, speakers | Types, examples, purpose | Hastype and examples
• Defining axioms about resources: A computer has a Hardware system and a
Software system. The Hardware system is categorized into Input devices and
Output devices.
• Identifying and naming relationships: The relation between the ‘H/w
system’ class and the ‘Input devices’ class is named Haskeyboard.
• Identifying data characteristics:
TABLE4. DATA CHARACTERISTICS
Name | Domain class | Range class
HasName | Computer Generation, Components, Input and Output devices | String
Hastype | Generation Components | String
Hasyear | Generation | Number
• Applying constraints:
TABLE5. RESOURCES RELATIONS ALONG WITH CONSTRAINTS
Name | Domain class | Range class
Has H/w system | Computer | Components
Has S/w system | Computer | Components
Haskeyboard | H/w system | Input devices
HasPrinter | H/w system | Output devices
HasRing | Computer | Network System
HasBus | Computer | Network System
• Validating: Hozo has an Ontology Consistency Check feature that uses the
Hozo inference structure to verify whether the ontology has been developed
properly.
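The kind of check that domain and range constraints make possible can be sketched as follows. This is not Hozo's Ontology Consistency Check, whose internals are not described here; it is a simplified stand-in. The relation table is copied from Table 5, while the instance names (`hw1`, `kb1`, `pr1`) and the function `check` are hypothetical.

```python
# relation name -> (domain class, range class), copied from Table 5.
RELATIONS = {
    "Has H/w system": ("Computer", "Components"),
    "Has S/w system": ("Computer", "Components"),
    "Haskeyboard": ("H/w system", "Input devices"),
    "HasPrinter": ("H/w system", "Output devices"),
    "HasRing": ("Computer", "Network System"),
    "HasBus": ("Computer", "Network System"),
}

# Hypothetical instances and the class each one belongs to.
INSTANCE_CLASS = {
    "hw1": "H/w system",
    "kb1": "Input devices",
    "pr1": "Output devices",
}


def check(statements, instance_class):
    """Return error messages for statements that violate a domain or range constraint."""
    errors = []
    for subj, rel, obj in statements:
        if rel not in RELATIONS:
            errors.append(f"unknown relation: {rel}")
            continue
        domain, range_ = RELATIONS[rel]
        if instance_class.get(subj) != domain:
            errors.append(f"{subj} is not in domain {domain} of {rel}")
        if instance_class.get(obj) != range_:
            errors.append(f"{obj} is not in range {range_} of {rel}")
    return errors
```

For example, linking a hardware system to a printer through Haskeyboard is reported as a range violation, because Table 5 restricts the range of Haskeyboard to Input devices.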
(b) Design Phase
In the context of Hozo, the output of the specification phase is an ontology
file, which is considered the output of the developed ontology. It is
available in different formats, such as:
• Text/HTML
• XML
• RDF
• OWL
Figure7: Sample slice of ontology using RDF
Figure8: Sample slice of ontology using OWL
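Since the figures are not reproduced here, the following sketch shows roughly what an RDF/XML slice of a class hierarchy looks like. It is generated with Python's standard `xml.etree.ElementTree` module and is not Hozo's actual export. The base namespace `http://example.org/computer#` and the helper `class_element` are assumptions; the sub class relation between Classification and Computer is the one stated in the specification phase.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
BASE = "http://example.org/computer#"  # hypothetical namespace for the ontology

# Use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)


def class_element(parent, name, superclass=None):
    """Append an rdfs:Class description, optionally with an rdfs:subClassOf link."""
    cls = ET.SubElement(parent, f"{{{RDFS}}}Class", {f"{{{RDF}}}about": BASE + name})
    if superclass is not None:
        ET.SubElement(
            cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": BASE + superclass}
        )
    return cls


root = ET.Element(f"{{{RDF}}}RDF")
class_element(root, "Computer")
class_element(root, "Classification", superclass="Computer")
rdf_xml = ET.tostring(root, encoding="unicode")
```

Printing `rdf_xml` gives an `rdf:RDF` document with two `rdfs:Class` elements, the second of which points to the first through `rdfs:subClassOf`.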
(c) Formalization Phase
This phase presents the developed ontology pictorially; its source code was
generated using RDF syntax. The following figures show the Hozo user
interface displaying the ontology hierarchy.
Figure9.1: Hozo user-interface
Figure9.2: Hozo user – interface
Another interesting feature of Hozo is that it produces a map layout view of
the developed ontology through its ‘Generate Map’ function.
Figure10: Map Generation using Hozo
V. SEMANTIC ANALYSIS
Semantic Association Analysis means discovering complex and meaningful
relationships between objects; these relationships are called Semantic
Associations. The main aspects of Semantic analysis are as follows:
• It leads to the generation of knowledge driven information from available
data resources.
• It uses a semantic query framework for analyzing relationships, based on
semantic query languages like SPARQL, RQL and SeRQL.
• There are semantic search engines that analyze relationships and create
associations between resources. Examples include Swoogle and Weet-IT.
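The core idea behind semantic query languages such as SPARQL is matching triple patterns that contain variables against a set of statements. The sketch below imitates a single triple pattern in plain Python; it is not a SPARQL engine, and the data and the function `match` are assumptions made for illustration.

```python
# Hypothetical statements (subject, predicate, object).
TRIPLES = [
    ("hw1", "Haskeyboard", "kb1"),
    ("hw1", "HasPrinter", "pr1"),
    ("hw2", "Haskeyboard", "kb2"),
]


def match(pattern, triples):
    """Yield variable bindings for one triple pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    Roughly comparable to: SELECT ?s ?o WHERE { ?s :Haskeyboard ?o }
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding
```

Calling `list(match(("?s", "Haskeyboard", "?o"), TRIPLES))` returns one binding per hardware system that has a keyboard.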
A. Components of Semantic Analysis
(a) Ontology development: - The process of developing an ontology has become
easier with the help of free, open source editors like Hozo that use ontology
languages like DAML+OIL, RDF etc.
E.g. if we want to create an ontology on the travelling process, we can
import concepts from an existing ontology. We need not develop it from the
root node.
(b) Dataset Construction: - The dataset is also called the Test Bed or
Knowledge Base. It is the collection of instances used for creating the
ontology.
(c) Semantic Association Discovery: - It uses a Graph Traversal algorithm to
determine semantic associations by searching all possible paths between any
two nodes in the semantic graph.
Finding associations between all possible paths in a graph is made possible
by a path association algorithm. The steps are as follows:
• Find the possible paths between two classes at schema level.
• Compare each path with the other paths.
• If two paths intersect, they meet at the same node in the schema.
• Use the result to perform a search at data level that determines the
associations between nodes.
(d) Displaying Results: - It refers to how semantic associations are
displayed.
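The schema level steps of the path association procedure in (c) can be sketched with a depth-first traversal. The graph below is a hypothetical schema with abstract node names, and the functions `all_paths` and `shared_nodes` are assumptions; the algorithm actually used by a given search engine may differ.

```python
def all_paths(graph, start, goal, path=None):
    """Enumerate all simple paths from start to goal by depth-first search."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    paths = []
    for nxt in graph.get(start, []):
        if nxt not in path:  # never revisit a node, so cycles cannot loop forever
            paths.extend(all_paths(graph, nxt, goal, path))
    return paths


def shared_nodes(p, q):
    """Return the intermediate nodes at which two paths meet (endpoints excluded)."""
    return [n for n in p[1:-1] if n in q[1:-1]]


# Hypothetical schema graph: class -> directly related classes.
SCHEMA = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
}

paths = all_paths(SCHEMA, "A", "E")
```

Here the two paths from A to E meet at node D, which is the kind of intersection that would then be examined at data level.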
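Figure11: Components of Semantic enhanced ontology based search engine
(figure labels: Ontology; Data sources; Data conversion; Data Sets; Semantic
search engine uses languages like SPARQL, RQL; Ranking algorithm; Provides
user interface)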
B. Semantic Search Engine Structure
The system uses a search engine called SenseBot, which is designed to
produce a summary in response to the keywords searched by the user.
About SenseBot: - It interprets the meaning of the search query and uses the
relevant results to generate a summary of them.
The figure below illustrates the results of a query from Semantic
association analysis.
Figure12: Results of Query
CONCLUSION
This paper highlights a common problem faced by users: retrieving
information relevant to their queries. It emphasizes the concept of
Information Retrieval (IR) and various IR approaches for extracting knowledge
driven documents from the cluster of interlinked web documents.
The paper introduces the concept of ontologies and their role in the
Semantic Web. Since ontology is considered the backbone of a software system,
it should be well designed and free of ambiguities.
This paper also presents a proposed methodology for ontology development
using SDLC phases. The concept of the Semantic Web has revolutionized
emerging technology by extracting information from various web documents and
integrating it in machine understandable form. We have developed an ontology
on Computer Appreciation using the ontology editor Hozo.
This paper also describes research issues in Semantic analysis, which plays
a vital role in the Semantic Web. Semantic analysis discovers meaningful
relations between sets of entities and finds all possible paths between them
using a Graph traversal algorithm. The paper also describes the architecture
of a Semantic enhanced ontology based search engine.
REFERENCES
[1]. Urvi Shah, James Mayfield, “Information Retrieval on the Semantic
Web”, ACM CIKM International Conference on Information
Management, Nov 2002.
[2]. Gagandeep Singh, Vishal Jain, “Information Retrieval (IR) through
Semantic Web (SW): An Overview”, In proceedings of
CONFLUENCE-The Next Generation Information Technology
Summit, 27-28 September 2012, pp 23-27.
[3]. Accessible from T. Berners Lee, “The Semantic Web”, Scientific
American, May 2007
[4]. Berners Lee, J. Lassila, “Ontologies in Semantic Web”, Scientific
American, May 2001, pp 34-43.
[5]. Helena Sofia Pinto, Joao P. Martins, “Ontologies: How can they be
built?”, Knowledge and Information Systems, pages 441-464, 2004.
[6]. Amjad Farooq and Abad Shah, “Ontology Development
Methodology for Semantic Web System”, Pakistan Journal of Life
Social Sciences, Vol.6 No.1, May 2008, pp 50-58.
[7]. Kozaki K, “Hozo: An Environment for Building Ontologies”, In
Proceedings of the 13th International Conference on Knowledge
Engineering and Knowledge Management (EKAW), pp 213-218,
October 2002.
[8]. Deligiannidis L, Sheth A, “Semantic Analytics Visualization”,
Intelligence and Security Informatics Proc. ISI-2006, pp 48-59,
2006.
[9]. J. Mayfield, “Ontologies and text retrieval”, Knowledge
Engineering Review, 2007
[10]. Cristani, R. Cuel, “A Survey on Ontology Creation
Methodologies”, International Journal on Semantic Web and
Information Systems, Vol.1 No.2, 2005.
[11]. Uschold, M. and King, “Towards A Methodology for Building
Ontologies”, IJCAI-95 Workshop on Basic Ontological Issues in
Knowledge Sharing, Montreal and Canada, 2006.
[12]. Uschold, M. and Gruninger, “Ontologies: Principles, Methods and
Applications”, Knowledge Engineering Review, Vol.11 No.2, pp
93-137.
[13]. Updegrove A, “The Semantic Web: An interview with Tim
Berners-Lee”, 2005.
[14]. S. Staab, R. Studer and Y. Sure, “Knowledge Processes and
Ontologies”, IEEE Intelligent Systems Vol. 16, No.1, pp 2-9, 2001
[15]. Kozaki K, R. Mizoguchi, “An Environment for Distributed
Ontology Development Based on Dependency Management”, In
proceedings of the 2nd International Semantic Web Conference
(ISWC), pp 453-468, 2003.
[16]. L. Stojanovic, “Migrating data intensive web sites into the Semantic
web”, In Proceedings of the 17th ACM symposium on applied
computing (SAC), ACM Press, pp 1100-1107, 2002.
[17]. Aleman-Meza B, Arpinar I.B, “A Context aware Semantic
association Ranking”, Technical Report LSDIS Lab, Computer
Science, Univ of Georgia, pp 03-010, 2003.
[18]. Dayal U, Kuno H, “Making the Semantic Web Real”, IEEE Data
Engineering Bulletin, Vol.26, No.4, pp 4-7, 2003.
[19]. Kaushal Giri, “Role of Ontology in Semantic Web”, DESIDOC
Journal of Library & Information Technology, Vol.31 No.2, March
2011, pp 116-120
[20]. Urvi Shah, Tim Finin and Anupam Joshi, “Information Retrieval on
the Semantic Web”, Scientific American, pp 35-45
About the Authors
Gagandeep Singh Narula has completed his B.Tech from Guru Tegh Bahadur
Institute of Technology (GTBIT), affiliated to Guru Gobind Singh
Indraprastha University (GGSIPU), New Delhi. His research areas include Web
Technology, Semantic Web and Information Retrieval.
Vishal Jain has completed his M.Tech (CSE) from USIT, Guru Gobind Singh
Indraprastha University, Delhi and is pursuing a PhD in the Computer Science
and Engineering Department, Lingaya's University, Faridabad. Presently, he is
working as Assistant Professor at Bharati Vidyapeeth's Institute of Computer
Applications and Management (BVICAM), New Delhi. His research areas include
Web Technology, Semantic Web and Information Retrieval. He is also associated
with CSI and ISTE.
Dr. Mayank Singh has completed his M.E in Software Engineering from Thapar
University and his PhD from Uttarakhand Technical University. His research
areas are Software Engineering, Software Testing, Wireless Sensor Networks
and Data Mining. Presently, he is working as Associate Professor at Krishna
Engineering College, Ghaziabad. He is associated with CSI, IE (I), IEEE
Computer Society India and ACM.