Ontology- Semantic Search Engine

Upload: abbas-hashmi

Post on 26-Nov-2014


Page 1: Ontology-  Semantic Search Engine

Ontology-

Semantic Search Engine


A PROJECT REPORT

ON

“Ontology-Semantic Search Engine”

Submitted in Partial Fulfillment for the Bachelor's Degree of Engineering in COMPUTER SCIENCE

BY

Mr. Abbas Hashmi

Mr. Hardik Shah

Mr. Gautam Mehta

Mr. Anurag Jain

Mr. Sushant Gore

UNDER GUIDANCE OF

PROF. R.V.ARGIDDI

Dept. of Computer Science and Engineering

WALCHAND INSTITUTE OF TECHNOLOGY,

SOLAPUR-413006

2010-2011


CERTIFICATE

This is to certify that the Project Report on

“Ontology- Semantic Search Engine” has been carried out by:

Roll No. Name Exam Seat Number

28 Sushant Gore 20347

30 Abbas Hashmi 20349

34 Anurag Jain 20353

53 Gautam Mehta 20370

71 Hardik Shah 20387

Students of B.E. (Computer Science)

Academic year 2010-2011

This Report is completed successfully under the Guide’s Supervision, in the partial fulfillment for the Award of ‘Bachelor’s Degree of Engineering in Computer Science’

By

Solapur University, Solapur.

PROF. R.V.ARGIDDI PROF. DR. MRS. S.S.APTE PROF.DR.S.A.HALKUDE

(Guide) (Head of CSE Dept.) (Principal)

DEPT. OF COMPUTER SCIENCE AND ENGINEERING,

WALCHAND INSTITUTE OF TECHNOLOGY,


SOLAPUR-MAHARASHTRA

PROJECT APPROVAL SHEET

The Project Entitled

“Ontology- Semantic Search Engine”

By

Roll No. Name Exam Seat Number

28 Sushant Gore 20347

30 Abbas Hashmi 20349

34 Anurag Jain 20353

53 Gautam Mehta 20370

71 Hardik Shah 20387

Is hereby approved in partial fulfillment for the Award of ‘Bachelor’s Degree of Engineering in Computer Science’

PROF. R.V.ARGIDDI PROF. DR. MRS. S.S.APTE PROF.DR.S.A.HALKUDE

(Guide) (Head of CSE Dept.) (Principal)

DEPT. OF COMPUTER SCIENCE AND ENGINEERING,

WALCHAND INSTITUTE OF TECHNOLOGY,

SOLAPUR-413006

2010-2011


ACKNOWLEDGEMENT

Successful completion of any project requires identifying all the sources of information on the subject area mentioned in the references. During implementation, however, we realized that technical books alone are not sufficient for working on a project. This project was inspired, assisted and taught by our respected guide Prof. R.V.Argiddi. We would like to thank him for his intellectual exchanges, valuable suggestions, critical reviews and technical assistance. We extend our gratitude to him for teaching us punctuality, stability and devotion while working on our project.

More importantly, we would like to again express our gratitude and love to our Head of Department Prof. S.S.Apte, for continuous support in our project and for permitting easy access to the facilities in our college, which made our task easier. We also thank our respected Principal Dr. S.A.Halkude for the special consideration given to us as B.E. students in completing our project.

We also owe thanks to all our friends and our college staff for helping us directly or indirectly during our project.

To all the above people we are indebted for their suggestions, assistance and teaching.

CONTENTS

1. Introduction
    1.1 Abstract
    1.2 Synopsis
    1.3 Software Engineering Cycle
2. Technologies to be used
    2.1 XML Parsing
    2.2 Resource Description Framework (RDF)
    2.3 Java Server Pages (JSP)
    2.4 JDBC
    2.5 WordNet Library
    2.6 MySql
3. Project Requirements
    3.1 Software Requirements
    3.2 Hardware Requirements
4. System Design Detail
    4.1 Data Flow Diagram
    4.2 Sequence Diagram
    4.3 Data Dictionary
5. Implementation Details
6. Advantages
7. Future Enhancement
8. Conclusion
9. References

1. INTRODUCTION

1.1 ABSTRACT

The Web has become the place to locate information, and the ideal tool for finding such information is a search engine. As a tool for investigating the Web, a search engine must return the desired results for any given query. The success of a search engine depends directly on the satisfaction of its users. Users want information presented to them within a short time, and they expect the most relevant and recent information to be presented. Most search engines cannot completely satisfy users' requirements, and their search results are often inaccurate and irrelevant.

Semantic technologies promise a next generation of semantic search engines that produce precise answers to user queries by taking advantage of the explicit semantics of information available in the context of the Semantic Web.

1.2 SYNOPSIS:

TITLE

“Ontology- Semantic Search Engine”

PURPOSE

One important aspect of semantic web is to make the meaning of information explicit through semantic mark-up, thus enabling more effective access to knowledge contained in heterogeneous information environments, such as the web. Semantic search plays an important role in realizing this goal, as it promises to produce precise answers to user’s queries by taking advantage of the availability of explicit semantics of information.

DESCRIPTION

An ontology is a systematic arrangement of all of the important categories of objects or concepts that exist in some field of discourse, showing the relations between them. When complete, an ontology is a categorization of all of the concepts in some field of knowledge, including the objects and all of the properties, relations, and functions needed to define the objects and specify their actions.

Search engines are the most important tool for discovering information on the World Wide Web. As web traffic keeps growing, traditional keyword-based search engines are no longer adequate, and semantic search engines are gaining importance. Semantic technologies promise a next generation of semantic search engines that produce precise answers to user queries by taking advantage of the explicit semantics of information available in the context of the Semantic Web. Current approaches either apply natural language processing to unstructured text or assume the existence of structured statements over which they can reason. A semantic search engine provides comprehensive means to produce precise answers that, on the one hand, satisfy user queries and, on the other hand, are self-explanatory and understandable by end users.

This project implements the semantic search engine on a local server; it can later be extended to the World Wide Web.

The search engine implemented in this project is based on ontology. The user gives a query, which is taken as the input to the search engine. A parser divides the query into triples. Each generated triple is then matched against the triples that already exist in the system. These triples are stored in the database and are extracted from RDF files fetched from the internet. The RDF files are parsed to form the RDF DOM. The Document Object Model (DOM) describes an RDF document as a tree-like structure, with every RDF element being a node in the tree. A DOM-based parser reads the entire document and (at least in principle) forms the corresponding document tree in memory. The DOM tree is formed from classes that all implement the org.w3c.dom.Node interface. This interface provides functions to walk or modify the tree (such as getChildNodes(), appendChild() and removeChild()) and, of course, methods to query each node for its name and value.

The results that match the user’s query are then sent to the GUI and displayed. If there are many results, the user can request the next few results.
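The matching and paging steps described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the class TripleMatchSketch, its Triple type, the use of null as a wildcard, and the in-memory list standing in for the database are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the triple database described above.
class TripleMatchSketch {

    // A (subject, predicate, object) statement. In a query pattern, null means "match anything".
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // A stored triple matches when every non-null field of the pattern is equal, ignoring case.
    static boolean matches(Triple pattern, Triple stored) {
        return fieldMatches(pattern.subject, stored.subject)
            && fieldMatches(pattern.predicate, stored.predicate)
            && fieldMatches(pattern.object, stored.object);
    }

    private static boolean fieldMatches(String wanted, String actual) {
        return wanted == null || wanted.equalsIgnoreCase(actual);
    }

    // Returns one page of matching triples; page numbers start at 0.
    static List<Triple> search(List<Triple> store, Triple pattern, int page, int pageSize) {
        List<Triple> hits = new ArrayList<>();
        for (Triple t : store) {
            if (matches(pattern, t)) {
                hits.add(t);
            }
        }
        int from = Math.min(page * pageSize, hits.size());
        int to = Math.min(from + pageSize, hits.size());
        return new ArrayList<>(hits.subList(from, to));
    }
}
```

Calling search with page 0 returns the first results, and a request for the next few results simply repeats the call with the next page number.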


1.3 Software Engineering Cycle

To better manage the development process and to achieve consistency, it is essential that software development be done in phases. Hence a phased development process is central to the software engineering approach. Besides having a phased development process, it is essential that monitoring of a project for quality and cost involve objective means rather than subjective methods.

Requirement Analysis Phase

Requirement analysis is done in order to understand the problem the s/w system is to solve. The problem could be automating an existing manual process, developing a new automated system, or a combination of the two. The emphasis in requirement analysis is on identifying what is needed from the system, not how the system will achieve its goals. The goal of the requirement specification phase is to produce the s/w requirement specification document.

There are two major activities in this phase: problem understanding or analysis, and requirement specification. In problem analysis, the analyst has to understand the problem and its context. Once the problem is analyzed and the essentials understood, the requirements must be specified in the requirement specification document. The requirement document must specify all functional and performance requirements; the formats of inputs and outputs; and all design constraints that exist due to political, economic, environmental and security reasons.

Software Design Phase

The purpose of the design phase is to plan a solution to the problem specified by the requirement document. This phase is the first step in moving from the problem domain to the solution domain. The design of a system is the most critical factor affecting the quality of the s/w; it has a major impact on the testing and maintenance phases. The output of this phase is the design document.

The design activity is often divided into two separate phases- System Design and Detailed Design. System design is to identify the module that should be in the system, the specification of these modules, and how they interact with each other to produce the desired results. During detailed design, the internal logic of each of the modules specified in the system design is decided. In system design the attention is on what components are needed, while in detailed design how the components can be implemented in s/w is the issue.

Coding Phase

The goal of the coding phase is to translate the design of the system into code in a given programming language. The coding phase affects both testing and maintenance profoundly. The testing and maintenance costs of s/w are much higher than the coding costs; the goal of coding should therefore be to reduce the testing and maintenance effort. An important concept that helps the understandability of programs is structured programming. The goal of structured programming is to linearize the control flow in the program.

Testing Phase

Testing is the major quality control measure used during s/w development. Its basic function is to detect errors in the s/w. The testing phase has to uncover errors introduced during coding and also errors introduced during the previous phases. Thus, the goal of testing is to uncover requirement, design and coding errors in the programs. Consequently, different levels of testing are used.

The starting point of testing is unit testing. In this, a module is tested separately, often by the coder himself while coding the module. After this, the modules are gradually integrated into subsystems, which are then integrated to eventually form the entire system. During integration of modules, integration testing is performed to detect design errors by focusing on testing the interconnection between modules. After the system is put together, system testing is performed. Here the system is tested against the system requirements to see whether all the requirements are met and whether the system performs as specified by the requirements. Finally, acceptance testing is performed to demonstrate to the client, on the real-life data of the client, the operation of the system.


2. Technologies to be used

2.1 XML Parsing:

XML:

XML (Extensible Markup Language) is a set of rules for encoding documents electronically. XML’s design goals emphasize simplicity, generality and usability over the Internet. It is a textual data format, with strong support via Unicode for the languages of the world. It is widely used for the representation of arbitrary data structures, for example in web services.

XML Parsing:

Parsing is the act of splitting up information into its component parts (schools used to teach this in language classes until the teaching profession collectively caught the anti-grammar disease).

‘Mary feeds Spot’ parses as

Subject = Mary, proper noun, nominative case

Verb = feeds, transitive, third person singular, present tense

Object = Spot, proper noun, accusative case

In computing, a parser is a program (or a piece of code or API that you can reference inside your own programs) which analyses files to identify the component parts. All applications that read input have a parser of some kind, otherwise they'd never be able to figure out what the information means. Microsoft Word contains a parser which runs when you open a .doc file and checks that it can identify all the hidden codes. Give it a corrupted file and you'll get an error message.

XML applications are just the same: they contain a parser which reads XML and identifies the function of each of the pieces of the document, and it then makes that information available in memory to the rest of the program.

While reading an XML file, a parser checks the syntax (pointy brackets, matching quotes, etc) for well-formedness, and reports any violations (reportable errors). The XML Specification lists what these are.
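As a minimal sketch of such a well-formedness check, the Java snippet below parses a string with the standard JAXP DOM parser and reports whether the parser raised a syntax violation. The class name WellFormedCheck and the method isWellFormed are illustrative names, not part of the project.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative helper: checks well-formedness only, not validity against a DTD or Schema.
class WellFormedCheck {

    // Returns true when the text parses as well-formed XML, false when the parser reports a violation.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to standard error.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Raised for violations such as mismatched tags or unterminated quotes.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not well-formedness results.
            throw new IllegalStateException(e);
        }
    }
}
```

A non-validating parser such as this one stops at the first violation it finds, which is why it returns a single yes-or-no answer.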

Validation is another stage beyond parsing. As the component parts of the program are identified, a validating parser can compare them with the pattern laid down by a DTD or a Schema, to check that they conform. In the process, default values and data types (if specified) can be added to the in-memory result of the validation that the validating parser gives to the application.

Example:

    <person corpid="abc123" birth="1960-02-31" gender="female">
      <name>
        <forename>Judy</forename>
        <surname>O'Grady</surname>
      </name>
    </person>

The example above parses as:

Element person identified with Attribute corpid containing abc123 and Attribute birth containing 1960-02-31 and Attribute gender containing female containing...

    Element name containing...

        Element forename containing text ‘Judy’ followed by...

        Element surname containing text ‘O'Grady’
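The same breakdown can be reproduced with the standard Java DOM API. The sketch below is only an illustration; the class PersonParseSketch and its helper methods are hypothetical names, and the document is held in a string rather than read from a file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Illustrative only: parses the person example and reads back its parts.
class PersonParseSketch {

    static final String XML =
        "<person corpid=\"abc123\" birth=\"1960-02-31\" gender=\"female\">"
      + "<name><forename>Judy</forename><surname>O'Grady</surname></name>"
      + "</person>";

    // Parses the text and returns the root element of the in-memory tree.
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Text content of the first descendant element with the given tag name.
    static String textOf(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element person = parseRoot(XML);
        System.out.println(person.getTagName() + " corpid=" + person.getAttribute("corpid"));
        System.out.println(textOf(person, "forename") + " " + textOf(person, "surname"));
    }
}
```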

As well as built-in parsers, there are also stand-alone parser-validators, which read an XML file and tell you if they find an error (like missing angle-brackets or quotes, or misplaced markup). This is essential for testing files in isolation before doing something else with them, especially if they have been created by hand without an XML editor, or by an API which may be too deeply embedded elsewhere to allow easy testing.

We can parse XML using different languages such as Java and PHP.

Parsing an RDF file includes the following code:

    // Requires javax.xml.parsers.DocumentBuilderFactory, javax.xml.parsers.DocumentBuilder,
    // org.w3c.dom.Document and org.w3c.dom.Element.
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse("./webapps/ROOT/ontosearch/MyRdf.rdf");
    Element root = document.getDocumentElement();
    visitNode(out, root);

The visitNode function is defined as follows:

    // Recursively walks the DOM tree below n.
    void visitNode(JspWriter out, Node n) throws Exception
    {
        if (n instanceof Element)
        {
            NodeList children = ((Element) n).getChildNodes();
            for (int i = 0; i < children.getLength(); i++)
            {
                Node child = children.item(i);
                // Skip missing nodes and nodes without children (text nodes, empty elements).
                if (child == null || child.getFirstChild() == null)
                    continue;
                visitNode(out, child);
            }
        }
    }

2.2 Resource Description Framework (RDF)

RDF (“Resource Description Framework”) is the standard for encoding metadata and other knowledge on the Semantic Web. In the Semantic Web, computer applications make use of structured information spread in a distributed and decentralized way throughout the current web. RDF is an abstract model, a way to break down knowledge into discrete pieces, and while it is most popularly known for its RDF/XML syntax, RDF can be stored in a variety of formats. This article discusses the abstract RDF model, two concrete serialization formats, how RDF is used and how it differs from plain XML, higher-level RDF semantics, best practices for deployment, and querying RDF data sources.

RDF is a standard model for data interchange on the Web. RDF has features that facilitate data merging even if the underlying schemas differ, and it specifically supports the evolution of schemas over time without requiring all the data consumers to be changed.

RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a “triple”). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications.

This linking structure forms a directed, labeled graph, where the edges represent the named link between two resources, represented by the graph nodes. This graph view is the easiest possible mental model for RDF and is often used in easy-to-understand visual explanations.


Basic RDF Model:

The foundation of RDF is a model for representing named properties and property values. The RDF model draws on well-established principles from various data representation communities. RDF properties may be thought of as attributes of resources and in this sense correspond to traditional attribute-value pairs. RDF properties also represent relationships between resources and an RDF model can therefore resemble an entity-relationship diagram.

The basic data model consists of three object types:

Resources: All things being described by RDF expressions are called resources. A resource may be an entire Web page, such as the HTML document "http://www.w3.org/Overview.html". A resource may be a part of a Web page; e.g. a specific HTML or XML element within the document source. A resource may also be a whole collection of pages; e.g. an entire Web site. A resource may also be an object that is not directly accessible via the Web; e.g. a printed book. Resources are always named by URIs plus optional anchor ids (see [URI]). Anything can have a URI; the extensibility of URIs allows the introduction of identifiers for any entity imaginable.

Properties: A property is a specific aspect, characteristic, attribute, or relation used to describe a resource. Each property has a specific meaning, defines its permitted values, the types of resources it can describe, and its relationship with other properties. This document does not address how the characteristics of properties are expressed; for such information, refer to the RDF Schema specification.

Statements: A specific resource together with a named property plus the value of that property for that resource is an RDF statement. These three individual parts of a statement are called, respectively, the subject, the predicate, and the object. The object of a statement (i.e., the property value) can be another resource or it can be a literal; i.e., a resource (specified by a URI) or a simple string or other primitive datatype defined by XML. In RDF terms, a literal may have content that is XML markup but is not further evaluated by the RDF processor. There are some syntactic restrictions on how markup in literals may be expressed.

Examples

Resources are identified by a resource identifier. A resource identifier is a URI plus an optional anchor id. For the purposes of this section, properties will be referred to by a simple name.

Consider as a simple example the sentence:

Ora Lassila is the creator of the resource http://www.w3.org/Home/Lassila.

This sentence has the following parts:

 Subject (Resource)   http://www.w3.org/Home/Lassila 

 Predicate (Property)   Creator

 Object (literal)   "Ora Lassila"
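A statement of this shape can be modelled with a small Java class. This is a hedged sketch only: RdfStatementSketch and its members are hypothetical names, and the printed form (angle brackets for resources, quotes for literals) is an illustrative convention rather than a standard serialization.

```java
// Illustrative model of one RDF statement: subject, predicate and object.
class RdfStatementSketch {
    final String subject;          // resource identifier (a URI)
    final String predicate;        // property, referred to here by a simple name
    final String object;           // either a resource URI or literal text
    final boolean objectIsLiteral; // distinguishes the two kinds of object

    RdfStatementSketch(String subject, String predicate, String object, boolean objectIsLiteral) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
        this.objectIsLiteral = objectIsLiteral;
    }

    // The example sentence from the text, with a literal object.
    static RdfStatementSketch lassilaExample() {
        return new RdfStatementSketch(
            "http://www.w3.org/Home/Lassila", "Creator", "Ora Lassila", true);
    }

    @Override
    public String toString() {
        String shownObject = objectIsLiteral ? "\"" + object + "\"" : "<" + object + ">";
        return "<" + subject + "> " + predicate + " " + shownObject;
    }
}
```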


RDF-XML Format:

The Resource Description Framework (RDF) is a general-purpose language for representing information in the Web.

This document defines an XML syntax for RDF called RDF/XML in terms of Namespaces in XML, the XML Information Set and XML Base. The formal grammar for the syntax is annotated with actions generating triples of the RDF graph as defined in RDF Concepts and Abstract Syntax. The triples are written using the N-Triples RDF graph serializing format, which enables more precise recording of the mapping in a machine-processable form. The mappings are recorded as test cases, gathered and published in RDF Test Cases.
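To make the mapping from RDF/XML to triples concrete, the sketch below reads a tiny RDF/XML document with the namespace-aware DOM parser and prints one N-Triples style line per literal-valued property. It is a deliberately simplified illustration, not a conforming RDF/XML parser: it only handles rdf:Description elements with an rdf:about attribute and plain text property values, and it does not escape special characters. The class name, the sample document, and the schema namespace http://description.org/schema/ are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Simplified illustration; real RDF/XML has many more constructs than are handled here.
class RdfXmlToNTriplesSketch {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Sample input; the "s" schema namespace is a placeholder.
    static final String SAMPLE =
        "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\" xmlns:s=\"http://description.org/schema/\">"
      + "<rdf:Description rdf:about=\"http://www.w3.org/Home/Lassila\">"
      + "<s:Creator>Ora Lassila</s:Creator>"
      + "</rdf:Description>"
      + "</rdf:RDF>";

    static List<String> toNTriples(String rdfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to resolve rdf: and s: prefixes
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(rdfXml)));
            List<String> lines = new ArrayList<>();
            NodeList descriptions = doc.getElementsByTagNameNS(RDF_NS, "Description");
            for (int i = 0; i < descriptions.getLength(); i++) {
                Element description = (Element) descriptions.item(i);
                String subject = description.getAttributeNS(RDF_NS, "about");
                NodeList children = description.getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() != Node.ELEMENT_NODE) {
                        continue; // skip whitespace text between property elements
                    }
                    // The predicate URI is the property's namespace plus its local name.
                    String predicate = child.getNamespaceURI() + child.getLocalName();
                    String literal = child.getTextContent().trim();
                    lines.add("<" + subject + "> <" + predicate + "> \"" + literal + "\" .");
                }
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the sample document this yields a single line whose subject is the Lassila home page, whose predicate is the Creator property in the placeholder namespace, and whose object is the literal "Ora Lassila".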


2.3 Java Server Pages (JSP)

Java Server Pages (JSP) technology provides a simplified, fast way to create dynamic web content. JSP technology enables rapid development of web-based applications that are server- and platform-independent. Java Server Pages (JSP) is a Java technology that helps software developers serve dynamically generated web pages based on HTML, XML, or other document types. Released in 1999 as Sun's answer to ASP and PHP [1], JSP was designed to address the perception that the Java programming environment didn't provide developers with enough support for the Web.

Java Server Pages (JSP) technology enables Web developers and designers to rapidly develop and easily maintain, information-rich, dynamic Web pages that leverage existing business systems. As part of the Java technology family, JSP technology enables rapid development of Web-based applications that are platform independent. JSP technology separates the user interface from content generation, enabling designers to change the overall page layout without altering the underlying dynamic content.

EXAMPLE:

    <html>
    <head> <title>First JSP page</title> </head>
    <body>
    <p align="center"> <font color="#FF0000" size="6"> <%="Java Developers Paradise"%> </font> </p>
    <p align="center"> <font color="#800000" size="6"> <%="Hello JSP"%> </font> </p>
    </body>
    </html>

In JSP, Java code is written between the '<%' and '%>' tags. An expression, whose value is written into the page, takes the form <%= Some Expression %>. In this example we have used <%="Java Developers Paradise"%> and <%="Hello JSP"%>.

2.4 JDBC

Java Database Connectivity, or JDBC for short, is a technology that enables a Java program to manipulate data stored in a database.

The JDBC API

The JDBC application programming interface provides access to relational databases from the Java programming language. The API provides an industry standard for database-independent connectivity between the Java programming language and a wide range of databases. The user can not only execute SQL statements, retrieve results, and update data, but can also access the data from anywhere within a network because of Java's "Write Once, Run Anywhere" (WORA) capabilities.

With the JDBC API, the user can also access other tabular data sources such as spreadsheets or flat files, even in a heterogeneous environment. The JDBC API is part of the Java platform and is included in both Java Standard Edition (Java SE) and Java Enterprise Edition (Java EE).

The JDBC 4.0 API is divided into two packages: java.sql and javax.sql. Both packages are included in the Java SE and Java EE platforms.

The JDBC Driver Manager

The JDBC Driver Manager is a very important class that defines objects which connect Java applications to a JDBC driver. The Driver Manager is the backbone of the JDBC architecture. It is a small and simple class that provides a means of managing the different JDBC database drivers running in an application. Its main responsibility is to load all the drivers found in the system and to select the most appropriate driver when opening a connection to a database. The Driver Manager also selects the most appropriate driver from the previously loaded drivers when a new database connection is opened.
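The driver selection described above can be observed without any database installed. In the minimal sketch below, DriverManager.getConnection asks each registered driver whether it accepts the URL and throws an SQLException when none does. The class DriverLookupSketch and the URL scheme jdbc:hypothetical are made up for the illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Illustrative probe of DriverManager's driver selection.
class DriverLookupSketch {

    // True if some registered JDBC driver accepts the URL and a connection opens.
    static boolean canConnect(String url) {
        try (Connection connection = DriverManager.getConnection(url)) {
            return true;
        } catch (SQLException e) {
            // Thrown when no registered driver accepts the URL, or when connecting fails.
            return false;
        }
    }

    public static void main(String[] args) {
        // No driver understands this made-up scheme, so the lookup fails.
        System.out.println(canConnect("jdbc:hypothetical://localhost/ontosearch"));
    }
}
```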

The JDBC-ODBC Bridge, also known as the JDBC type 1 driver, is a database driver that uses an ODBC driver to connect to the database. This driver translates JDBC method calls into ODBC function calls. The Bridge implements JDBC for any database for which an ODBC driver is available. The Bridge is implemented as the sun.jdbc.odbc Java package and contains a native library used to access ODBC.

2.5 WordNet Library

As its name implies, the Java API for WordNet Searching (JAWS) is an API that provides Java applications with the ability to retrieve data from the WordNet database. It is a simple and fast API that is compatible with both the 2.1 and 3.0 versions of the WordNet database files and can be used with Java 1.4 and later.

From within the application you started you can use JAWS by first obtaining an instance of WordNetDatabase with code like the following, which assumes that you've performed an import of the classes in the edu.smu.tspell.wordnet package: 

WordNetDatabase database = WordNetDatabase.getFileInstance ();

Once you've done so, you can begin to retrieve synsets from the database as shown in the example below. This code retrieves all noun synsets for "fly" and loops through each one printing its first word form, its description, and the number of hyponyms associated with that noun synset:

    // Assumes: import edu.smu.tspell.wordnet.*;
    NounSynset nounSynset;
    NounSynset[] hyponyms;

    WordNetDatabase database = WordNetDatabase.getFileInstance();
    Synset[] synsets = database.getSynsets("fly", SynsetType.NOUN);
    for (int i = 0; i < synsets.length; i++)
    {
        nounSynset = (NounSynset) (synsets[i]);
        hyponyms = nounSynset.getHyponyms();
        // Prints the first word form, its definition and the hyponym count.
        System.err.println(nounSynset.getWordForms()[0] + ": " +
            nounSynset.getDefinition() + " has " + hyponyms.length + " hyponyms");
    }

2.6 MySql

MySQL provides connectivity for client applications developed in the Java programming language through a JDBC driver, which is called MySQL Connector/J. MySQL Connector/J is flexible in the way it handles conversions between MySQL data types and Java data types. In general, any MySQL data type can be converted to a java.lang.String, and any numeric type can be converted to any of the Java numeric types, although round-off, overflow, or loss of precision may occur.

Connection to the MySQL database:

    // Requires java.sql.Connection and java.sql.DriverManager.
    // connect is declared here so that the fragment is self-contained.
    Connection connect = null;
    try {
        String url = "jdbc:mysql://localhost:3306/ontosearch";
        Class.forName("com.mysql.jdbc.Driver").newInstance();
        connect = DriverManager.getConnection(url, "root", "");
        if (!connect.isClosed())
        {
            System.out.println("Successfully connected to " + "MySQL server using TCP/IP...");
        }
    } catch (Exception ex) {
        System.err.println("Exception: " + ex.getMessage());
    }

3. PROJECT REQUIREMENTS

3.1 SOFTWARE REQUIREMENTS

1. Operating System: Our project works on any operating system that has a web browser.

2. JSP Server: The server must be installed on Windows XP or another operating system that supports JSP compilation.

For example, we have installed the Apache Tomcat 6.0 server on Windows XP.

However, other servers such as GlassFish can also be used.

3.2 HARDWARE REQUIREMENTS

1. RAM: A minimum of 256 MB of RAM is required.

2. Hard disk: A minimum of 40 GB of hard disk space is required.

3. Processor: An Intel processor running at 1 GHz is the minimum requirement.

4. SYSTEM DESIGN DETAILS

4.1 DATA FLOW DIAGRAM

A data-flow diagram (DFD) is a graphical representation of the “flow” of data through an information system. DFDs can also be used for the visualization of data processing (structured design).

On a DFD, data items flow from an external data source or an internal data store to an internal data store or an external data sink, via an internal process.

A DFD provides no information about the timing or ordering of processes, or about whether processes will operate in sequence or in parallel. It is therefore quite different from a flowchart, which shows the flow of control through an algorithm, allowing a reader to determine what operations will be performed, in what order, and under what circumstances, but not what kinds of data will be input to and output from the system, nor where the data will come from and go to, nor where the data will be stored (all of which are shown on a DFD).

Level 0: Context Diagram

Level 1: Software

[Diagrams not reproduced. Labels: Software, Database, RDF Files, Internet, Query, Answer, User.]

Level 2: RDF Module

[Diagram not reproduced. Labels: Triple Manager, RDF Parser, Triple Finder, GUI Module, Database, RDF, Triples, User, Result, Query, Internet, Scan, Search, Web Page Fetcher.]

Level 2: Triple Finder

[Diagram not reproduced. Labels: Query Pattern Matcher, Query Parser, Triple Retrieval, Result Generator, Database, Query GUI.]

4.2 SEQUENCE DIAGRAM

A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. Sequence diagrams are sometimes called event diagrams, event scenarios, and timing diagrams. A sequence diagram shows, as parallel vertical lines (lifelines), different processes or objects that live simultaneously, and, as horizontal arrows, the messages exchanged between them, in the order in which they occur. This allows the specification of simple runtime scenarios in a graphical manner.

[Sequence diagram not reproduced. Lifelines: User, Triple Manager, Triple Matcher, Database, Result Generator. Messages: Query Fired, Target Triple, Generated Result for User, Many Triple, Matched Triple, Result in Triple, Request for next Results, Generated Result for User, Available RDF.]

4.3 DATA DICTIONARY

Ontosearch.triples

    Field     Type     Null  Key  Default  Extra
    Sid       int(11)  YES   MUL
    Pid       int(11)  YES
    Oid       int(11)  YES
    Tripleid  int(11)  NO    PRI           auto_increment

This table contains the word ids of the triples (subject, predicate, object) that are generated after parsing an RDF file. Tripleid is the auto-incremented primary key.

Ontosearch.words

Field  Type        NULL  KEY  Default  Extra
Wid    int(11)     NO    PRI           auto_increment
Word   mediumtext  YES

The words table stores all the words scanned during parsing, as well as the synonyms generated during query processing. Each word has a ‘wid’, which is used in place of the word everywhere else to reduce the load.

Ontosearch.urls

Field  Type          NULL  KEY  Default  Extra
Urlid  int(11)       NO    PRI           auto_increment
url    varchar(255)  YES   UNI

The urls table contains the URL of every file that has been parsed. Each URL is given a unique id, and this id is used to look up the URL for a result.

Ontosearch.urltriples

Field     Type     NULL  KEY  Default  Extra
Urlid     int(11)  YES   MUL
Tripleid  int(11)  YES

The urltriples table records which tripleid is associated with which urlid. This makes it possible to retrieve the URL for a resulting answer and to direct the user to the original file in which the answer was found.
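The way the four tables work together can be illustrated with a small in-memory sketch. This is not the project's code: the class OntosearchTablesSketch, its method names and the sample data are all hypothetical, and plain Java collections stand in for the MySQL tables. It only shows how a tripleid is resolved back to words and to the URL of the source file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the four Ontosearch tables. */
final class OntosearchTablesSketch {
    private final Map<Integer, String> words = new HashMap<>();  // words: wid -> word
    private final Map<Integer, int[]> triples = new HashMap<>(); // triples: tripleid -> {sid, pid, oid}
    private final Map<Integer, String> urls = new HashMap<>();   // urls: urlid -> url
    private final List<int[]> urlTriples = new ArrayList<>();    // urltriples: rows of {urlid, tripleid}

    void addWord(int wid, String word) { words.put(wid, word); }

    void addTriple(int tripleId, int sid, int pid, int oid) {
        triples.put(tripleId, new int[] {sid, pid, oid});
    }

    void addUrl(int urlId, String url) { urls.put(urlId, url); }

    void linkUrlToTriple(int urlId, int tripleId) {
        urlTriples.add(new int[] {urlId, tripleId});
    }

    /** Resolves the three word ids of a triple back to their words. */
    String describeTriple(int tripleId) {
        int[] t = triples.get(tripleId);
        return words.get(t[0]) + " " + words.get(t[1]) + " " + words.get(t[2]);
    }

    /** Returns the URLs of all files associated with the given triple. */
    List<String> urlsForTriple(int tripleId) {
        List<String> result = new ArrayList<>();
        for (int[] row : urlTriples) {
            if (row[1] == tripleId) {
                result.add(urls.get(row[0]));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        OntosearchTablesSketch db = new OntosearchTablesSketch();
        db.addWord(1, "Paper");
        db.addWord(2, "hasAuthor");
        db.addWord(3, "Alice");
        db.addTriple(10, 1, 2, 3);
        db.addUrl(7, "http://example.org/paper.rdf");
        db.linkUrlToTriple(7, 10);
        System.out.println(db.describeTriple(10) + " <- " + db.urlsForTriple(10));
    }
}
```

Storing only integer ids in the triples and urltriples tables keeps the rows small; the text of a word or URL is looked up once, when the result is shown.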

5. IMPLEMENTATION DETAILS

Classes Involved:

1. Class DownloadFile:

In this class there is one method that is as follows:

GetFile():

This method fetches the RDF files from the Internet.

2. Class ParseFile:

In this class there are two methods that are as follows:

Parsing():

This method obtains a DocumentBuilder from DocumentBuilderFactory and uses it to build the DOM structure of the fetched RDF files. The root element is then passed to the visitNode() method.

visitNode():

This method traverses the tree from the root element down to the leaf nodes, and the triples found during the traversal are put into the words table.
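The two steps described for ParseFile can be sketched as follows. This is a minimal illustration and not the project's code: the class name RdfDomWalkSketch and the "name=text" output format are assumptions, and the sketch only collects the text of leaf elements instead of building full triples or writing to the database. The DocumentBuilderFactory, DocumentBuilder and org.w3c.dom calls are the standard Java XML API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch of Parsing() and visitNode(). */
final class RdfDomWalkSketch {

    /** Builds the DOM tree of an RDF/XML string and walks it from the root element. */
    static List<String> parse(String rdfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so getLocalName() drops prefixes such as "dc:"
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)));
            List<String> found = new ArrayList<>();
            visitNode(doc.getDocumentElement(), found);
            return found;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse RDF/XML", e);
        }
    }

    /** Recursively visits child elements; records "name=text" for each leaf element. */
    static void visitNode(Node node, List<String> found) {
        NodeList children = node.getChildNodes();
        boolean hasElementChild = false;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasElementChild = true;
                visitNode(child, found);
            }
        }
        if (!hasElementChild) {
            String text = node.getTextContent().trim();
            if (!text.isEmpty()) {
                found.add(node.getLocalName() + "=" + text);
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
                + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
                + "<rdf:Description rdf:about=\"http://example.org/doc\">"
                + "<dc:title>Semantic Search</dc:title>"
                + "</rdf:Description></rdf:RDF>";
        System.out.println(parse(xml));
    }
}
```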

3. Class AddWords:

In this class there is one method that is as follows:

AddWords():

This method gives each word parsed from the fetched RDF file a unique ID and stores it in the words table.
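The id assignment can be sketched in a few lines. The class WordTableSketch is hypothetical and keeps the words in a map instead of the MySQL words table; the counter mimics the auto_increment column. Reusing the existing id when a word is seen again is an assumption of this sketch, not something the description above states.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory version of the word-to-id assignment. */
final class WordTableSketch {
    private final Map<String, Integer> idsByWord = new HashMap<>();
    private int nextId = 1; // mimics an auto_increment column starting at 1

    /** Returns the id of the word, assigning the next free id on first sight. */
    int addWord(String word) {
        Integer existing = idsByWord.get(word);
        if (existing != null) {
            return existing; // assumption: a repeated word keeps its first id
        }
        int id = nextId++;
        idsByWord.put(word, id);
        return id;
    }

    public static void main(String[] args) {
        WordTableSketch table = new WordTableSketch();
        System.out.println(table.addWord("title"));   // 1
        System.out.println(table.addWord("creator")); // 2
        System.out.println(table.addWord("title"));   // 1
    }
}
```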

4. Class ConnectionManager:

In this class there is one method that is as follows:

getConnection():

This method creates a connection to the MySQL database. Wherever database connectivity is needed, we create an instance of this class and call its getConnection() method.

5. Class DatabaseManager:

In this class there are seven methods that are as follows:

getSubjects():

This method takes a List of predicates and a List of objects as parameters and retrieves from the triples table the list of subjects that match those predicates and objects.

getPredicates():

This method takes a List of subjects and a List of objects as parameters and retrieves from the triples table the list of predicates that match those subjects and objects.

getObjects():

This method takes a List of predicates and a List of subjects as parameters and retrieves from the triples table the list of objects that match those subjects and predicates.

getWords():

This method takes a List of integers and looks them up in the words table to get the respective words.

getWordIds():

In this method we add the integers obtained from the GetWordID() method to the List of integers.

GetWordID():

This method returns an identifier from the Words Table for a specified word.
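The lookup methods of DatabaseManager follow one pattern, which the sketch below shows for getSubjects(). It is only an illustration: the class TripleLookupSketch and its nested Triple class are hypothetical, the triples are held in a list instead of the MySQL triples table, and returning each matching subject once is an assumption. getPredicates() and getObjects() would filter on the other two columns in the same way.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory version of the getSubjects() lookup. */
final class TripleLookupSketch {

    /** One row of the triples table: the word ids of subject, predicate and object. */
    static final class Triple {
        final int sid;
        final int pid;
        final int oid;

        Triple(int sid, int pid, int oid) {
            this.sid = sid;
            this.pid = pid;
            this.oid = oid;
        }
    }

    private final List<Triple> triples = new ArrayList<>();

    void add(int sid, int pid, int oid) {
        triples.add(new Triple(sid, pid, oid));
    }

    /**
     * Returns the subject ids of all triples whose predicate id is in pids
     * and whose object id is in oids, without duplicates, in insertion order.
     */
    List<Integer> getSubjects(List<Integer> pids, List<Integer> oids) {
        Set<Integer> subjects = new LinkedHashSet<>();
        for (Triple t : triples) {
            if (pids.contains(t.pid) && oids.contains(t.oid)) {
                subjects.add(t.sid);
            }
        }
        return new ArrayList<>(subjects);
    }

    public static void main(String[] args) {
        TripleLookupSketch store = new TripleLookupSketch();
        store.add(1, 2, 3);
        store.add(4, 2, 3);
        store.add(5, 6, 3);
        System.out.println(store.getSubjects(java.util.Arrays.asList(2), java.util.Arrays.asList(3)));
    }
}
```

Passing lists instead of single ids lets a query term and all of its synonyms be matched in one call.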

6. Class QueryParser:

In this class there are three methods that are as follows:

parse():

This method parses the query fired by the user. It checks the type of the query and, according to that type, identifies whichever of the subject, object or predicate is present. The triples are then passed to the populate() method to find their respective synonyms.

populate(Synonym):

This method calls the overloaded populate(List) method, passing the subjects, objects and predicates separately.

populate(List):

This populate() method accepts a List of subjects, objects or predicates, one list at a time. Their synonyms are found using the WordNet library and are then added to the same List.
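The effect of populate(List) can be sketched without the WordNet library. In the sketch below, the class SynonymExpansionSketch and its hand-filled synonym map are hypothetical stand-ins for the WordNet lookup, and skipping duplicates is an assumption. Only the original terms are expanded, so synonyms of synonyms are not added.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of synonym expansion; a map replaces the WordNet lookup. */
final class SynonymExpansionSketch {
    private final Map<String, List<String>> synonyms = new HashMap<>();

    void addSynonyms(String word, String... others) {
        synonyms.put(word, Arrays.asList(others));
    }

    /** Appends the synonyms of each original term to the same list, skipping duplicates. */
    void populate(List<String> terms) {
        // Iterate over a snapshot so that newly added synonyms are not expanded again.
        List<String> original = new ArrayList<>(terms);
        for (String term : original) {
            List<String> found = synonyms.getOrDefault(term, Collections.<String>emptyList());
            for (String synonym : found) {
                if (!terms.contains(synonym)) {
                    terms.add(synonym);
                }
            }
        }
    }

    public static void main(String[] args) {
        SynonymExpansionSketch expander = new SynonymExpansionSketch();
        expander.addSynonyms("author", "writer", "creator");
        List<String> predicates = new ArrayList<>(Arrays.asList("author"));
        expander.populate(predicates);
        System.out.println(predicates); // [author, writer, creator]
    }
}
```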

7. Class WN:

In this class there is one method that is as follows:

getSynonyms()

This method uses the WordNetDatabase and Synset classes to find the synonyms of a given word.

8. Class Synonym:

In this class there is one method that is as follows:

Synonym()

In this method, the query that has been fired is passed to the parse() method of the class QueryParser.

9. Class WordId:

In this class there is one method that is as follows:

getWordId():

This method returns an identifier from the Words Table for a specified word.

PRACTICAL IMPLEMENTATION

6. ADVANTAGES

1. Semantic technologies promise a next generation of semantic search engines, which promise to produce precise answers to user queries by taking advantage of the explicit semantics of information available in the context of the semantic web.

2. The semantic search engine provides a comprehensive means of producing precise answers that, on the one hand, satisfy user queries and, on the other, are self-explanatory and understandable by end users.

3. The use of the WordNet library has made the search efficient, since it considers the synonyms of the words used by the user while searching.

7. FUTURE DEVELOPMENT

The results shown to the user are retrieved from the RDF files already present on websites. In future, this restriction to a website's RDF will be replaced by searching the web pages of the website themselves.

8. CONCLUSION

Thus we have successfully implemented our project entitled “Ontology-A Semantic Search Engine”. The project aims to give the user the most relevant search results and to save the user time. The results generated overcome a problem of traditional search engines, which usually give irrelevant outputs. At present our search engine is based on the DOM structure of RDF files, but in future it can be enhanced by using NLP (Natural Language Processing).
