Application of Ontology in Semantic Information Retrieval
By Prof Shahrul Azman, FSTM, UKM
Presentation for MyREN Seminar 2014, Berjaya Hotel, Kuala Lumpur, 27 November 2014
Brief speaker’s info
Shahrul Azman Mohd. Noah, Ph.D.
Knowledge Technology Research Group
Center for AI Technology (CAIT)
Graduated in BSc (Mathematics) from UKM
Graduated in MSc (IS) from Sheffield U.
Graduated in PhD (IS) from Sheffield U., in knowledge-based systems
From Muar, Johor
ONTOLOGY
What is ontology?
• An ontology may be considered a method of representing knowledge.
• The term comes from philosophy, where ontology is the science of “what is”: the kinds and structures of objects, properties, events, processes and relations in every area of reality.
• Aristotle's classification of animals is one of the first ontologies developed.
Ontology in Computing
• An ontology is an engineering artifact:
– It is constituted by a specific vocabulary used to describe a certain reality, plus
– A set of explicit assumptions regarding the intended meaning of the vocabulary.
• Thus, an ontology describes a formal specification of a certain domain:
– Shared understanding of a domain of interest
– Formal and machine manipulable model of a domain of interest
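To make the idea of a vocabulary plus explicit assumptions concrete, here is a minimal Python sketch. The class names (Dog, Cat, Mammal, Animal) and the helpers `ancestors` and `is_a` are hypothetical illustrations, not taken from the talk. The one explicit assumption encoded is that the subclass relation is transitive.

```python
# Vocabulary: class names. Explicit assumption: "subclass of" is transitive.
# All names here are hypothetical examples.
SUBCLASS_OF = {
    "Dog": "Mammal",
    "Cat": "Mammal",
    "Mammal": "Animal",
}

def ancestors(cls):
    """Return every superclass of cls, nearest first."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result

def is_a(cls, other):
    """True if cls is the same as, or a subclass of, other."""
    return cls == other or other in ancestors(cls)
```

Because the model is machine manipulable, a program can derive that a Dog is an Animal even though that fact is never stated directly.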
Ontology Definition
“Formal, explicit specification of a shared conceptualization” [Gruber93]
• Formal: machine-readability with computational semantics
• Explicit: unambiguous terminology definitions
• Shared: commonly accepted understanding
• Conceptualization: conceptual model of a domain (ontological theory)
An ontology is… (figure: possibilities arranged along a complexity axis. Source: Smith & Welty (2001))
• a catalog
• a set of text files
• a glossary
• a thesaurus
• a collection of taxonomies
• a set of general logical constraints
• a collection of frames
Various approaches to classify ontologies
• Classify ontologies according to the information the ontology needs to express and the richness of its internal structure (Lassila & McGuinness, 2001)
• Classify into 2 orthogonal dimensions: the amount and type of structure, and the subject (Van Heijst et al., 1997)
• Classify ontologies according to their level of dependence on a particular task (Guarino, 1998)
Ontology language
• Ontology languages are formal languages used to construct ontologies:
– they allow the encoding of knowledge about specific domains, and
– they often include reasoning rules that support the processing of that knowledge.
• Various languages have been proposed: CycL, KL-One, Ontolingua, F-Logic, OCML, LOOM, Telos, RDF(S), OIL, DAML+OIL, XOL, SHOE, OWL etc.
• Many are based on Description Logic (DL).
• Summarised in Kalibatiene & Vasilecas (2011).
Example of ontologies
• Top level ontology: Suggested Upper Merged Ontology (SUMO)
[Figure: portion of SUMO ontology with USGS geo-concepts inserted]
• Lexical ontology: WordNet
• Domain ontology: Simple News and Press Ontologies (SNaP)
Linked Data…?
Applications of ontology
• Searching & browsing
• Decision support system
• Question answering system
• Recommendation
• Data integration
• Etc.
INFORMATION RETRIEVAL
Concepts
• “Information retrieval (IR) is a field concerned with the structure, analysis, organization, storage, searching, and retrieval of information.” (Salton, 1968).
• Applications of IR: recommendations, Q&A, filtering… and of course searching.
Issues in IR
• Some issues in IR:
– Relevance
– Evaluation
– Users and information needs
• Context based search
• Semantic search
• Etc.
IR process
ONTOLOGY + INFORMATION RETRIEVAL
Ontology and semantic search
• Various ways to support semantic search:
– Query expansion: users' queries are expanded with related terms
– Disambiguation: resolving terms or concepts that refer to more than one topic
– Classifying: classify documents such as ads into ontological topics to support semantic search
– Enhanced IR model: embed the ontology into an existing IR model, resulting in a modified IR model
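As an illustration of the disambiguation item, the sketch below picks the sense of an ambiguous query term by counting overlapping context words. This is a simple overlap heuristic chosen for illustration, not the method used in the projects described later. The term "jaguar", its two senses and the function `disambiguate` are all hypothetical.

```python
# Hypothetical senses for the ambiguous term "jaguar", each with context words.
SENSES = {
    "jaguar": {
        "Animal": {"cat", "jungle", "prey", "wild"},
        "Car": {"engine", "drive", "dealer", "speed"},
    }
}

def disambiguate(term, query, senses=SENSES):
    """Pick the sense whose context words overlap most with the query.

    Returns None when the term is unknown or no sense overlaps at all.
    """
    words = set(query.lower().split())
    best, best_overlap = None, 0
    for sense, context in senses.get(term, {}).items():
        overlap = len(words & context)
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best
```

With these assumed senses, the query "jaguar engine speed" resolves to the Car sense, while the bare query "jaguar" stays unresolved.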
Query Expansion
• Query expansion (QE) is needed due to the ambiguity of natural language.
• Main aim of QE: to add new meaningful terms to the initial query.
Bhogal, J., Macfarlane, A. & Smith, A. 2007. A review of ontology based query expansion. Information Processing and Management, 43: 866-886.
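A minimal sketch of ontology-based query expansion follows. It assumes a tiny hand-written table of related terms standing in for a real ontology or thesaurus. The table `RELATED` and the function `expand_query` are hypothetical names introduced for illustration.

```python
# Hypothetical mini-ontology: each term maps to related terms.
RELATED = {
    "car": ["automobile", "vehicle"],
    "theft": ["robbery", "burglary"],
}

def expand_query(query, related=RELATED):
    """Append related terms to the query terms.

    Original terms come first and keep their order; duplicates are skipped.
    """
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for extra in related.get(term, []):
            if extra not in expanded:
                expanded.append(extra)
    return expanded
```

Under these assumptions, `expand_query("Car theft")` returns the two original terms followed by the four related ones. A term with no entry in the table is passed through unchanged.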
Semantic index
• Textual documents are indexed according to some ontology model.
• Remember the concept of vocabulary in IR?
[Figure: a computer science collection is indexed; the extracted index terms, or vocabulary of the collection, are: architecture, bus, computer, database, …, xml]
[Figure: the same computer science collection, with the term index (architecture, bus, computer, database, …, xml) replaced by an ontological index]
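The difference between a conventional term index and an ontological index can be sketched as follows. The term-to-concept mapping, the concept names and the three sample documents are assumptions made up for this example; the talk does not specify them.

```python
from collections import defaultdict

# Hypothetical mapping from index terms to ontology concepts.
TERM_TO_CONCEPT = {
    "bus": "ComputerArchitecture",
    "architecture": "ComputerArchitecture",
    "database": "DataManagement",
    "xml": "DataManagement",
}

# Hypothetical documents from a computer science collection.
DOCS = {
    "doc1": "bus architecture of a computer",
    "doc2": "xml database",
    "doc3": "database architecture",
}

def build_indexes(docs, term_to_concept):
    """Build a conventional term index and an ontological (concept) index."""
    term_index = defaultdict(set)
    concept_index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            term_index[term].add(doc_id)
            concept = term_to_concept.get(term)
            if concept is not None:
                concept_index[concept].add(doc_id)
    return term_index, concept_index

term_index, concept_index = build_indexes(DOCS, TERM_TO_CONCEPT)
```

In the term index, "bus" leads only to doc1. In the concept index, the concept ComputerArchitecture also reaches doc3, which never mentions a bus but does mention architecture.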
Examples
• Three research projects that illustrate the applications of ontology-based IR:
– Semantic digital library
– Crime news retrieval
– Multi-modality ontology-based image retrieval
Semantic digital library
• Proposed an approach for managing, organizing and populating an ontology for document collections in a digital library.
• The document metadata and content are inserted into and populate a knowledge base, which allows sophisticated querying and searching.
• Proposed an ontology-based information retrieval model, built on the classic vector space model, that includes document annotation, instance-based weighting and concept-based ranking.
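The sketch below shows one way a vector space model can be applied to ontology instances instead of plain words. It uses ordinary tf-idf weights and cosine similarity as a stand-in; the exact weighting and ranking formulas of the project are not given in the slides. The annotations, the instance names (such as `Topic_IR`) and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical annotations: document id -> ontology instances found in it.
ANNOTATIONS = {
    "Doc1": ["Student1", "Student1", "Supervisor1"],
    "Doc2": ["Supervisor1", "Topic_IR"],
    "Doc3": ["Topic_IR", "Topic_IR", "Student2"],
}

def tfidf_vectors(annotations):
    """Weight each instance in each document by tf * log(N / df)."""
    n_docs = len(annotations)
    df = Counter()
    for instances in annotations.values():
        df.update(set(instances))
    vectors = {}
    for doc_id, instances in annotations.items():
        tf = Counter(instances)
        vectors[doc_id] = {
            inst: count * math.log(n_docs / df[inst])
            for inst, count in tf.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query_instances, vectors):
    """Rank documents by cosine similarity to a query of ontology instances."""
    query = {inst: 1.0 for inst in query_instances}
    scores = {doc_id: cosine(query, vec) for doc_id, vec in vectors.items()}
    return sorted((d for d in scores if scores[d] > 0),
                  key=lambda d: scores[d], reverse=True)

VECTORS = tfidf_vectors(ANNOTATIONS)
```

With this sample data, `rank(["Topic_IR"], VECTORS)` places Doc2 ahead of Doc3, because Topic_IR accounts for a larger share of Doc2's vector. Doc1 does not mention the instance and is left out.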
• General architecture
• Involved three ontologies: ACM topic hierarchies, Geo ontology and Dublin Core metadata
• Portion of domain ontology focusing on academic thesis
• Document annotation
• The process
<!-- create Class Person -->
<!-- create instance of Class Student -->
<Student rdf:ID="Student1">
  <rdfs:label>Arifah Alhadi</rdfs:label>
</Student>
<Student rdf:ID="Student2">
  <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Asyraf Arifin</rdfs:label>
</Student>
<!-- create instance of Class Supervisor -->
<Supervisor rdf:ID="Supervisor1">
  <rdfs:label>PM Dr Shahrul Azman</rdfs:label>
  <rdfs:label>Prof. Madya Dr. Shahrul Azman Mohd Noah</rdfs:label>
</Supervisor>
<Supervisor rdf:ID="Supervisor2">
  <rdfs:label>Prof Aziz Deraman</rdfs:label>
</Supervisor>

Concept | Instance | Documents
http://www.ukm.my/thesis/supervisor#, http://www.ukm.my/thesis/person# | Supervisor1 | Doc1
http://ukm.my/thesis/student#, http://ukm.my/thesis/creator#, http://ukm.my/thesis/person# | Student1 | Doc1
http://ukm.my/thesis/student#, http://ukm.my/thesis/creator#, http://ukm.my/thesis/person# | Student2 | Doc1

VSM Index
Id | Term | TFIDF | Frq | Doc Id
1 | Arifah Alhadi | 0.11 | 2 | Doc1
2 | Asyraf Arifin | 0.123 | 1 | Doc1
3 | PM Dr Shahrul Azman | 0.45 | 1 | Doc1
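The instance labels in RDF/XML like the fragment above are what end up as terms in the index. The following sketch reads such labels with Python's standard `xml.etree.ElementTree`. The enclosing `rdf:RDF` element and the default namespace are assumptions added so that the fragment parses on its own. The function `labels_by_instance` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# The rdf:RDF wrapper and the default namespace are assumptions made so
# that the fragment is well-formed on its own.
RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns="http://ukm.my/thesis/person#">
  <Student rdf:ID="Student1">
    <rdfs:label>Arifah Alhadi</rdfs:label>
  </Student>
  <Supervisor rdf:ID="Supervisor1">
    <rdfs:label>PM Dr Shahrul Azman</rdfs:label>
    <rdfs:label>Prof. Madya Dr. Shahrul Azman Mohd Noah</rdfs:label>
  </Supervisor>
</rdf:RDF>"""

def labels_by_instance(rdf_xml):
    """Map each instance id to a (class name, list of labels) pair."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for node in root:
        class_name = node.tag.split("}", 1)[-1]  # strip the namespace part
        instance_id = node.get("{%s}ID" % RDF_NS)
        labels = [el.text.strip()
                  for el in node.findall("{%s}label" % RDFS_NS)]
        result[instance_id] = (class_name, labels)
    return result

INSTANCES = labels_by_instance(RDF_XML)
```

Each label can then be matched against document text, so that an occurrence of either supervisor label is counted for the same instance, Supervisor1.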
Ontology-based IR for crime news retrieval
• Each crime news item must be classified into categories: Traffic Violation, Theft, Sex Crime, Murder, Kidnap, Fraud, Drugs, Cybercrime, Arson and Gang (Chen et al. 2004).
• Useful entities need to be identified: Person, Location, Organisation, Date/Time, Weapon, Amount, Vehicle, Drug, Personal properties, and Age.
• Clustering of crime news into topics, e.g. Nurin Jazlin murder, Canny Ong, Sosilawati etc.
• Clustering of a specific topic into various chronological events.
• Mapping of named entities into the news ontology to support semantic querying and retrieval.
Example
[Figure: three levels of organisation for crime news]
• Classification into categories: Murder, Kidnap, Theft, Gang, …
• Clustering into topics: Nurin Jazlin, Sosilawati, Canny Ong
• Clustering of a topic into events, here for the Canny Ong case:
– Investigation, including medical report and trial
– Evidence/suspect
– DNA test
– Family reaction and negligence suit
– Court sentence, plea of guilty
• Counts shown on the slide: (17) (6) (3) (9) (13)
Required methods
• In order to support the aforementioned requirements:
– Conventional text processing: tokenizing, indexing, stopping, stemming etc.
– Named entity recognition (NER)
– Classification and clustering
– Ontology mapping
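The conventional text processing steps listed above (tokenizing, stopping, stemming) can be sketched in a few lines of Python. The stopword list is a tiny illustrative one, and the suffix-stripping `stem` function is a deliberately crude stand-in for a real stemmer such as Porter's. Neither is taken from the project.

```python
import re

# A tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "was", "and", "to", "by"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def stem(token):
    """Strip one common English suffix; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, remove stopwords, then stem what remains."""
    return [stem(t) for t in tokenize(text) if t not in STOPWORDS]
```

For example, `preprocess("Police arrested the suspects")` yields `["police", "arrest", "suspect"]`. The resulting terms are what a bag-of-words index would store.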
PRE-PROCESSING TASK
• Stopword removal
• Stemming
• Parsing
• Indexing
+ DOCUMENT REPRESENTATION
• Bag of words
• Named entity recognition
+ DOCUMENT ORGANIZATION
• Classification: AdaBoost
• Clustering: KNN
• Semantic mapping
Document representation
• Documents are represented in meaningful forms:
– BoW: Bag of Words
– Named Entity Recognition: used GATE ANNIE and JAPE rules
– Adopt the Vector Space Model (VSM), enhanced with an ontological model
Document organization
• Documents need to be organised into categories, topics and events.
– Classification: AdaBoost algorithm
– Clustering: KNN clustering
– Ontology mapping: we developed a crime news ontology by extending the existing SNaP ontology. It includes classes/entities that are important to crime, such as classification of crimes, location and weapon.
Asset ontology
Event ontology
Extending the SNaP ontology and
mapping to entities in news documents
[Figure: the SNaP ontology extended with crime concepts. Classes shown: pne:Event, event:Event, pna:Asset, pns:Stuff, pns:Tangible, pns:Organization, pns:Location, pns:Person, pns:Weapon, pns:Vehicle, pnc:Classification, pnc:Classifiable, pnt:Tag. Properties shown, each with rdfs:domain and rdfs:range: pne:subeventOf, pnc:isClassifiedBy. Classes are linked by rdfs:subClassOf; the instances <Murder>, <Kidnap> and <Event 1> are linked to their classes by rdf:type.]
The Application
• What we need/desire.
Ontology-based Image Retrieval
• Rapid growth of visual information (VI) leads to difficulty in finding and accessing VI.
• Inability to capture the semantic content.
• The problem that arises: a lack of coincidence between the information extracted from VI and user needs.
• Conventional approaches to image retrieval (IMR), text-based (TBIR) and content-based (CBIR), have reached their limit in attempting to solve this problem.
• As a result, the semantic-based (SBIR) approach: ontology-based methods provide explicit, domain-oriented semantics for concepts and relationships.
• Illustrate how images are described based on their visual, textual and domain semantic features.
• Proposed a multi-modality ontology: visual ontology, textual ontology and domain ontology.
• Illustrate how such an ontology can be integrated with an open source knowledge base (DBpedia) to support a more comprehensive search.
Proposed Approach
Example of multi-modality ontology
Example of multi-modality ontology with DBpedia
Conclusion: practical implementation of ontology-based IR
[Figure: architecture diagram with the components Ontology (TBox and ABox), Documents and Index, the steps build, Extraction, Population, Annotation and Query Processing, and the flow from query to ranked docs]
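The components named in this architecture can be tied together in a small end-to-end sketch. It is only an illustration under strong assumptions: the TBox is a set of class names, the ABox maps surface labels to instances, annotation is plain substring matching, and ranking sums instance frequencies. The sample documents and all names are hypothetical.

```python
from collections import Counter, defaultdict

# TBox: the classes. ABox: instances of those classes, keyed by surface label.
TBOX = {"Person", "Location", "Crime"}
ABOX = {
    "canny ong": ("Person", "CannyOng"),
    "kuala lumpur": ("Location", "KualaLumpur"),
    "murder": ("Crime", "Murder"),
}
# Consistency check: every ABox instance belongs to a TBox class.
assert all(cls in TBOX for cls, _ in ABOX.values())

# Hypothetical documents.
DOCUMENTS = {
    "d1": "Murder trial in the Canny Ong case continues in Kuala Lumpur",
    "d2": "Traffic jam in Kuala Lumpur",
    "d3": "Murder suspect arrested in Johor",
}

def annotate(text, abox):
    """Annotation: count occurrences of each known instance label in the text."""
    lowered = text.lower()
    counts = Counter()
    for label, (_cls, instance) in abox.items():
        n = lowered.count(label)
        if n:
            counts[instance] += n
    return counts

def build_index(documents, abox):
    """Index: instance -> {doc id: frequency}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for instance, freq in annotate(text, abox).items():
            index[instance][doc_id] = freq
    return index

def search(query, index, abox):
    """Query processing: annotate the query, rank docs by summed frequency."""
    scores = Counter()
    for instance in annotate(query, abox):
        for doc_id, freq in index.get(instance, {}).items():
            scores[doc_id] += freq
    return [doc_id for doc_id, _ in scores.most_common()]

INDEX = build_index(DOCUMENTS, ABOX)
```

With this data, the query "Canny Ong murder" ranks d1 first, since it matches both instances, and d3 second. Automatic population of the ABox, one of the research issues listed next, is not covered here: the ABox is written by hand.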
Research issues
• Index representation: most are still based on the conventional VSM.
• Ranking: weighting and ranking mechanisms
• Automatic population: supervised and unsupervised
• Extraction & annotation
• Multilingual and cross-language
References
• Castells, P., Fernandez, M. & Vallet, D. 2007. An Adaptation of Vector Space Model for Ontology Based Information Retrieval. IEEE Transactions on Knowledge and Data Engineering, 19(2):
• Shahrul Azman Noah, Nor Afni Raziah Alias, Nurul Aida Osman, Zuraidah Abdullah, Nazlia Omar, Yazrina Yahya & Maryati Mohd Yusof. 2010. Ontology-Driven Semantic Digital Library. AIRS 2010: 141-150.
• Shahrul Azman Noah & Datul Aida Ali. 2010. The Role of Lexical Ontology in Expanding the Semantic Textual Content of On-Line News Images. AIRS 2010: 193-202.
• Fernández, M., Cantador, I., López, V., Vallet, D., Castells, P. & Motta, E. 2011. Semantically enhanced information retrieval: an ontology-based approach. Web Semantics: Science, Services and Agents on the World Wide Web, 9: 434-452.
• Kara, S., Alan, O., Sabuncu, O., Akpınar, S., Cicekli, N.K. & Alpaslan, F.N. 2012. An ontology-based retrieval system using semantic indexing. Information Systems, 37: 294-305.
• Kohler, J., Philippi, S., Specht, M. & Ruegg, A. 2006. Ontology based text indexing and querying for the semantic web. Knowledge-Based Systems, 19: 744-754.
• Etc.
Example: advanced application of ontology
Watson: the science behind an answer
Group members:
1. Shahrul Azman Mohd. Noah
2. Juhana Salim
3. Masnizah Mohd
4. Nazlia Omar
5. Mohd Juzaiddin Ab Aziz
6. Nazlena Mohamad Ali
7. Saidah Saad
8. Shereena Mohd Arif
9. Lailaltulqadri Zakaria
10. Sabrina Tiun
11. Maryati Mohd. Yusof