
Short Term Scientific Mission (COST-STSM-IC1302-33401) - REPORT

KEYSTONE COST Action IC1302

STSM Applicant: Ana Mestrovic, Ph.D.

Home Institution: Department of Informatics, University of Rijeka, Croatia

Host Institution: Department of Computer Science and Information Systems, University of

London, Birkbeck College, United Kingdom

Duration: April 4th, 2016 - April 8th, 2016

STSM title: ONTOLOGY-BASED INFORMATION RETRIEVAL: A GRAPH BASED APPROACH

1. Purpose of the STSM

The purpose of the STSM was to start a research collaboration in the area of ontology-based

information retrieval. The main focus of our research is to devise a novel technique, or rather a

framework, for information retrieval with the aid of ontologies extracted from the Linked Data cloud.

This research is inspired by the ideas discussed at the KEYSTONE WG meeting recently held in

Marseille.

Furthermore, our goal was to establish a new collaboration between the Langnet team from the

Department of Informatics, University of Rijeka and the Department of Computer Science and

Information Systems, Birkbeck University of London.

2. Description of the work carried out during the STSM

During my week-long visit to the Department of Computer Science and Information Systems,

I had the opportunity to work with Dr Andrea Calì; he shared his expertise in

formalisation and reasoning in knowledge bases, while I contributed my experience in

computational linguistics and natural language processing.

We started our research by discussing the related work and existing approaches on the topic of

ontology-based information retrieval. We shared ideas and analysed possible novel approaches,

chiefly based on graph-based knowledge representation. In particular, we focussed on how to

formalise semantic and linguistic relations and to include semantic knowledge in the proposed

framework. We propose a general framework for information retrieval that can use an arbitrary set

of ontologies for document and query expansion; however, our current plan is to make use of

ontologies extracted from the Linked Data cloud, which, despite its flat data representation,

contains ontological knowledge in abundance.

In our approach we adopt and extend some existing techniques (variations of the ontology-based

vector space model with document and query expansion; see references below) and propose a novel

framework that can capture semantic relations. In the proposed framework, document and query

expansion relies on a base taxonomy that is extracted from a lexical database, thesaurus or other

ontology (e.g. WordNet, Wiktionary, etc.). Each term from a document or query can be described as

a vector of base concepts from the base taxonomy. Next, we extend our framework with the

possibility of including new ontologies represented as graphs, and thus incorporate semantic

relations. We define a set of mapping functions that map multiple ontological layers into the base

taxonomy. This way, each concept from the included ontologies can be represented as a vector of

base concepts from the base taxonomy as well. Furthermore, we propose a general weighting

scheme, which is used in the vector space model.
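
The pipeline described above can be illustrated with a minimal sketch. It is not the framework from the draft paper: the base concepts, the term weights, the extra ontology layer (`GEO_TO_BASE`) and the additive weighting are all hypothetical toy choices, made only to show how terms and ontology concepts might end up in the same base-concept vector space.

```python
import math

# Hypothetical base taxonomy: a fixed, ordered list of base concepts (toy data).
BASE_CONCEPTS = ["vehicle", "place", "animal"]

# Hypothetical term-to-base-concept weights, standing in for a lexical resource.
TERM_TO_BASE = {
    "car": {"vehicle": 1.0},
    "automobile": {"vehicle": 1.0},
    "garage": {"place": 0.7, "vehicle": 0.3},
    "dog": {"animal": 1.0},
}

# Hypothetical mapping function for one additional ontology layer:
# each of its concepts is expressed over the base concepts.
GEO_TO_BASE = {
    "London": {"place": 1.0},
}


def concept_vector(tokens, mappings):
    """Sum the base-concept weights of every token covered by some mapping.

    Simple addition is an assumed placeholder for the weighting scheme.
    """
    totals = dict.fromkeys(BASE_CONCEPTS, 0.0)
    for token in tokens:
        for mapping in mappings:
            for concept, weight in mapping.get(token, {}).items():
                totals[concept] += weight
    return [totals[concept] for concept in BASE_CONCEPTS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


MAPPINGS = [TERM_TO_BASE, GEO_TO_BASE]
doc_vec = concept_vector(["automobile", "garage", "London"], MAPPINGS)
query_vec = concept_vector(["car"], MAPPINGS)
score = cosine(query_vec, doc_vec)
```

In this toy setting the query "car" obtains a non-zero score against a document that never contains the word "car", because both "car" and "automobile" map to the same base concept.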

As a result, the proposed framework can take into account the various lexical and

semantic relations between terms and concepts (e.g. synonymy, hierarchy, meronymy, antonymy,

geo-proximity, etc.). This way we manage to avoid certain vocabulary problems (e.g. synonymy) and

reduce the dimensionality of the vector space model.
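
The two effects mentioned above, handling synonymy and reducing dimensionality, can be seen in a small comparison of a term-based and a concept-based vector space. The sketch below rests on an assumed, hypothetical synonym table (`CONCEPT_OF`) and plain term counts; it covers synonymy only and does not attempt the other relations.

```python
import math

# Hypothetical term-to-concept table (toy data, not taken from a real resource).
CONCEPT_OF = {
    "car": "vehicle",
    "automobile": "vehicle",
    "auto": "vehicle",
    "dog": "animal",
    "hound": "animal",
}


def bag(items):
    """Count occurrences of each item, giving a sparse vector as a dict."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(key, 0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


query = ["car"]
document = ["automobile", "hound"]

# Term space: the query and the document share no literal term.
term_sim = cosine(bag(query), bag(document))

# Concept space: synonyms collapse onto the same dimension.
concept_sim = cosine(
    bag(CONCEPT_OF[t] for t in query),
    bag(CONCEPT_OF[t] for t in document),
)

term_dims = len(CONCEPT_OF)                   # one dimension per term
concept_dims = len(set(CONCEPT_OF.values()))  # one dimension per concept
```

Here the term-based similarity is zero although the document is clearly about a car, while the concept-based similarity is positive, and the five term dimensions shrink to two concept dimensions.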

In the draft paper we describe a first version of the framework, which will be improved in the

course of future joint research. We expect that applying the proposed approach to real data

sets will yield improvements in precision and recall with respect to other

techniques in the literature.
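
For reference, precision and recall for a single query can be computed as in the sketch below. The document identifiers are invented for illustration and are not experimental results.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    An empty denominator yields 0.0 by convention.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four documents retrieved, three actually relevant.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
```

With these made-up sets, two of the four retrieved documents are relevant, and two of the three relevant documents are retrieved.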

3. Description of the main results obtained

Besides the results mentioned above, the main result of the STSM is the draft of a paper that will be

submitted to the second 2016 KEYSTONE conference (IKC 2016).

4. Future collaboration with the host institution (if applicable)

We plan to continue our collaboration on this research. We will finish the paper before the deadline

of the IKC conference (June 1st). Next, we plan to extend the proposed approach and test the

framework using suitable data sets. We will analyse the effect of the proposed techniques on the

effectiveness and efficiency of relevant information retrieval tasks. Our plan is to submit further

results to a journal as well as to explore joint submissions to grant calls (see below).

5. Foreseen publications/articles resulting from the STSM (if applicable)

(i) Paper for the second 2016 KEYSTONE conference (IKC 2016).

(ii) We plan to extend the aforementioned paper with further experimental results and domain-

specific techniques; the extended version will be submitted to a suitable journal, and possibly (if

suitable) to a relevant Semantic Web conference.

6. Other comments (if any)

Selected references

• Blanco, Roi, and Christina Lioma. "Graph-based term weighting for information retrieval".

Information Retrieval 15:1 (2012), pp. 54-92.

• Baziz, Mustapha, et al. "An information retrieval driven by ontology from query to document

expansion". Large Scale Semantic Access to Content (Text, Image, Video, and Sound). Le

Centre de Hautes Études Internationales d'Informatique Documentaire, 2007.

• Dragoni, Mauro, Celia da Costa Pereira, and Andrea G.B. Tettamanzi. "A conceptual

representation of documents and queries for information retrieval systems by using light

ontologies". Expert Systems with Applications 39:12 (2012), pp. 10376-10388.

• Schuhmacher, Michael, and Simone Paolo Ponzetto. "Knowledge-based graph document

modeling". Proceedings of the 7th ACM International Conference on Web Search and Data

Mining. ACM, 2014.

Sincerely,

Ana Mestrovic

Department of Informatics

University of Rijeka