Crosslingual Ontology-Based Document Retrieval (Search) in an eLearning Environment RANLP, Borovets, 2007 Eelco Mossel University of Hamburg

Page 1

Crosslingual Ontology-Based Document Retrieval (Search) in an eLearning Environment

RANLP, Borovets, 2007

Eelco Mossel, University of Hamburg

Page 2

Framework

• EU project LT4eL: Language Technology for eLearning (www.lt4el.eu)
• Goal: use language technology to improve the effectiveness of Learning Management Systems
• Multilingual setting: 8 languages
• 12 European partner universities/institutes
• Crosslingual search: joint work with
  – Cristina Vertan, Stefanie Reimers (University of Hamburg)
  – Kiril Simov and his team (Bulgarian Academy of Sciences, Sofia)
  – Alex Killing (ETH Zürich, Eidgenössische Technische Hochschule)

Page 3

Overview

• Goals of semantic search
• Resources for search function
• Functionality and architecture
• Further work

Page 4

Crosslingual semantic search

Goals of the approach

1. Improved retrieval of documents
  – Find documents that would not be found by simple text search (which requires the exact search word to occur in the text)
  – Example: a search for “screen” retrieves a document that contains “monitor” but not “screen”.
2. Multilinguality
  – One implementation for all languages in the project
3. Crosslinguality
  – Find documents in languages different from the search/interface language
    • No need to translate the search query
    • Search possible with passive foreign language knowledge

Page 5

Overview of resources

• A multilingual document collection
• An ontology that includes a domain ontology for the domain of the documents
• Concept lexicalisations in different languages
• Annotation of concepts in the documents

Page 6

Overview of resources (graphical)

[Figure: diagram with three components: the Ontology, the Lexicons (Term → Concept) and the LOs. Language codes shown: BG, CS, DE, EN, MT, NL, PL, PT, RO.]

Page 7

Search procedure

[Figure: flow diagram. Labels shown: Search-Terms (multiple languages), Search-Concepts, Retrieved Documents; Lexicons (contain term-concept mappings); Ontology (contains concepts); Visualisation (select concepts); Document Database.]
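The procedure above can be sketched as two lookups: search terms are mapped to concepts through per-language lexicons, and concepts are mapped to documents through the concept annotations. This is a minimal sketch only; the toy lexicon entries, concept identifiers, document ids and function names are invented for illustration and are not LT4eL data or code.

```python
# Toy data, invented for illustration (not taken from the LT4eL resources).
LEXICONS = {
    "en": {"screen": {"lt4el:Monitor"}, "monitor": {"lt4el:Monitor"}},
    "de": {"bildschirm": {"lt4el:Monitor"}},
}
# Concept annotations: document id -> (document language, annotated concepts).
ANNOTATIONS = {
    "doc_en_1": ("en", {"lt4el:Monitor"}),
    "doc_de_1": ("de", {"lt4el:Monitor"}),
    "doc_en_2": ("en", {"lt4el:Keyboard"}),
}


def terms_to_concepts(terms, languages):
    """Look up each search term in the lexicons of the given languages."""
    concepts = set()
    for lang in languages:
        for term in terms:
            concepts |= LEXICONS.get(lang, {}).get(term.lower(), set())
    return concepts


def concepts_to_documents(concepts):
    """Return ids of documents annotated with at least one search concept."""
    return sorted(
        doc for doc, (_, annotated) in ANNOTATIONS.items() if annotated & concepts
    )


def search(terms, languages):
    """Search terms -> search concepts -> retrieved documents."""
    return concepts_to_documents(terms_to_concepts(terms, languages))
```

Because retrieval runs on concepts rather than on strings, an English query for "screen" in this toy setup also returns the German document, without translating the query.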

Page 8

Search with ILIAS

Page 10

Internal components

Search functionality comprises:

1. Find terms in the lexicons that match the search query.
2. Find the corresponding concepts for the derived terms.
3. Find relevant documents for the concepts.
4. Create a ranking for the set of found documents.
5. Create an ontology fragment containing the information needed to present the concept neighbourhood.
6. Find “shared concepts”.

Page 11

Architecture

[Figure: architecture diagram. Boxes shown: LMS / ILIAS / other system using the search functionality; CrosslingualSearch; LexiconLookup Component; Ontology Management System; OntologySearchEngine; Lexicon; Ontology; Lucene; Database.]

Page 12

1: Query Terms

• Why start with a free-text query?
  – Users want results fast (as in Google)
  – Compete with fulltext search and keyword search
  – Find a starting point for ontology browsing
• Querying the lexicon: strategies adopted/implemented
  – Case- and diacritic-insensitive matching
  – Create combinations for multiword terms
    Example: “Text Editor”
    • text-editor
    • texteditor
    • text editor
    • text
    • editor
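The two lexicon-query strategies above can be sketched as follows. The function names are hypothetical and the order of the candidates simply follows the "Text Editor" example; the actual LT4eL implementation may differ.

```python
import unicodedata


def normalise(text):
    """Case-fold the text and strip diacritics, e.g. 'Zürich' -> 'zurich'."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # After NFD decomposition, accents are separate combining characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def query_variants(query):
    """Return lexicon lookup candidates for a query, combined forms first."""
    words = normalise(query).split()
    if len(words) < 2:
        return words
    combined = ["-".join(words), "".join(words), " ".join(words)]
    return combined + words
```

For the query "Text Editor" this yields the five candidates listed above: text-editor, texteditor, text editor, text, editor. The lexicon entries would need to be normalised the same way for the comparison to be case- and diacritic-insensitive on both sides.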

Page 13

1: Query Terms (continued)

• Other ideas to improve recognition of the query:
  – Lemmatisation of search terms
  – Expansion of the lexicon with word forms
  – Match substrings
  – Match similar strings
    • Insertion of function words, e.g. Portuguese: “provedor acesso” → “provedor de acesso”
  – Dynamic list of available terms that contain the input so far (involves a change to the GUI)
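Two of these ideas are simple enough to sketch: the dynamic list of terms containing the input so far, and tolerance for missing function words. These ideas are listed as proposals, so this is one possible reading under stated assumptions rather than the project's method; the term list and function names are hypothetical.

```python
# Hypothetical lexicon entries, for illustration only.
TERMS = ["provedor de acesso", "text editor", "texteditor", "editor"]


def suggest(input_so_far, terms=TERMS):
    """Dynamic list: all terms that contain the input typed so far."""
    needle = input_so_far.lower()
    return [t for t in terms if needle in t.lower()]


def matches_with_function_words(query, term):
    """True if the query words occur in the term in the same order,
    allowing extra words (such as 'de') in between."""
    remaining = iter(term.lower().split())
    # 'word in remaining' consumes the iterator, so word order is enforced.
    return all(word in remaining for word in query.lower().split())
```

With this check, the query "provedor acesso" matches the lexicon entry "provedor de acesso". Matching of similar strings could be added with a string-similarity function such as `difflib.get_close_matches` from the Python standard library.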

Page 14

2: Term → Concept

Not always a 1:1 mapping.

• Corresponding concept is missing from the ontology
  – LT4eL: not in lexicon
• Unique result: term is the lexicalisation of one concept
• Multiple concepts from one domain, e.g.:
  – Key (on a keyboard)
  – Key (in a database)
• Concepts from several domains:
  – Window (graphical representation on monitor)
  – Window (part of a building)
• Different concepts for different languages:
  – “Kind” (English: sort/type)
  – “Kind” (German: child)

Let the user choose: present multiple browsing units
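The cases above (missing, unique, ambiguous) fall out naturally if the lexicon lookup returns all candidate concepts instead of a single one. The sketch below assumes a lexicon keyed by language and term; the concept identifiers are invented for illustration.

```python
# Hypothetical lexicon: (language, term) -> candidate concepts (ids invented).
LEXICON = {
    ("en", "key"): ["lt4el:KeyboardKey", "lt4el:DatabaseKey"],
    ("en", "window"): ["lt4el:GuiWindow"],
    ("en", "kind"): ["lt4el:Type"],
    ("de", "kind"): ["lt4el:Child"],
}


def lookup(term, languages):
    """Collect candidate concepts for a term, grouped by lexicon language."""
    candidates = {}
    for lang in languages:
        concepts = LEXICON.get((lang, term.lower()))
        if concepts:
            candidates[lang] = concepts
    return candidates


def classify(candidates):
    """Classify a lookup result as 'missing', 'unique' or 'ambiguous'."""
    total = sum(len(concepts) for concepts in candidates.values())
    if total == 0:
        return "missing"
    return "unique" if total == 1 else "ambiguous"
```

In the ambiguous case, each candidate concept can be presented as its own browsing unit so that the user makes the choice.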

Page 15

3: Concept → Documents

• Simplest:
  – Disjunctive search with ranking
    • For each concept, each document that is annotated with it is returned
    • Documents with more search concepts are ranked higher
  – DISADVANTAGES:
    • (too) many results
    • slower
• Use super/subconcepts
• Further possibilities
  – Conjunctive search:
    • Combination of concepts must occur in a document
    • Is taken into account by current ranking
  – DISADVANTAGES:
    • For automatic concept search: concept set might be larger than expected, thus restricting search results too much
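The difference between the two modes can be shown on a toy annotation table. The document ids and annotations below are invented, and the tie-breaking by document id is an assumption made only to keep the output deterministic.

```python
# Hypothetical annotations: document id -> set of annotated concepts.
DOCS = {
    "d1": {"Program", "Driver"},
    "d2": {"Program"},
    "d3": {"Website"},
}


def disjunctive(concepts, docs=DOCS):
    """Documents annotated with at least one search concept, ranked by
    the number of distinct search concepts they contain."""
    hits = {d: len(ann & concepts) for d, ann in docs.items() if ann & concepts}
    return sorted(hits, key=lambda d: (-hits[d], d))


def conjunctive(concepts, docs=DOCS):
    """Documents annotated with every search concept."""
    return sorted(d for d, ann in docs.items() if concepts <= ann)
```

For the concept set {Program, Driver}, the disjunctive search returns d1 and d2 with d1 ranked first, while the conjunctive search returns d1 only. This also illustrates the disadvantage noted above: each extra concept in an automatically derived set narrows the conjunctive result further.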

Page 16

3: Concept → Documents (continued)

• How useful is it to find documents that treat a superconcept?
  – Negative example: lt4el:Subroutine → lt4el:Software.
  – Positive example: lt4el:WebPortal → lt4el:Website.
• How useful is it to find documents that treat a subconcept?
  – lt4el:Program has 93 subconcepts, e.g.:
    • ApplicationProgram
    • Computervirus
    • Driver
    • Unzip
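Expanding a search concept to its superconcepts and subconcepts amounts to walking the concept hierarchy. In the sketch below only the concept names come from the examples above; the direct parent links, the data structure and the function names are assumptions for illustration.

```python
# Toy hierarchy fragment: child -> parent. The direct links are assumed.
PARENT = {
    "ApplicationProgram": "Program",
    "Computervirus": "Program",
    "Driver": "Program",
    "Unzip": "Program",
    "WebPortal": "Website",
}


def superconcepts(concept):
    """All ancestors of a concept, nearest first."""
    result = []
    while concept in PARENT:
        concept = PARENT[concept]
        result.append(concept)
    return result


def subconcepts(concept):
    """All descendants of a concept, sorted by name."""
    children = [c for c, p in PARENT.items() if p == concept]
    found = set(children)
    for child in children:
        found.update(subconcepts(child))
    return sorted(found)
```

A concept with many subconcepts, such as lt4el:Program with its 93, shows why unrestricted expansion can flood the result list.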

Page 17

4: Ranking

• Number of different search concepts
• Annotation frequency: number of times search concepts are annotated in the document
  – Normalise: divide by document length
• Superconcepts and subconcepts of search concepts have lower weight
  – A factor determines their weight
• Language of document:
  – Sort per language? (currently)
  – Sort by ranking throughout (independent of languages)?
  – Make language a factor in ranking?
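The ranking criteria above can be combined in several ways, and no formula is given. The sketch below is one assumed combination: documents are ordered first by the number of distinct search concepts, then by a length-normalised annotation frequency in which super/subconcepts count with a reduced weight. The weight value, the document structure and the function names are all hypothetical.

```python
RELATED_WEIGHT = 0.5  # assumed factor for super/subconcepts; no value is given


def score(doc, search_concepts, related_concepts=frozenset()):
    """Return (distinct search concepts found, normalised weighted frequency).

    doc is a dict with 'length' (a positive number) and 'annotations'
    (concept -> number of times it is annotated in the document).
    """
    counts = doc["annotations"]
    distinct = sum(1 for c in search_concepts if counts.get(c, 0) > 0)
    weighted = sum(counts.get(c, 0) for c in search_concepts)
    weighted += RELATED_WEIGHT * sum(counts.get(c, 0) for c in related_concepts)
    return distinct, weighted / doc["length"]


def rank(docs, search_concepts, related_concepts=frozenset()):
    """Sort document ids by score, best first; ties are broken by id."""
    scored = {d: score(doc, search_concepts, related_concepts)
              for d, doc in docs.items()}
    return sorted(scored, key=lambda d: (-scored[d][0], -scored[d][1], d))
```

Sorting per language, as currently done, would correspond to grouping the documents by language first and applying `rank` within each group.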

Page 18

Evaluation

• Does semantic search return correct results (appropriate documents)?
• How easy is it to use semantic search?
• Are the results better (precision/recall) than with keyword search or fulltext search (also available in ILIAS)?
  – Relevant for monolingual scenario
• Is the learning process improved?
  – Depends on quality of ontology and annotation
  – In multilingual case: depends on domain knowledge and language knowledge of multilingual test persons
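For the precision/recall comparison mentioned above, the standard definitions can be computed per query from the retrieved set and a set of documents judged relevant. The function name and the convention of returning 0.0 for an empty set are choices made for this sketch.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Running the same queries through semantic search, keyword search and fulltext search and comparing these two numbers per query would answer the third question above for the monolingual scenario.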

Page 19

Future work

• Display a document fragment for search results, in addition to the title.
  – Choose contexts where search concepts occur close together
  – More on this Thursday 18:30 at the BIS-21++ information session.
• Integrate a faster document lookup component
• Improve: search term → lexicon entry
• Make use of more relations than super/subconcepts
• Possibly other changes like:
  – Sort differently than per language

Page 20

Thank you