a system for a semi-automatic ontology annotation kiril simov, petya osenova, alexander simov,...

21
A System for A Semi- A System for A Semi- Automatic Ontology Automatic Ontology Annotation Annotation Kiril Simov, Petya Osenova, Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Alexander Simov, Anelia Tincheva, Borislav Kirilov Borislav Kirilov BulTreeBank Group BulTreeBank Group LML, IPP, BAS LML, IPP, BAS CALP 2007, RANLP, Borovets CALP 2007, RANLP, Borovets

Post on 20-Dec-2015

219 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

A System for A Semi-Automatic A System for A Semi-Automatic Ontology AnnotationOntology Annotation

Kiril Simov, Petya Osenova, Alexander Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav KirilovSimov, Anelia Tincheva, Borislav Kirilov

BulTreeBank GroupBulTreeBank Group

LML, IPP, BASLML, IPP, BAS

CALP 2007, RANLP, BorovetsCALP 2007, RANLP, Borovets

Page 2: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

Outline of the TalkOutline of the TalkMotivationMotivation

Requirements to the system Requirements to the system

Parameters of semantic annotationParameters of semantic annotation– General overviewGeneral overview– Problematic issuesProblematic issues

CLaRK SystemCLaRK System– Basic architecture in briefBasic architecture in brief– The new functionalitiesThe new functionalities

Conclusions Conclusions

Page 3: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

Motivation (1)Motivation (1)

The creation of automatic systems The creation of automatic systems for semantic annotation needs:for semantic annotation needs:

– Reliably annotated corpora with Reliably annotated corpora with semantic information = gold standard semantic information = gold standard datadata

Page 4: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

Motivation (2)Motivation (2)

The The semantic annotationsemantic annotation requires requires various types of support:various types of support:

– appropriate source of semantic appropriate source of semantic information (information (domain ontologydomain ontology))

– comprehensive annotation guidelinescomprehensive annotation guidelines– a system to support semi-automatic a system to support semi-automatic

creation of such corpora (creation of such corpora (CLaRKCLaRK))

Page 5: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

Motivation (3)Motivation (3)

The annotation process follows the two steps:

– chunk annotationchunk annotation - identification of identification of the text segment which represents a the text segment which represents a given concept or a relation in the given concept or a relation in the texttext

– concept selectionconcept selection - a chunk might - a chunk might represent more than one concept or represent more than one concept or relation depending on the contextrelation depending on the context

Page 6: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

Motivation (4)Motivation (4)We follow the ideas of We follow the ideas of Erdmann et al. Erdmann et al. 20002000 that the manual (or semi- that the manual (or semi-automatic) semantic annotation is a automatic) semantic annotation is a cyclic process mixing:cyclic process mixing:– the actual annotation, andthe actual annotation, and– the evolution of the ontology the evolution of the ontology

In our case we also include the In our case we also include the lexiconlexicon and the and the concept annotation grammarconcept annotation grammar in the process of the concurrent in the process of the concurrent development.development.

Page 7: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

Support requirements to the system Support requirements to the system (1)(1)

Search for a text segment:Search for a text segment: helps the helps the annotator to determine the exact annotator to determine the exact segment of text which is the carrier segment of text which is the carrier of the concept or relation from the of the concept or relation from the ontologyontology

Concept selection:Concept selection: determines which determines which concept/relation to be added to the concept/relation to be added to the annotation of the corresponding text annotation of the corresponding text segmentsegment

Page 8: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

Support requirements to the system Support requirements to the system (2)(2)

Ontology evolution:Ontology evolution: updates the updates the ontology in following cases:ontology in following cases:– new concept/relation is necessary for the new concept/relation is necessary for the

annotation of a text segmentannotation of a text segment– an existing concept needs to be changed an existing concept needs to be changed

in order to be more precisein order to be more precise

Lexicon/grammar evolution:Lexicon/grammar evolution: updates updates them when:them when:– there are changes in ontologythere are changes in ontology– there are new expressions for already there are new expressions for already

existing concepts/relationsexisting concepts/relations

Page 9: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

Support requirements to the system Support requirements to the system (3)(3)

Annotation evolution:Annotation evolution: after changes after changes in the ontology and/or the in the ontology and/or the lexicon/grammar it is necessary to lexicon/grammar it is necessary to update the previously done update the previously done annotationsannotations

In the implementation of these In the implementation of these functionalities we follow the functionalities we follow the requirements for a semantic requirements for a semantic annotation system as they are stated annotation system as they are stated in in Uren et al. 2006Uren et al. 2006

Page 10: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

Parameters of semantic annotationParameters of semantic annotation

The ideal prerequisite for semantic The ideal prerequisite for semantic annotation is the interaction among annotation is the interaction among the following three components:the following three components:

Domainontology

Lexicons

Grammars

concepts

terms

link of terms to concept

s

domaintexts

Page 11: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

Domain ontologies (3)Domain ontologies (3)We use English as lingua franca (as usual)We use English as lingua franca (as usual)

HOWEVER:HOWEVER:

We rely on the meanings of the conceptsWe rely on the meanings of the concepts

We aim at reconciling the discrepancy between We aim at reconciling the discrepancy between knowledge conceptualization and language knowledge conceptualization and language lexicalizationlexicalization

– If there is no a lexicalized term forIf there is no a lexicalized term for a a concept, concept, thenthen

one of the terms is selected as a nameone of the terms is selected as a name

((ASCIIASCII vs. vs. ASCII code tableASCII code table), or), or

a concept name is constructed as a phrasea concept name is constructed as a phrase

((BarWithButtonsBarWithButtons vs. vs. ToolbarToolbar))

Page 12: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

Terminological lexicons (1)Terminological lexicons (1)Lists of the main keywords in a certain Lists of the main keywords in a certain domain domain Free expressions are also allowedFree expressions are also allowed

Example: Example: AlphanumericDisplay [a display that gives the information in the form of characters (numbers or letters)]

In Bulgarian: 9 spelling and lexical variants

буквеноцифров дисплей, буквено-цифров дисплей, символен дисплей, буквеноцифров монитор, буквено-цифров монитор, символен монитор, буквеноцифров екран, буквено-цифров екран, символен екран

Page 13: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

Terminological lexicons (2)Terminological lexicons (2)Generalized structure of the LexiconGeneralized structure of the Lexicon(1)(1) a a representativerepresentative term which constitutes the term which constitutes the

meaning for all the term wordings within meaning for all the term wordings within the entry. This term usually ensures the the entry. This term usually ensures the mapping to the relevant concept mapping to the relevant concept

(2)(2) explanation of the concept meaning in explanation of the concept meaning in lingua franca (usually it is English, but in lingua franca (usually it is English, but in fact fact it might be any natural languageit might be any natural language););

(3)(3) a set of terms in a given language that a set of terms in a given language that have the meaning expressed by the leading have the meaning expressed by the leading termterm

Page 14: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

GrammarsGrammars

Two interconnected steps:Two interconnected steps:

(1)(1) concept annotation step (by cascaded concept annotation step (by cascaded regular grammars in CLaRK)regular grammars in CLaRK)

(2)(2) disambiguation step (by constraint disambiguation step (by constraint facilities in CLaRK)facilities in CLaRK)

The quality of the grammar predefines The quality of the grammar predefines the coverage and precision of the the coverage and precision of the annotation, and hence – the efficiency annotation, and hence – the efficiency of the searchof the search

Page 15: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

Interaction among modulesInteraction among modules

Ontology LexicalizedTerms

Free Phrases

Grammars

Domain Text

Page 16: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

Problematic issues wrt SAProblematic issues wrt SA

Disambiguation is needed of Disambiguation is needed of ambiguous casesambiguous cases(LINK as (LINK as ConnectionConnection and and HyperlinkHyperlink))

Due to the problems of coverage and Due to the problems of coverage and precision of the ontology the precision of the ontology the following operations are also needed:following operations are also needed:– addition, extension, deletion of concepts addition, extension, deletion of concepts

or their correctionor their correction

Page 17: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

CLaRK: architecture and toolsCLaRK: architecture and tools

CLaRK

XML

Regular grammars

Constraints

Editingoperations

Extraction

SortStatistics

XPath Engine

Macro Language

http://www.bultreebank.org/clark/index.html

Page 18: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

The CLaRK System: previous work The CLaRK System: previous work flow architectureflow architecture

Tool preparation phaseTool preparation phase– Writing grammarsWriting grammars– Writing constraints, etcWriting constraints, etc

Document ProcessingDocument Processing– Application of grammars, constraintsApplication of grammars, constraints– User input – selection of constraint User input – selection of constraint

options, selection of grammar options, selection of grammar applicationapplication

Revision of the toolsRevision of the tools

Page 19: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

The CLaRK System: new work flow The CLaRK System: new work flow architecturearchitecture

Tool preparation phaseTool preparation phase– Writing grammarsWriting grammars– Writing constraints, etcWriting constraints, etc

Document ProcessingDocument Processing– Application of grammars, constraintsApplication of grammars, constraints– User input – selection of constraint User input – selection of constraint

options, selection of grammar options, selection of grammar applicationapplication

– Processing-time revision of the toolsProcessing-time revision of the tools

Revision of the toolsRevision of the tools

Page 20: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

ConclusionsConclusionsWe presented an architecture for the We presented an architecture for the semantic annotation of XML documents in semantic annotation of XML documents in a domain from both sides of view - a domain from both sides of view - linguistic adequacy and implementationlinguistic adequacy and implementation

The process of semantic annotation The process of semantic annotation interleaves with ontology / lexicon / interleaves with ontology / lexicon / grammar evolutiongrammar evolution

This way of combining the three tasks This way of combining the three tasks allows the annotation process also to allows the annotation process also to develop from almost completely manual develop from almost completely manual work towards an effective semi-automatic work towards an effective semi-automatic support modulesupport module

Page 21: A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov BulTreeBank Group LML,

Thank you!Thank you!

Ever moving CLaRK Functionalities

User running for better tools