[ieee 2007 international symposium on computational intelligence and intelligent informatics -...



Lexical Units in Ontological Semantics

Amaal S.H. Al-Hashimy
Computer Science Department
Sultan Qaboos University
Oman, SQU, P.C. 123, P.O. Box 36

3rd International Symposium on Computational Intelligence and Intelligent Informatics - ISCIII 2007 - Agadir, Morocco, March 28-30, 2007

1-4244-1158-0/07/$25.00 © 2007 IEEE.

Abstract - In this paper we focus on ontological semantic lexicons, one of the static knowledge sources used in Ontological Semantic theory for natural language processing. Ontological semantics (OS) is an approach to developing an exhaustive and detailed linguistic theory of meaning that is sufficient for natural language processing (NLP) by computers. It aims to develop all the necessary processing and knowledge modules and to combine them in a comprehensive system for a class of real-life NLP applications, such as machine translation (MT), information extraction, text summarization and question answering. It is a knowledge-based system that requires a vast amount of information about the world around a specific domain of application. Although we review the fundamentals of the approach, the focus is on the structure of the knowledge sources, especially the lexicon. The paper concentrates on some specific aspects that are key to the development of the OS approach, such as the acquisition of lexical unit information and the structure of the lexicon, which is one of the central resources used in it. It also touches on some current issues, such as the possibility of automating static knowledge acquisition.

Keywords: Ontological semantic theory, lexicon, lexical acquisition, automatic acquisition.

1 Ontological Semantics

The goal of ontological semantics is the extraction, representation and manipulation of meaning in natural language texts with a view toward supporting applications such as MT or question answering. Text meaning is represented in text meaning representations (TMRs) that are derived compositionally, primarily from the meanings of words and phrases in the text, where word and phrase meaning is encoded in the ontological-semantic lexicon (see Figure 1). Central to this goal is the employment of the ontology, a constructed model of the world, as a language-independent static resource that is used to construct the TMR of the input texts.

1.2 Main processing in OS

Like any semantic theory for natural language processing, OS must account for the processes of generating and manipulating text meaning. An accepted general method of doing this is to describe the meanings of words and, separately, specify the rules for combining word meanings into meanings of sentences and, further, texts; hence the division of semantics into lexical (word) semantics and compositional (sentence) semantics. Semantics for NLP must also address issues connected with the meaning-related activities in both natural language understanding and generation by a computer.

Figure 1. Overall architecture of a generic application of Ontological Semantics

So the meaning representation of a text is derived through:
1. establishing the lexical meanings of individual words and phrases comprising the text;
2. disambiguating these meanings;
3. combining these meanings into a semantic dependency structure (SDS).

[...] units, as defined through links to the ontology and by non-propositional meaning elements; so the process is guided by the syntax-semantics interface manifested in the lexical syntactic and lexical semantic specifications of lexical entries.

Once the tokens for processing are established, a parser begins to turn the text into a TMR (a computational representation of the text meaning). Words are looked up in the lexicon and onomasticon or, in case some concept is referenced indirectly, in the fact database. Parsing is a recursive process: where it is impossible to find a matching lexical entry, the restrictions on the conceptual connections are relaxed, and the process is then repeated. As a result, TMRs largely consist of instances of ontological concepts. Some of these instances are remembered (as "facts") and stored in the fact repository (FR), a knowledge base of remembered ontological instances. Some facts in the fact repository are referred to by proper names in texts: personal names, toponyms, names of organizations, specific artifacts ("the Statue of Liberty"), etc. These proper names are stored in the onomasticon, the semantic zones of whose entries contain a pointer to the corresponding fact repository element. Once the TMR for a document is acquired, it can be used for a number of purposes, from translation to information extraction and data mining.

1.3 Knowledge sources in OS

The methodology of ontological semantics consists of the acquisition of the static knowledge sources (ontology, lexicon, and fact database) and of the procedures for producing and manipulating TMRs. An implemented OS system employs the following resources:
1) The ontology, a language-independent constructed model of the world;
2) Lexicons of the languages that the system has to work with, which are connected to the ontology;
3) The fact database and the onomasticon, or repository of proper names, which contain instantiations of ontological concepts;
4) Text processing modules, most prominently a semantic text analyzer (which constructs text meaning representations from natural language texts) and a semantic text generator (which performs the reverse process, constructing natural language texts from text meaning representations).

2 Lexicons in OS

Natural language processing (NLP) systems vary in their goals, and so vary in what they require from the lexicon. The computational lexicon is the fundamental repository of information about the primary component of language, i.e. words, and is therefore critical for systems that aim to handle some aspect of natural language. Two key issues for the lexicon in NLP tasks are lexical representation and lexical acquisition. The ontological semantic lexicon specifies what concept, concepts, property or properties of concepts defined in the ontology must be instantiated in the TMR to account for the meaning of a particular lexical unit of input.

2.2 Lexical syntactic specifications

Each lexicon entry comprises a number of sections corresponding to the various types of lexical information:
1. General: word class, definition, example, comments, variants.
2. Syntax: syntactic dependency.
3. Semantics: lexical semantics, meaning representation.
4. Linking: case roles.

The following scheme, in a BNF-like notation, summarizes the basic lexicon structure (see Figure 2).

    CATEGORY: {syn-cat}
    ORTHOGRAPHY:
        VARIANTS: "variants"*
        ABBREVIATIONS: "abbs"*
    MORPHOLOGY:
        IRREGULAR-FORMS: ("form" {irreg-form-name})*
        PARADIGM: {paradigm-name}
        STEM-VARIANTS: ("form" {variant-name})*
    ANNOTATIONS:
        DEFINITION: "definition in NL"*
        EXAMPLES: "example"*
        COMMENTS: "lexicographer comment"*
        TIME-STAMP: {lexicog-id date-of-entry}*
    SYNTACTIC-FEATURES: (feature value)*
    SYNTACTIC-STRUCTURE: f-structure
    SEMANTIC-STRUCTURE: lex-sem-specification

Figure 2. Lexicon entry

The contents of the SYN-STRUC zone of a lexicon entry indicate how the lexeme fits into parses of sentences. In addition, this zone provides the basis of the syntax-semantics interface. A brief specification of this zone is therefore necessary to present the foundation of the semantic analysis process, which relies on the syntax-semantics interface as one of the dynamic knowledge sources used in constructing a semantic representation (i.e., the TMR) for input text.

2.3 Lexical semantic specifications

The lexical semantic specification found in each entry in the lexicon is the repository of low-level semantic information. The syntax-semantics interface links into that specification, guiding the search process by suggesting what element is a candidate for combination with what other element, and in what relation.

The base case of this specification is an indication that the word refers to a concept from the ontology; in the process of semantic analysis, the word then results in an instantiation of that concept. In many cases that concept has further constraints on the allowable fillers for various slots, or specific values filled in for literal (non-relational) slots. Some lexical semantic specifications include multiple concepts to be instantiated in a particular structure (i.e., one instantiation is specified to be the head, and another a filler of a particular slot). Other lexical semantic specifications might not invoke the instantiation of a concept, but just provide filler information for another concept (the adjective blue, for example) or relate two other concepts to be instantiated by other words.

Interwoven with these semantic specifications is the syntax-semantics interface component. Particular slots in the specification may have a reference variable as the filler; the variable is bound to a headed syntactic structure during processing, and the instantiated concepts that result from the semantic processing of that syntactic structure are inserted into the indicated slot's value. For example, in the specification for eat, the concept EAT may be called for, and the AGENT slot of that concept may dereference the syntactic subject head; the SDS-building process would thus produce an instantiated EAT concept with its AGENT slot filled by an instantiation that refers to the eater.

2.4 Mapping lexical syntactic-semantic information

The SDS-building process is guided by the syntax-semantics interface manifested in the lexical syntactic and lexical semantic specifications of lexical entries. One of the key decisions in developing a knowledge base in OS is therefore the specification of the ontological concept(s) for lexical entries. Different lexical cases require different mapping methods:
* Direct mapping: used when the semantics of the sense is fully described by a concept.
* Modified mapping: used when no concept exactly matches the semantics of a sense; the concept closest in meaning is taken and some of its properties are modified to construct a complex knowledge structure that reflects the meaning of this sense quite accurately.

Modified mappings are a powerful method for avoiding the proliferation of concepts in the ontology. The drawback is not only an increased processing load but also considerable acquisition work, since we have not yet found a way to automate modified mapping, unlike direct mapping, which can be partially automated.

3 Lexicon acquisition

Acquisition is the lifeblood of ontological semantics. Through the acquisition process, trained acquirers describe the ontological backbone of this natural language processing approach. However, the acquisition process can be difficult, redundant and extremely time-consuming, leading to a variety of errors and an ultimate slow-down of an already lengthy and difficult human effort. Like all knowledge-based applications that involve NLP, it has carried the stigma of being too expensive to develop, difficult to reuse and incapable of processing a broad range of inputs; this high price of development was due to the necessity of acquiring all knowledge manually, using expensive expert-trained human acquirers.

The steps of lexical acquisition may be presented as follows:
1. polysemy reduction: decide how many senses of every word must be included in a lexicon entry: read the definitions of every word sense in a dictionary and try to merge as many senses as possible, so that a minimum number of senses remains;
2. syntactic description: describe the syntax of every sense of the word;
3. ontological matching: describe the semantics of every word sense by mapping it into an ontological concept, a property, a parameter value or any combination thereof;
4. adjusting lexical constraints: constrain the properties of the concept, property or parameter, if necessary;
5. linking: link the syntactic and semantic properties of a word sense.

One of the first and most important tools for acquirers is a clearly stated set of terms and accompanying definitions relevant to acquisition, centrally the specification of the formats and of the semantics of the knowledge sources. Another extremely important tool for linguists hoping to acquire concepts and lexical items successfully in any particular domain is a dictionary specific to the domain area. For example, when acquiring in the medical domain, researchers should use a medical dictionary; in the legal domain, a law dictionary; etc. Dictionaries are useful not only in providing definitions for humans in the ontology, housed on a centralized acquisition tool such as Purdue University's KBAE, but also in polysemy reduction, one of the major areas of focus for master acquirers.

Once these are in place, the development of a toolkit for acquisition can start. The toolkit includes acquisition interfaces, statistical corpus processing tools, a set of text corpora, a set of machine-readable dictionaries (MRDs), a suite of pedagogical tools (knowledge source descriptions, an acquisition tutorial, a help facility) and a database management system to maintain the data acquired (see Figure 3).

Figure 3. An ontological semantics acquisition toolkit
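As a rough illustration of what the linking step produces, the following minimal Python sketch encodes a hand-written entry for eat in which a reference variable ties the syntactic subject to the AGENT slot, as in the example of Section 2.3. The class names, the `$var` notation, the THEME slot, and the toy lexicon with CAT and FISH are all assumptions made for illustration; they are not the OS lexicon format or the interface of any implemented OS system.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """An instantiated ontological concept with filled slots."""
    concept: str
    slots: dict = field(default_factory=dict)


@dataclass
class LexiconEntry:
    """Toy lexicon entry with a syntactic and a semantic zone."""
    lemma: str
    syn_struc: dict  # syntactic role -> reference variable
    sem_struc: dict  # {"concept": name, "slots": {slot: reference variable}}


# Hypothetical, hand-acquired entries (direct mappings to concepts).
LEXICON = {
    "eat": LexiconEntry(
        "eat",
        {"subject": "$var1", "object": "$var2"},
        {"concept": "EAT", "slots": {"AGENT": "$var1", "THEME": "$var2"}},
    ),
    "cat": LexiconEntry("cat", {}, {"concept": "CAT", "slots": {}}),
    "fish": LexiconEntry("fish", {}, {"concept": "FISH", "slots": {}}),
}


def instantiate(head, dependents):
    """Build a concept instance for `head` from a tiny dependency parse.

    `dependents` maps syntactic roles (e.g. "subject") to lemmas.
    """
    entry = LEXICON[head]
    inst = Instance(entry.sem_struc["concept"])
    # Bind each reference variable to the instance built from the
    # syntactic dependent that fills the corresponding role.
    bindings = {
        var: instantiate(dependents[role], {})
        for role, var in entry.syn_struc.items()
        if role in dependents
    }
    # Insert the bound instances into the slots that reference them.
    for slot, var in entry.sem_struc["slots"].items():
        if var in bindings:
            inst.slots[slot] = bindings[var]
    return inst


# "The cat eats fish": the subject fills AGENT, the object fills THEME.
tmr = instantiate("eat", {"subject": "cat", "object": "fish"})
```

The point of the sketch is only that the semantic zone never names a syntactic role directly: the shared variable is the whole syntax-semantics interface, so changing the syntactic zone (for instance, for a passive construction) leaves the semantic zone untouched.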


The various methodologies developed for the acquisition of the static knowledge sources of ontological semantics (the ontology, monolingual lexicons, onomasticons, and fact database) necessarily involve, at this stage of development, considerable human participation, although the aim is to fully automate all processes involved in the approach, in terms of both acquisition and runtime procedures. Unfortunately, this has not yet been achieved: the OS authors claim that OS in its current state uses every automation possible within the current acquisition environment, but all of it is done under the control of a human acquirer.

3.1 Automatic lexicon acquisition

Since the lexicon is the main concern here, it is worth investigating the possible methods of automation that can be used for its acquisition. The principle of complete coverage, to which ontological semantics is committed, means that every sense of every lexical item should receive a lexical entry. "Every" in this context means every word or phrase sense in the corpus on which an application is based.

When acquiring a lexical entry, the most difficult part of the work is determining what concept(s) to use as the basis for specifying the meaning of a lexical unit; once that decision is made, the work becomes essentially a matter of determining which attribute values of a lexical entry to modify to fit the meaning of the lexeme.

Lexical entries suitable for a specific domain can be acquired semi-automatically through techniques such as rapid propagation and lexical rules.

* Rapid propagation: A "master acquirer" produces a single sample entry for each class of lexemes, so that the remainder of the acquisition work involves copying this "seed" entry and modifying it, often very slightly. Some classes will be relatively small (possibly classes of one), but this does not refute the obvious benefit of using a ready-made template for speedy and uniform acquisition of items in a class, and some classes are quite large. One example of a large lexical class (over 250 members) whose acquisition can be rapidly propagated is that of the English adjectives of size.

* Lexical rules: Lexical rules (LRs) find economies in the automatic propagation of lexicon entries on the basis of systematic relationships between classes of lexical entries, e.g., between a verb such as abhor and the corresponding deverbal adjective abhorrent. An LR consists of a left-hand side (LHS), which constrains the lexical entries to which the rule can apply, and a right-hand side (RHS), which stipulates how the new lexical entry will differ from the original. Lexical entries produced by an LR are themselves eligible to match the LHS of an LR. Both sides of an LR can reference any zone of the lexical entry; typically the RHS modifies the local syntactic information and the lexical semantic specification (or at least the syntax-semantics interface).

Inheritance is one type of cross-indexing used, for example, to indicate that a particular lexeme belongs to a syntactic class, thus avoiding the need to give a syntactic specification or syntactic features locally in the corresponding entry: the information is inherited from the specification in the definition of the class. The same mechanism can be used to inherit the semantic features of a set of lexemes in the same class. The problem here is how to cluster these classes, and according to what criteria. Using a domain-specific corpus and a well-constructed domain-specific dictionary alongside the ontology, the lexemes can be clustered into a set of classes according to their semantic features and hence they can be [...]

Assigning concepts to a lexeme, which is the difficult step in lexeme acquisition, is done manually. It is a crucial matter, since it guides SDS creation. As mentioned before, it can be done through direct or modified mapping. Drawing on the semantic features of a lexeme given in a dictionary, a domain-specific corpus and an inventory of case roles assigned to the syntactic structures of the lexeme, the semantic image of the lexeme can be built and matched against the ontology concepts to select a number of concepts that are most appropriate for the lexeme in hand. A filtration step can then be applied, based on matching the syntactic and semantic images. This will not guarantee that exactly the right concept is identified, but it will limit the number of candidate concepts among which to find the suitable concept(s). This will not fully automate the operation, but it will automate at least part of it.

4 Conclusions

* OS is a knowledge-based system that requires a vast amount of information about the world around a specific domain of application.

* OS is evolving continuously as new systems implementing it are built in response to needs for enhanced coverage and utility. Historically, a number of research projects have contributed to bringing OS to its current state. These projects aimed at producing robust large-scale natural language processing systems to be used in machine translation, information retrieval and extraction, and text summarization.

* Ontological semantics tends to produce complicated entries in the lexicon rather than in the ontology, and to this effect it provides lexicon [...]

* Full automation is undesirable in the current state of the art for OS, but the theory implies a good level of automation in knowledge acquisition, as this process is usually the most difficult in knowledge-based systems.

* Lexicons in OS need to be given more formal representations to be eligible for more automated processing.
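The two semi-automatic techniques of Section 3.1 can be sketched in a few lines of Python, which may help show why more formal lexicon representations make automation easier. This is a minimal sketch under stated assumptions: the `Entry` fields, the concept names, the list of size adjectives, and the string test used as the LHS are hypothetical simplifications, not the rule format used by OS acquirers.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Entry:
    """Toy lexicon entry: lemma, syntactic category, mapped concept."""
    lemma: str
    category: str   # e.g. "V" or "ADJ"
    sem_struc: str  # name of an ontological concept (stand-in)


@dataclass(frozen=True)
class LexicalRule:
    """LHS constrains which entries match; RHS derives the new entry."""
    name: str
    lhs: Callable[[Entry], bool]
    rhs: Callable[[Entry], Entry]

    def apply(self, entry: Entry) -> Optional[Entry]:
        # Return the derived entry, or None if the LHS does not match.
        return self.rhs(entry) if self.lhs(entry) else None


def propagate(seed: Entry, lemmas: list) -> list:
    """Rapid propagation: copy a seed entry for each lemma in its class."""
    return [replace(seed, lemma=lemma) for lemma in lemmas]


# Hypothetical rule relating a verb to its deverbal adjective.
verb_to_adjective = LexicalRule(
    name="V -> ADJ",
    lhs=lambda e: e.category == "V" and e.lemma.endswith("or"),
    rhs=lambda e: replace(e, lemma=e.lemma + "rent", category="ADJ"),
)

abhor = Entry("abhor", "V", "ABHOR")          # hypothetical concept name
abhorrent = verb_to_adjective.apply(abhor)

seed = Entry("big", "ADJ", "SIZE")            # hypothetical seed entry
size_adjectives = propagate(seed, ["large", "huge", "tiny"])
```

In practice each propagated copy would still be adjusted by a human acquirer (for size adjectives, the value range would differ from lemma to lemma); the sketch only shows the copying step that the seed entry makes cheap.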
