
REPORT on Computational Lexicon Working Group on Multilingual Lexicon

EU-WG Meeting, December 1st-2nd 2000, Pisa

UPenn, December 11 2000

The Multilingual ISLE Lexical Entry (MILE)

General methodological principles (from EAGLES):

1. Basic requirements for the MILE:

Modular and layered

Granular

Allow for underspecification

ISLE should discover and list (the maximal set of) basic notions to be included in the MILE

The leading principle for the design of the MILE should be the edited union of existing lexicons / models (redundancy should not be a problem)
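One way to picture the "edited union" principle is the minimal sketch below. It assumes each existing lexicon model can be reduced to a set of notion names; the function name `edited_union`, the notion names, and their assignment to lexicons are all illustrative assumptions, not part of any ISLE specification. Redundancy is kept on purpose: a notion supplied by several models simply records all of its sources.

```python
from typing import Dict, Optional, Set

# Hypothetical inventory: lexicon name -> set of basic notions it encodes.
LexiconModels = Dict[str, Set[str]]


def edited_union(models: LexiconModels,
                 rejected: Optional[Set[str]] = None) -> Dict[str, Set[str]]:
    """Map every notion to the lexicons that supply it.

    The "editing" step is modelled as a set of rejected notions
    that are filtered out of the plain union.
    """
    rejected = rejected or set()
    union: Dict[str, Set[str]] = {}
    for lexicon, notions in models.items():
        for notion in notions:
            if notion in rejected:
                continue
            union.setdefault(notion, set()).add(lexicon)
    return union


# Illustrative notion names only.
models = {
    "PAROLE-SIMPLE": {"subcat_frame", "qualia", "domain"},
    "EuroWordNet": {"synset", "hyponymy", "domain"},
    "FrameNet": {"frame", "semantic_role"},
}
notions = edited_union(models, rejected={"frame"})
```

Here `notions["domain"]` lists both PAROLE-SIMPLE and EuroWordNet as sources, which is the kind of redundancy the principle tolerates.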

MILE

3. Objective: definition of MILE, its basic notions, architecture, such that we can write a DTD & have a tool to support it

discover a methodology of work towards this

Some advantages:

Flexibility of representation

Easy to customise and update

Easy integration of existing resources

High versatility towards different applications

Modularity in at least three respects:

in the macrostructure and general architecture of the MILE

in the microstructure of the MILE

in the specific microstructure of the MILE word-sense

Modularity in MILE

A. Modularity in the macrostructure and general architecture of the MILE

1. Meta-information - versioning of the lexicon, languages, updates, status, project, origin, etc. (see e.g. OLIF, GENELEX)

2. Possible architecture(s) of multilingual lexicon(s) - interactions of the different modules within the general structure. Issues related to transfer-based, interlingua-based approaches, and hybrid solutions.

Modularity in MILE

B. Modularity in the microstructure of the MILE – The MILE could be organized in at least the following modules:

1. Monolingual linguistic representation

2. Collocational information

3. Multilingual apparatus (e.g. transfer conditions and actions)

Monolingual Linguistic Representation

• It includes the morphosyntactic, syntactic, and semantic information characterizing the MILE in a certain source language.

• It possibly corresponds to the typology of information contained in existing lexicons, such as PAROLE-SIMPLE, (Euro)WordNet (EWN), COMLEX, FrameNet, etc.

Monolingual Linguistic Representation: a Provisional List

Morphological layer

• Grammatical category and subcategory

• Gender, number, person, mood

• Inflectional class

• Modifications of the lemma

• Mass/count, 'pluralia tantum'

• …

Monolingual Linguistic Representation: a Provisional List

Syntactic layer

• Idiosyncratic behaviour with respect to specific syntactic rules (passivisation, middle, etc.)

• Attributive vs. predicative function, gradability

• List of syntactic positions forming subcategorization frames

• Syntactic constraints and properties of the possible 'slot filler'

• Morphosyntactic and/or lexical features (agreement, auxiliary, prepositions and particles introducing clausal complements)

• Information on control (subject control, object control, etc.) and raising properties

• …

Monolingual Linguistic Representation: a Provisional List

Semantic layer

• Characterization of senses through links to an Ontology

• Domain information, gloss

• Argument structure, semantic roles, selectional preferences

• Event type for verbs, to characterize their actionality behaviour

• Link to the syntactic realization of the arguments

• Basic semantic relations between word senses (synonymy / synset, hyponymy, meronymy, etc.)

• Semantic/world-knowledge relations among word senses (such as EWN relations and SIMPLE Qualia Structure)

• Information about regular polysemous alternations

• Information concerning cross-part-of-speech relations

• …

Collocational Information

More or less typical and/or fixed syntactic-semantic patterns

• Typical or idiosyncratic syntactic constructions

• Typical collocates

• Support verb constructions

• Phraseological or multiword constructions

• Compounds (e.g. noun-noun, noun-PP, adjective noun, etc.)

• Corpus-driven examples of MILE

• …

Multilingual Apparatus

Transfer conditions and actions

• possible starting points: OLIF, GENELEX, etc.

• devise possible cases of problematic transfer (cf. e.g. the list of linguistic phenomena circulated)

• identify which conditions must be expressible and which transformation actions are necessary

• select which types of information these conditions must access

• examine the variability in granularity needed when translating into different languages, and the architectural implications of this

• which role for an Interlingua?
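As a rough illustration of what "conditions" and "actions" could look like in practice, the sketch below pairs a test on source-side information with a function that builds target-side information. The `TransferRule` class, the flat feature dictionary, and the feature names (`lemma`, `object_type`) are hypothetical and do not come from OLIF or GENELEX; the English-to-Italian "know" example is only there to show a condition that needs access to information about an argument.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical flat feature bundle describing a source word sense in context.
Features = Dict[str, str]


@dataclass
class TransferRule:
    condition: Callable[[Features], bool]    # test on source-side information
    action: Callable[[Features], Features]   # builds target-side information


def apply_rules(source: Features, rules: List[TransferRule]) -> Optional[Features]:
    """Return the output of the first rule whose condition holds, else None."""
    for rule in rules:
        if rule.condition(source):
            return rule.action(source)
    return None


# Illustrative rules: the semantic type of the object selects the target verb.
rules = [
    TransferRule(
        condition=lambda f: f.get("lemma") == "know" and f.get("object_type") == "human",
        action=lambda f: {"lemma": "conoscere", "lang": "it"},
    ),
    TransferRule(
        condition=lambda f: f.get("lemma") == "know",
        action=lambda f: {"lemma": "sapere", "lang": "it"},
    ),
]

person = apply_rules({"lemma": "know", "object_type": "human"}, rules)
fact = apply_rules({"lemma": "know", "object_type": "proposition"}, rules)
```

Ordering the rules from most to least specific is one simple design choice; a real system might instead rank or combine conditions.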

Modularity in MILE

C. Modularity in the specific microstructure of the MILE word-sense

Word-senses are the basic units at the multilingual level

Senses should also have a modular structure

Coarse-grained (general purpose) characterisation in terms of prototypical properties, captured by the formal means in (B.1)

Fine-grained (domain or text dependent) characterisation mostly in terms of collocational/syntagmatic properties (B.2) (particularly useful for specific tasks, such as WSD and translation)

MILE

A. MILE Macrostructure

Meta-information

Architecture

B. MILE Microstructure

1. Monolingual

2. Collocational

3. Multilingual

C. Word-Sense Microstructure

1. Coarse-grained

2. Fine-grained
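The three levels summarised above could be mirrored in a data model along the following lines. This is only a sketch under stated assumptions: every class and field name (`MileLexicon`, `MileEntry`, `WordSense`, `MonolingualInfo`, `correspondences`, and so on) is hypothetical, and the sense identifiers are invented for illustration. Empty dictionaries and lists stand in for underspecified information.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MonolingualInfo:
    """Module B.1: one feature bundle per linguistic layer."""
    morphology: Dict[str, str] = field(default_factory=dict)
    syntax: Dict[str, str] = field(default_factory=dict)
    semantics: Dict[str, str] = field(default_factory=dict)


@dataclass
class WordSense:
    """Level C: the basic unit at the multilingual level."""
    sense_id: str
    coarse: MonolingualInfo = field(default_factory=MonolingualInfo)   # C.1, via B.1
    collocations: List[str] = field(default_factory=list)             # C.2, via B.2
    # Module B.3: target language -> ids of corresponding senses.
    correspondences: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class MileEntry:
    lemma: str
    language: str
    senses: List[WordSense] = field(default_factory=list)


@dataclass
class MileLexicon:
    """Level A: meta-information plus the entries themselves."""
    meta: Dict[str, str]
    entries: List[MileEntry] = field(default_factory=list)


# Hypothetical example; syntax and semantics are left underspecified.
sense = WordSense(
    sense_id="bank_1",
    coarse=MonolingualInfo(morphology={"pos": "noun"}),
    collocations=["river bank"],
    correspondences={"it": ["riva_1"]},
)
lexicon = MileLexicon(
    meta={"version": "0.1", "status": "draft"},
    entries=[MileEntry(lemma="bank", language="en", senses=[sense])],
)
```

Keeping the multilingual correspondences on the word sense rather than on the lemma follows the slide's statement that word-senses are the basic units at the multilingual level.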

Monolingual Linguistic Representation

A strategy:

• consider as the starting point for MILE the edited union of the basic notions represented in the existing syntactic/semantic lexicons (their models)

• evaluate their notions wrt EAGLES recommendations for syntax and semantics

• evaluate their usefulness & adequacy for multilingual tasks

• evaluate integrability of their notions in a unitary MILE

• look for deficient areas.

To be decided: should ISLE reach a consensus at the level of the “types” of information only, or also at the level of their “token” values?

Collocational Information

Open issues:

• what is relevant

• what can be generalised and formally characterised

• what must be simply listed (but even lists may be partially categorised)

• what type of representation and analysis to be provided of these phenomena (e.g. a Mel'cuk style analysis for support verb constructions, FrameNet style description of syntactic-semantic “constructions”, etc.)

Agreed Principles

MILE incorporates previous recommendations:

is the “complete” entry

(to be evaluated wrt usefulness & adequacy for multilingual tasks)

MILE builds on the monolingual entry & expands it

(at least) with an additional module where correspondences between languages are defined

We consider 2 broad categories of applications

translation

CLIR (linking module may be simpler)

(label info types wrt application)

Paths to discover Basic Notions of MILE

Clues in dictionaries to decide on target equivalent

Guidelines for lexicographers

Clues (to disambiguate/translate) in corpus concordances

Lexical requirements from various types of transfer conditions and actions in MT systems

Lexical requirements from interlingua-based systems

Examined guidelines for bilingual dictionaries provided by SA

Classification of Basic Notions of MILE

For all the notions:

notion already in previous work (Eagles/ Parole/ Simple/ EWN/ Comlex/ Framenet/…)

evaluate if the existing specs are adequate

draw a list of “not yet recommended/adopted” notions:

method of work

priorities

for which applications

assign tasks

need of further development

Organisational Proposal

Start from available EAGLES recommendations, e.g. as instantiated in Parole/Simple

adopt as starting point the P/S DTD, to be revised & augmented

see Barcelona tool

Evaluate if we can combine the transfer & interlingua approaches in a “hybrid super-model”

1. Select a list of critical information types that will compose each module of the MILE

2. Start an in-depth analysis of each of these areas aiming at identifying:

The most stable solutions adopted in the community

Linguistic specifications and criteria

Possible representational solutions, their compatibility, etc.

An evaluation of their respective weight/importance in a multilingual lexicon (towards a layered approach to recommendations)

Identify the open issues and the current boundaries of the state of the art (which cannot be standardised yet)

…..

Organisational Proposal

The tasks should lead to:

Information Types

Argument structure
1. How to represent it (e.g. frames, a selection of theta-roles, etc.)
2. Typology of arguments
3. Representational problems
4. Applicative constraints and needs
5. Linking with syntax (how to express it)
6. Open issues

Semantic relations
1. Typology (e.g. hyponymy, meronymy, etc.)
2. Available tests
3. Representational format(s)
4. Applicative constraints and needs
5. Expressive limits
6. Open issues

Modification relations
1. Types of modifiers
2. Representational issues
3. Open issues

MultiWord Expressions
1. Typology
2. How to represent the “internal” structure of MWEs (e.g. Mel’cuk relations, etc.)
3. Encoding criteria
4. Application needs and biases
5. Open issues

Selectional preferences
1. How to represent them (e.g. features, reference to an ontology, word-senses, etc.)
2. Different status of the preferences
3. Criteria to identify them
4. Expressive limits of existing formal resources

Information Types

Transfer conditions and actions
1. Identification of categories of transfer phenomena
2. Ranking of hard cases
3. Possible parameterisation wrt language types
4. How to formalise them
5. Types of actions

Ontology
1. Architectural issues (types of ontologies: e.g. taxonomies, “Qualia”-based type systems, etc.)
2. Inheritance
3. Which roles for ontologies in the MILE
4. Representational issues
5. Customisation and development criteria
6. Limits

Collocational Patterns
1. Typology
2. How to represent them
3. Interaction with selectional preferences

Organisational Proposal

Highlighted some hot issues & assigned tasks:

sense indicators (Issco)

selection preferences (Thurmair)

argument structure (US?….)

MWE (Pisa)

modifiers (Jock)

semantic relations (Piek?)

transfer conditions (…)

collocational patterns (…)

ontology (…)

….

Organisational Proposal

Ask the Americans, e.g.:

evaluate existing EAGLES etc. recommendations wrt usefulness, coverage, adequacy,…

analyse some of the above info types

look at other languages (Japanese, Chinese, Korean, …) for transfer conditions

look at transfer-based MT systems

look at interlingua MT systems (e.g. Mikrokosmos): additional info types?

Meeting together US & EU, e.g. end of February, beginning of March?

DIET Tool

From ISSCO:

for text annotation (of test suites for semantic annotation)

to be used for evaluation purposes

….

...

Survey: List of Received Materials

Resource                              Comparison table   Linguistic phenomena
Collins, Hachette-Oxford              Yes                Yes
Van Dale Lexicons                     Yes                No
FrameNet                              Yes                No
Collins-Robert lexical-semantic db    Yes                No
PAROLE-Simple                         Yes                Yes
EuroWordNet                           Yes                Yes
Eurotra                               Yes                Yes
OLIF                                  No                 No
Genelex                               No                 No
EDR                                   No                 No

Other Surveys Expected

Surveys from US?

• Microsoft

• IBM

• CMU

• NMSU

• ISI

• Systran

• Logos