Page 1: Linguistic Specifications Penn, December 11 2000

Linguistic Specifications

Penn, December 11 2000

Page 2: Linguistic Specifications Penn, December 11 2000

What is SIMPLE?

use of a common model

use of a common representation language

use of a common methodology of building the lexicon

common Template Types, with default obligatory info (Type defining), and indication of optional info

A subset of the 12 Lexicons crosslingually related:

choice of a shared set of SemUs (from EWN)

A set of harmonised computational lexicons for HLT applications, geared for multilingual links

Page 3: Linguistic Specifications Penn, December 11 2000

PAROLE – SIMPLE Architecture

[Diagram labels: MuS, SynU, SemU, Sem. Info (Lexical Rel., Sem. Rel., Sem. Feat.), TEMPLATE]

Page 4: Linguistic Specifications Penn, December 11 2000

Semantic information in SIMPLE

Word senses are encoded as Semantic Units (SemUs), containing the following information:

• Semantic type *

• Domain *

• Lexicographic gloss *

• Qualia structure

• Reg. Polysemy altern.

• Event type

• Derivation relations

• Synonymy

• Collocations

• Argument structure for predicative SemUs *

• Selection restrictions on the arguments *

• Link of the arguments to the syntactic subcategorization frames (represented in the PAROLE lexicons) *

Page 5: Linguistic Specifications Penn, December 11 2000

Some research aspects of the model

On a large scale, for so many languages:

multiple orthogonal dimensions of meanings (GL) for different POS, e.g.:

qualia roles, made up by various semantic relations/features (also from Genelex & Acquilex, but reorganised in a coherent structure): the extended qualia structure

argument structure & selection preferences, linked to the PAROLE syntactic frame

Providing a framework for testing and evaluating the maturity of the current state-of-the-art in lexical semantics

Potential basis for future European multilingual initiatives for HLT applications

Page 6: Linguistic Specifications Penn, December 11 2000

Semantic Multidimensionality and NLP

Crucial NLP tasks (IE, WSD, NP Recognition, etc.) need to access multidimensional aspects of word meaning, represented in SIMPLE with the Qualia Relations:

• Is_a_part_of: la pagina del libro (the page of the book)

• Member_of: il difensore della Juventus (Juventus fullback)

• Telic: il suonatore di liuto (the lute player)

• Made_of: il tavolo di legno (the wooden table)

Page 7: Linguistic Specifications Penn, December 11 2000

Complexity? A constraining, structured model is necessary to enforce uniformity between languages & systematicity in encoding

Great granularity and detail in the specs (wrt the TA) implied: more work for the Specs Group... a common methodology for the lexicographers, guided by the Templates (also less waste of time)

Templates as a way to organise and classify relevant “clusters” of information for coherent encoding, across sites and languages (distributed building of harmonised lexicons) for later use/tuning of the information in applications and tasks

Page 8: Linguistic Specifications Penn, December 11 2000

Overall Organization

[Diagram labels: Type Ontology; Template; Instantiation; SemU; Pred. Layer (Predicate, arguments, Selection restrictions); Qualia, Derivation, Polysemy, Event Type; Italian lexicon, Catalan lexicon, Danish lexicon, Greek lexicon]

Page 9: Linguistic Specifications Penn, December 11 2000

Template for Semantic Units

[Diagram group labels: Type System Coordinates; Predicative Layer; Qualia Structure; Contextual/Polysemy Information; “redundancy”]

SemU: Identifier of a SemU

SynU: Identifier of the SynU to which the SemU is linked

BC Number: Number of the corresponding Base Concept in EuroWordNet

Template_Type: Semantic type of the SemU

Template_Supertype: Semantic type which dominates the type of the SemU in the type-hierarchy

Unification_path: Unification history of a template (only for unified top-types)

Domain: Domain information from ERLI's domain list

Semantic Class: One of WordNet Classes used by ERLI

Gloss: Lexicographic definition

Event Type: Event Sort

Predicative Representation: Predicate associated with the SemU, and its argument structure

Selectional Restr.: Selectional restrictions on the arguments

Derivation: Derivational relations between SemUs

Formal: Formal relation between SemUs

Agentive: Agentive relations between SemUs

Constitutive: Constitutive relations between SemUs; Constitutive semantic features

Telic: Telic relations between SemUs

Synonymy: Synonyms of the SemU

Collocates: Collocate information

Complex: Polysemous class of the SemU
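The template fields listed on this slide could be mirrored in a simple record type. The following is only a minimal sketch: the class name `SemUTemplate`, the attribute names, the choice of which fields are required, and the `qualia()` helper are all assumptions made for illustration and are not part of the SIMPLE specifications.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SemUTemplate:
    """Hypothetical record mirroring the template fields on the slide."""
    # Fields treated as required here (an assumption of this sketch).
    semu: str                      # SemU: identifier of the SemU
    template_type: str             # Template_Type
    template_supertype: str        # Template_Supertype
    domain: str                    # Domain
    gloss: str                     # Gloss: lexicographic definition
    # Fields treated as optional here (also an assumption).
    synu: Optional[str] = None     # SynU the SemU is linked to
    bc_number: Optional[int] = None
    event_type: Optional[str] = None
    predicative_representation: Optional[str] = None
    selectional_restrictions: Dict[str, str] = field(default_factory=dict)
    derivation: List[str] = field(default_factory=list)
    # The four qualia roles, each holding relations to other SemUs.
    formal: List[str] = field(default_factory=list)
    agentive: List[str] = field(default_factory=list)
    constitutive: List[str] = field(default_factory=list)
    telic: List[str] = field(default_factory=list)
    synonymy: List[str] = field(default_factory=list)
    collocates: List[str] = field(default_factory=list)
    complex: Optional[str] = None  # polysemous class of the SemU

    def qualia(self) -> Dict[str, List[str]]:
        """Group the four qualia roles under their slide labels."""
        return {
            "Formal": self.formal,
            "Agentive": self.agentive,
            "Constitutive": self.constitutive,
            "Telic": self.telic,
        }


unit = SemUTemplate(
    semu="example_1",              # hypothetical identifier
    template_type="[Part]",
    template_supertype="[Constitutive]",
    domain="General",
    gloss="free text",
)
print(unit.qualia())
```

Keeping the qualia roles as separate lists reflects the slide's grouping of Formal, Agentive, Constitutive and Telic under one "Qualia Structure" block.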

Page 10: Linguistic Specifications Penn, December 11 2000

Perception

Verb Examples: hear, smell, etc.

Noun Examples: sight, look, etc.

Linguistic Tests:

Levin Class: 30.1 (See verb, e.g. detect, see, notice), 30.4 (Stimulus subject, e.g. look, smell)

Comments: Processes involving an experiencing relation, whereby the perception involves the senses of a living entity. The instrument of perception (e.g. eyes for see) is encoded in the Constitutive quale.

Under this template we include both volitional (e.g. look) and non-volitional (e.g. see) events. The difference is expressed as a constitutive feature.

Page 11: Linguistic Specifications Penn, December 11 2000

Template for Perception

SemU: 1

Usyn:

BC Number: 105

Template_Type: [Perception]

Template_Supertype: [Psychological_event]

Domain: General

Semantic Class: Perception

Gloss: //free//

Event type: process

Pred_Rep.: Lex_Pred (<arg0>,<arg1>)

Derivation: <Nil> or //Erli's Code//

Selectional Restr.: arg0 = Animate //concept// arg1:default = [Entity]

Formal: isa (1, <SemU>: [Perception])

Agentive: <Nil>

Constitutive: instrument (1, <SemU>: [Body_part])

intentionality = {yes,no} //optional//

Telic: <Nil>

Collocates: Collocates (<SemU1>,...<SemUn>)

Complex: <Nil>

Page 12: Linguistic Specifications Penn, December 11 2000

Example

SemU: <guardare_2> //look_2//

Usyn:

BC Number: 105

Template_Type: [Perception]

Template_Supertype:[Psychological_event]

Domain: General

Semantic Class: Perception

Gloss: osservare con attenzione //to observe attentively//

Event type: process

Pred _Rep.: guardare (<arg0>,<arg1>)

Derivation: <Nil>

Selectional Restr.: arg0 = Animate //concept// arg1:default = [Entity]

Formal: isa (<guardare_2>,<percepire>: [Psychological_event])

Agentive: <Nil>

Constitutive: instrument (<guardare_2>, <occhio>:[body_part])

intentionality ={yes}

Telic: <Nil>

Collocates: Collocates (<SemU1>,...<SemUn>)

Complex: <Nil>
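Instantiating a template amounts to filling in the free slots while leaving the type-defining values untouched. The sketch below checks the guardare_2 entry against the Perception template under stated assumptions: which fields count as fixed by the template (`FIXED_BY_TEMPLATE`), the dictionary layout, and the function name `check_perception_entry` are all hypothetical and chosen only for illustration.

```python
# Values assumed here to be fixed by the Perception template
# (taken from the slides; treating them as mandatory is an assumption).
FIXED_BY_TEMPLATE = {
    "Template_Type": "[Perception]",
    "Template_Supertype": "[Psychological_event]",
    "Semantic Class": "Perception",
    "Event type": "process",
}

# The template lists intentionality = {yes,no} as an optional feature.
ALLOWED_INTENTIONALITY = {"yes", "no"}


def check_perception_entry(entry):
    """Return a list of mismatches between an entry and the template."""
    problems = []
    for name, expected in FIXED_BY_TEMPLATE.items():
        actual = entry.get(name)
        if actual != expected:
            problems.append(f"{name}: expected {expected!r}, got {actual!r}")
    intent = entry.get("intentionality")
    if intent is not None and intent not in ALLOWED_INTENTIONALITY:
        problems.append(f"intentionality: unexpected value {intent!r}")
    return problems


# The example entry from the slide, restricted to the fields checked here.
guardare_2 = {
    "SemU": "guardare_2",
    "Template_Type": "[Perception]",
    "Template_Supertype": "[Psychological_event]",
    "Semantic Class": "Perception",
    "Event type": "process",
    "Gloss": "osservare con attenzione",
    "intentionality": "yes",
}

print(check_perception_entry(guardare_2))
```

A check of this kind is one way the templates could support uniform encoding across sites, since every lexicographer's entry is compared against the same type-defining values.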

Page 13: Linguistic Specifications Penn, December 11 2000

Semantic Relations in SIMPLE

To represent:

multiple meaning dimensions in a sense - Qualia Rel.

cross-PoS relations (nominalization etc.) - Derivation Rel.

regular polysemous classes - Polysemy Rel.

collocation information - Collocation Rel.

Requirements of Flexibility & Openness

an extendable framework: to allow coherent future extensions with additional or more specific info

multipurpose requirements: to make tuning possible for specific applications/text types

Page 14: Linguistic Specifications Penn, December 11 2000

Semantic Relations in SIMPLE

Modular Representation of a Semantic Unit

[Diagram: a SemU with a Pred. Layer (Predicate, arguments, Selectional restrictions) and a Rel. Layer (Relations between SemUs: Qualia, Derivation, Polysemy, Collocation)]

Page 15: Linguistic Specifications Penn, December 11 2000

[Diagram: hierarchy of relations. Top dominates Formal, Constitutive, Agentive and Telic; lower nodes shown: Is_a, Is_a_part_of, Property, Contains, Created_by, Agentive_cause, Indirect_telic, Activity, Instrumental, Is_the_habit_of, Used_for, Used_as, ...]

The targets of relations identify:

prototypical semantic information associated with a SemU

elements of dictionary definitions of SemUs

typical corpus collocates of the SemU

Page 16: Linguistic Specifications Penn, December 11 2000

Calcina (mortar)

SemU: 3070

Type: [Artifactual_material]

White substance used as material to build walls

Isa: <sostanza> substance

Used_as: <materiale> material

Used_for: <costruire> build
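The relations attached to calcina can be pictured as labelled links from one SemU to others. The sketch below stores them as (source, relation, target) triples; the pairing of each relation with its target follows the gloss "white substance used as material to build walls" and is this sketch's reading of the diagram, and the identifier `calcina_3070` and the function name are hypothetical.

```python
from collections import defaultdict

# (source SemU, relation, target SemU); pairing is an assumed reading
# of the slide's diagram, guided by the gloss.
RELATIONS = [
    ("calcina_3070", "Isa", "sostanza"),        # substance
    ("calcina_3070", "Used_as", "materiale"),   # material
    ("calcina_3070", "Used_for", "costruire"),  # build
]


def targets_by_relation(triples, source):
    """Group the targets of one SemU by relation name."""
    grouped = defaultdict(list)
    for src, relation, target in triples:
        if src == source:
            grouped[relation].append(target)
    return dict(grouped)


print(targets_by_relation(RELATIONS, "calcina_3070"))
```

Storing relations as triples keeps the relation layer separate from the SemU itself, in the spirit of the modular representation described on the earlier slide.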

Page 17: Linguistic Specifications Penn, December 11 2000

Ala (wing)

SemU: 3232
Type: [Part]
Part of an airplane

SemU: 3268
Type: [Part]
Part of a building

SemU: D358
Type: [Body_part]
Organ of birds for flying

SemU: 3467
Type: [Role]
Role in football

[Diagram relations: Isa, Is_a_part_of, Used_for, Agentive; targets: <parte> part, <uccello> bird, <aeroplano> airplane, <edificio> building, <volare> fly, <fabbricare> make, <giocatore> player]

Page 18: Linguistic Specifications Penn, December 11 2000

Relations and Predicates in SIMPLE

[Diagram labels: SemU Sell V; SemU Sale N; SemU Seller N; Pred_SELL <ARG0>, <ARG1>, <ARG2>, <ARG3>; links: Event_noun, Is_the_agent_of]

Page 19: Linguistic Specifications Penn, December 11 2000

Comprendere V

SemU: 61725

Type: [Cognitive_event]

To understand

SemU: 6962

Type: [Constitutive_state]

To include

Comprensione N

SemU: 61726

Type: [Cognitive_event]

Understanding

Comprendere#1 <Arg1 [+human]>, <Arg2 [+semiotic]>

Comprendere#2 <Arg1 [+group]>, <Arg2>

[Diagram links: master, master, verb_nominalization]

Page 20: Linguistic Specifications Penn, December 11 2000

il difensore di Clinton (Clinton's defender)

il difensore della Juventus (Juventus fullback)

Difensore N

SemU: 4125

Type: [Role]

Defender

SemU: 3526

Type: [Role]

Fullback

Difendere#1 <Arg1>, <Arg2>

[Diagram links: agent_nominalization; Is_a_member_of <squadra> team]

Page 21: Linguistic Specifications Penn, December 11 2000

Multidimensional Ontology

1. TELIC [Top]

2. AGENTIVE [Top]

2.1. Cause [Agentive]

3. CONSTITUTIVE [Top]

3.1. Part [Constitutive]

3.1.1. Body_part [Part]

3.2. Group [Constitutive]

3.2.1. Human_group [Group]

3.3. Amount [Constitutive]

4. ENTITY [Top]

4.1. Concrete_entity [Entity]

4.1.1. Location [Concrete_entity]
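Each node in the excerpt above names its supertype in brackets, so the hierarchy can be held as a child-to-parent map. The following is a minimal sketch covering only the nodes listed on the slide; the names `PARENT`, `path_to_top` and `is_subtype` are hypothetical, and the top-level type names are written in mixed case rather than the slide's capitals.

```python
# Child -> parent, transcribed from the bracketed supertypes on the slide.
PARENT = {
    "Telic": "Top",
    "Agentive": "Top",
    "Cause": "Agentive",
    "Constitutive": "Top",
    "Part": "Constitutive",
    "Body_part": "Part",
    "Group": "Constitutive",
    "Human_group": "Group",
    "Amount": "Constitutive",
    "Entity": "Top",
    "Concrete_entity": "Entity",
    "Location": "Concrete_entity",
}


def path_to_top(sem_type):
    """Return the chain of types from sem_type up to the root."""
    path = [sem_type]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def is_subtype(sem_type, ancestor):
    """True if ancestor is sem_type itself or dominates it."""
    return ancestor in path_to_top(sem_type)


print(path_to_top("Body_part"))
print(is_subtype("Location", "Entity"))
```

A lookup of this kind is what a Template_Supertype value encodes for a single step: the type that immediately dominates the template's type in the hierarchy.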

Usem: 1

BC number: number

Template_Type: [Part]

Template_Supertype: [Constitutive]

Domain: General

Semantic Class: Part + <Semantic Class>

Gloss: //free//

Pred_Rep.: Part_of (<arg0>)

Selectional Restr.: arg0 = [Entity]

Derivation: <Derivational Relation>

Formal: isa (1, <part> or <hyperonym>)

Agentive: <Nil>

Constitutive: is_a_part_of (1, <Usem>: [Constitutive])

Telic: <Nil>

Synonymy: <Nil>

Collocates: Collocates (<Usem1>,...,<Usemn>)

Complex: <Nil>

Page 22: Linguistic Specifications Penn, December 11 2000

SIMPLE wrt EAGLES/ISLE Computational Lexicon WG

Multilingual Lexicons (US-EU coop.)

Last EAGLES work on Lexicon/Semantics used for SIMPLE specifications

SIMPLE lexicons chosen as a basis for applying & testing EAGLES/ISLE work on defining common guidelines for Multilingual Lexicons

Page 23: Linguistic Specifications Penn, December 11 2000

Basic lexical semantic notions

BASE CONCEPTS, HYPONYMY, SYNONYMY: all applications and enabling technologies

SEMANTIC FRAMES: MT, IR, IE, & Gen, Pars, MWR, WSD, Coref

COOCCURRENCE RELATIONS: MT, Gen, Word Clust, WSD, Par

MERONYMY: MT, IR, IE & Gen, PNR

ANTONYMY: Gen, Word Clust, WSD

SUBJECT DOMAIN: MT, SUM, Gen, MWR, WSD

ACTIONALITY: MT, IE, Gen, Par

QUANTIFICATION: MT, Gen, Coref

Page 24: Linguistic Specifications Penn, December 11 2000

Complementarity wrt EuroWordNet

Use of a small EWN subset for all languages

Mappable Top Ontology

Actual linking of data for a few languages

Semantic subcategorisation and linking with syntax

Template structure for the description of SemU

SemU vs. Synset: basic unit

Nodes in the Ontology as structured Sem. Types (bundles of different info types)

Page 25: Linguistic Specifications Penn, December 11 2000

From SENSEVAL/ROMANSEVAL

Which requirements?

Common semantic tagset, Gold Standard

Criteria for sense discrimination (flexible & adaptable) & sense-granularity

Different dimensions of meanings

Different disambiguation clues/strategies (interaction syntax & semantics)

Underspecified readings (regular polysemy)

MultiWords

Metaphorical usage

Page 26: Linguistic Specifications Penn, December 11 2000

Core Lexicons to be enlarged at the National level

PAROLE/SIMPLE start providing the common platform

In line with the subsidiarity concept, the process started at the EU level is continued at the national level:

PAROLE/SIMPLE resources are being enlarged within National Projects (e.g. Danish, Greek, Italian, Portuguese, ...)

This creates a really large infrastructure of harmonised LR throughout Europe, impossible without the fundamental role played by the EC Standards and LRs projects

A major achievement in Europe, where all the difficulties of LRs building are multiplied by the language factor