An Approach to Catalan Adjective Classes by Clustering

Laura Alonso Alemany, Universitat de Barcelona

lalonso@fil.ub.es

Gemma Boleda Torrent, Universitat Pompeu Fabra

motivation

• to search for empirical (corpus-based) support for theories of adjective classification via data-driven methods

• to enhance a lexicon with information on adjective classes in an inexpensive and reliable way

contents

• introduction

• previous theoretical work

• a preliminary hypothesis

• experiments on clustering adjectives

• results and discussion

introduction

hypothesis 0: a single class of adjectives

BUT: heterogeneous behaviour of adjectives:

La noia és (molt) alta [the girl is (very) tall]

*La bandera és nacional [*the flag is national]

*L’assassí és presumpte [*the murderer is alleged]

this hypothesis is problematic: it does not describe the data properly

why clustering

• clustering has been used for inferring knowledge in not-so-well-known domains:
– verbal subcategorization and selectional restrictions (Schulte im Walde & Brew 2002)
– inference of POS tags for unknown languages

• it introduces little bias into the final results
– there are no predefined classes (as opposed to classification methods; see Bohnet et al. 2002)
– ... but there is bias in how the data are modelled

problems with clustering

• it is a data-driven technique, but the appropriate degree of abstraction must be chosen

• completely data-driven approaches are possible, but
– the search space becomes far too big
– they are very sensitive to data sparseness

two traditions

• two main scholarly traditions regarding the study of adjectives:

– descriptive grammar
• morphology (derivational processes) and syntax (ordering among adjectives and with respect to the head noun)
• denotational semantics
– formal semantics
• semantic type (modifier or predicate)

classifications

example     formal semantics (Montague 1974)   descriptive grammar (GDLE, GCC)
red         <e,t>                              qualitative
political   ?                                  relational
alleged     <<e,t>, <e,t>>                     adverbial (+others)
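The difference between the two semantic types in the table can be illustrated with a toy model. In the sketch below, which is only an illustration under stated assumptions, entities are strings, an <e,t> adjective is a function from entities to truth values, and an <<e,t>, <e,t>> adjective is a function from noun meanings to noun meanings. The tiny world, the `CLAIMS` table and the crude analysis of "alleged" are hypothetical and are not the authors' analysis.

```python
from typing import Callable, Dict, Set

Entity = str
Pred = Callable[[Entity], bool]  # type <e,t>: entity -> truth value

# Hypothetical toy world (illustration only).
HOUSES: Set[Entity] = {"h1", "h2"}
RED_THINGS: Set[Entity] = {"h1", "car1"}
MURDERERS: Set[Entity] = {"m1"}
# Hypothetical record of who is claimed to satisfy which noun.
CLAIMS: Dict[str, Set[Entity]] = {"murderer": {"m1", "p2"}}


def house(x: Entity) -> bool:
    return x in HOUSES


def murderer(x: Entity) -> bool:
    return x in MURDERERS


def red(x: Entity) -> bool:
    """<e,t>: applies directly to entities, so 'this house is red' is well formed."""
    return x in RED_THINGS


def modify(adj: Pred, noun: Pred) -> Pred:
    """Attributive use of an <e,t> adjective: 'red house' = red AND house."""
    return lambda x: adj(x) and noun(x)


def alleged(noun: Pred) -> Pred:
    """<<e,t>,<e,t>>: maps a noun meaning to a new noun meaning.

    Crude hypothetical analysis: x is an 'alleged N' if someone claims x is an N.
    There is no entity-level meaning, hence '*this murderer is alleged'.
    """
    claimed = CLAIMS.get(noun.__name__, set())
    return lambda x: x in claimed
```

Here `modify(red, house)("h1")` holds, and `alleged(murderer)("p2")` holds even though `murderer("p2")` does not: an alleged murderer need not be a murderer, which is why an intersective <e,t> treatment does not fit "alleged".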

qualitative / <e,t>

• predicative (syntactic version; Levi 1978)
– red house / this house is red
– cf. national flag / *this flag is national
– cf. alleged murderer / *this murderer is alleged
• gradable / comparable
– very red / redder, reddish
• scalar (Raskin & Nirenburg 1995)
– red/green/blue, big/small
• in Catalan, typically following the head noun

“adverbial” / <<e,t>, <e,t>>

• nonpredicative
– alleged murderer / *this murderer is alleged
• nongradable, noncomparable
– *very/more alleged murderer
• nonscalar
– and no antonym
• in Catalan, only preceding the head noun

these parameters seem to be relevant; we will use them in the experiments

on adjective position

• the position of the adjective in Catalan and in other Romance languages is related to reference restriction (GCC, GDLE)
– prenominal: nonrestrictive
– postnominal: restrictive
• very few strictly nonpredicative adjectives
• usual case: mixed behaviour, with a shift in meaning (potential problem!)
– antic president ‘former president’: nonpredicative reading
– armari antic ‘antique wardrobe’: qualitative reading

a gap: relational

morphology: denominal or deverbal

syntax:
• only occur in NP: *la màquina és agrícola THE MACHINE IS AGRICULTURAL
• adjacent to the noun, nearer than qualitative adjectives (relative ordering): *una màquina gran agrícola vs. una màquina agrícola gran A MACHINE AGRICULTURAL BIG
• no coordination with qualitative adjectives: *una màquina agrícola i gran A MACHINE AGRICULTURAL AND BIG

semantics: relate the noun to an external entity (not a primitive property)

• among others: Bally 1944, GDLE, GCC, Engel 1988, Levi 1978

a gap: relational

• predicativity: mixed behaviour
– El congrés és internacional <e,t> THE CONFERENCE IS INTERNATIONAL
– La Joana és corresponsal internacional [THE] JOANA IS INTERNATIONAL CORRESPONDENT
– *La Joana és internacional / <e,t>
– *La corresponsal és internacional / <e,t>

• ambiguity / class shift or property?

a gap: relational

• gradability and comparativity: said to be nongradable and noncomparable, but “qualitativization” is very easy
– *un tractor molt agrícola A TRACTOR VERY AGRICULTURAL
– una noia molt internacional A GIRL VERY INTERNATIONAL (“has travelled a lot”, “knows many people from abroad”)
• could reflect diachronic processes

these facts could explain the results, at least in part

adjective classes

hypothesis: three classes of adjectives
– qualitative: vermell ‘red’, alt ‘tall’
– nonpredicative: presumpte ‘alleged’
– relational: agrícola ‘agricultural’

challenges

• does this classification have empirical (corpus-based) support?

• can adjectives be automatically classified using the features reviewed?

• which are the most relevant features for adjective classification?

modelling adjectives

• find textual correlates of the theoretical parameters that describe semantic classes
– in terms of morphosyntactic data
– retrievable from an annotated corpus

• this is not always possible
– and beware of redundant features!

• feature values are difficult to set adequately

the set of attributes

• distributional properties
– follows a verb
– cooccurs with molt ‘very’ and the like
– form inflected by size morphemes
– cooccurs with més/menys ‘more/less’
– form inflected by the superlative morpheme -íssim
– precedes or follows a noun
– precedes or follows an adjective

• theoretical parameters they correlate with
– predicativity
– gradability, scalability
– comparativity, scalability
– reference restriction
– reference restriction / relative ordering

• POS of surrounding words (five-word window)
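Attributes of this kind can be computed from a POS-tagged corpus by counting, for each adjective lemma, how often a property holds and dividing by the adjective's frequency. The sketch below assumes a hypothetical input format (sentences as lists of (lemma, coarse POS) pairs, adjectives tagged `Adj`) and a simplified predicativity cue (the previous token is ser, estar or semblar). Feature names imitate the sample vector in the appendix, but this is not the authors' extraction code.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Tuple

Token = Tuple[str, str]  # hypothetical format: (lemma, coarse POS tag)
COPULAS = {"ser", "estar", "semblar"}  # simplified predicativity cue
# Window positions, named like the attributes in the sample vector (menys = minus, mes = plus).
OFFSETS = {-2: "menys2", -1: "menys1", 1: "mes1", 2: "mes2"}


def extract_features(sentences: List[List[Token]], adj_tag: str = "Adj") -> Dict[str, Dict[str, float]]:
    """Per adjective lemma, the proportion of its occurrences that show each feature.

    Features that never occur are absent from the dictionary (i.e. value 0).
    """
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    freq: Counter = Counter()
    for sent in sentences:
        for i, (lemma, pos) in enumerate(sent):
            if pos != adj_tag:
                continue
            freq[lemma] += 1
            # Predicativity cue: the adjective directly follows a form of ser/estar/semblar.
            if i > 0 and sent[i - 1][0] in COPULAS:
                counts[lemma]["predicativity"] += 1
            # POS of the neighbouring words in a five-word window centred on the adjective.
            for offset, prefix in OFFSETS.items():
                j = i + offset
                tag = sent[j][1] if 0 <= j < len(sent) else "OUT"  # OUT: beyond the sentence
                counts[lemma][f"{prefix}_{tag}"] += 1
    return {lemma: {feat: n / freq[lemma] for feat, n in counts[lemma].items()} for lemma in freq}
```

Because every value is a proportion of the adjective's occurrences, all attributes fall between 0 and 1, which is the value range the clustering step expects.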

corpus: fragment of CTILC
• collected by the Institute for Catalan Studies (IEC)
• 8.5 million words
• Catalan texts from 1970 onwards
– written only, fairly formal register
• manually revised tagging (but there are errors!)
– lemma, part of speech, morphological information (EAGLES standard)
– no syntactic information
• 571,365 adjective occurrences (tokens)
• 17,325 adjective lemmata (types)

data and tools

1. each adjective is described as a vector
• each dimension is one of the features relevant for characterising the adjective
• each feature value is a real number between 0 and 1
2. a matrix is built from all the vectors
3. the clustering is performed with CLUTO (Karypis 2002)
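CLUTO is a stand-alone toolkit. As a self-contained illustration of steps 1 to 3, the sketch below builds the adjective-by-attribute matrix from per-adjective feature dictionaries and partitions it with spherical k-means (k-means on unit-length vectors, using cosine similarity). This is a swapped-in technique, not CLUTO's partitional algorithm or criterion function. The function names, the fixed iteration count and the first-k-rows initialisation are assumptions made for brevity.

```python
import math
from typing import Dict, List, Tuple


def build_matrix(vectors: Dict[str, Dict[str, float]]) -> Tuple[List[str], List[str], List[List[float]]]:
    """Rows = adjectives, columns = attributes; a missing attribute counts as 0."""
    rows = sorted(vectors)
    cols = sorted({attr for v in vectors.values() for attr in v})
    matrix = [[vectors[r].get(c, 0.0) for c in cols] for r in rows]
    return rows, cols, matrix


def normalize(v: List[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def spherical_kmeans(matrix: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Partition rows into k clusters, maximising cosine similarity to the cluster centroids."""
    data = [normalize(r) for r in matrix]
    if k > len(data):
        raise ValueError("more clusters than objects")
    # Deterministic initialisation with the first k rows (assumption, for reproducibility).
    centroids = [list(r) for r in data[:k]]
    labels = [0] * len(data)
    for _ in range(iters):
        # Assign each row to the most similar centroid (dot product of unit vectors = cosine).
        labels = [max(range(k), key=lambda c: sum(a * b for a, b in zip(r, centroids[c]))) for r in data]
        # Recompute each centroid as the normalised sum of its members.
        for c in range(k):
            members = [r for r, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels
```

In practice one would run several random initialisations and keep the best partition; the deterministic start only keeps the example reproducible.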

experiment setting

• set of objects: only frequent adjectives (4,859 objects with more than 10 occurrences)
• set of attributes:
1. only textual correlates of semantic properties
2. only context of occurrence
3. combination of 1 and 2 (also in a version with customized values)
• attribute values: true percentages
• number of clusters: 2, 3, 4, 5, 6, 7
• clustering parameters
– combination of external/internal (E/I) criteria, partitional algorithm
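The setting amounts to a small grid of runs over a fixed set of objects. The sketch below makes it explicit: it filters adjectives by frequency and enumerates the attribute-set and cluster-number combinations. The attribute-set labels, the function names and the strict "more than 10" threshold are assumptions, and the customized-values variant is not modelled separately.

```python
from itertools import product
from typing import Dict, List, Tuple

# The three attribute sets listed above (labels are hypothetical).
ATTRIBUTE_SETS = ("semantic", "contextual", "combined")
CLUSTER_NUMBERS = range(2, 8)  # 2, 3, 4, 5, 6, 7 clusters


def frequent_adjectives(freq: Dict[str, int], min_count: int = 10) -> List[str]:
    """Adjectives with more than `min_count` occurrences (strict '>' is an assumption)."""
    return sorted(adj for adj, n in freq.items() if n > min_count)


def experiment_grid() -> List[Tuple[str, int]]:
    """Every (attribute set, number of clusters) combination to be clustered."""
    return list(product(ATTRIBUTE_SETS, CLUSTER_NUMBERS))
```

With three attribute sets and six cluster numbers the grid has 18 runs, each over the same filtered set of adjectives.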

gold standard

• annotated by human judges: 76 adjectives chosen randomly from the corpus, classified into 4+1 classes
– qualitative: calent ‘hot’, actiu ‘active/lively’
– relational: científic ‘scientific’, digital
– qualitative/nonpredicative: antic
– nonpredicative: presumpte ‘alleged’, mer ‘mere’
– errors: artista

• costly process: only a small number of adjectives can be considered

• nonpredicative adjectives are very few and were not represented in the sample, so they were added manually

semantic parameters vs. gold standard

[figure: distribution of the gold-standard classes (R, Q, NQ, N) over clusters 0–4; cluster sizes 467, 3040, 229, 593, 787]

cluster descriptions (descriptive features: example adjectives):
– predicativity 0.1, comparativity 0, after adjective 0.03: possible, necessari
– after noun 0.54, comparativity 0: alemany ‘German’, internacional
– gradability 0.07, comparativity 0: millor ‘best’, eixerit ‘nice, lively’
– preceding common noun 0.06, after common noun 0.49: presumpte ‘alleged’, antic ‘former/old/antique’
– after common noun 0.49, comparativity 0: important, subversiu ‘subversive’
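A comparison like this one rests on a contingency table of clusters against gold-standard classes, computed over the adjectives that have a gold label. The sketch below builds that table and derives purity, a standard external measure (the proportion of adjectives that fall in their cluster's majority class). Purity is our illustrative choice, not a measure reported on the slides, and the inputs are hypothetical parallel lists of cluster ids and gold classes (R, Q, NQ, N).

```python
from collections import Counter
from typing import Dict, List


def contingency(clusters: List[int], gold: List[str]) -> Dict[int, Counter]:
    """For each cluster, count how many of its adjectives carry each gold-standard class."""
    table: Dict[int, Counter] = {}
    for c, g in zip(clusters, gold):
        table.setdefault(c, Counter())[g] += 1
    return table


def purity(clusters: List[int], gold: List[str]) -> float:
    """Share of adjectives that belong to the majority gold class of their cluster."""
    table = contingency(clusters, gold)
    return sum(max(counts.values()) for counts in table.values()) / len(gold)
```

Purity tends to grow with the number of clusters, so it is best read together with the cluster sizes.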

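contextual vs. semantic attributes

[figure: distribution of the gold-standard classes (R, Q, NQ, N) over clusters 0–4 for the contextual and the semantic solutions; cluster sizes 2107, 697, 290, 593, 1172 (contextual) and 467, 3040, 229, 336, 787 (semantic)]

contextual cluster descriptions (descriptive features: example adjectives):
– +1 preposition 0.3, +2 determiner 0.25: important, necessari, diagonal
– +1 punctuation 0, -1 noun 0.5: preescolar, subversiu
– -1 common noun 0.5, -2 determiner 0.34: general, negre ‘black’, alemany ‘German’, internacional
– +1 punctuation, -1 adverb: possible, hot
– +1 noun, -1 determiner: mer ‘mere’, antic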

agreement between solutions

[figure: cross-tabulation of the contextual clusters (0–4) against the semantic clusters (0–4)]
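The agreement between two solutions over the same adjectives can be summarised with a chance-corrected index. The sketch below computes the adjusted Rand index, which is 1 for identical partitions (up to relabelling) and around 0 for unrelated ones. This measure is our illustrative choice (the slide only shows the cross-tabulation), and the inputs are hypothetical lists of cluster ids, one per adjective, in the same order.

```python
from collections import Counter
from math import comb
from typing import List


def adjusted_rand_index(labels_a: List[int], labels_b: List[int]) -> float:
    """Chance-corrected agreement between two partitions of the same objects."""
    n = len(labels_a)
    # Pairs of objects placed together in both partitions (cells of the cross-tabulation).
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each partition separately (row and column totals).
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (index - expected) / (maximum - expected)
```

Unlike raw percentage agreement, this index does not depend on how the clusters of each solution happen to be numbered.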

homogeneity of adjective classes vs. gold standard

[figure: homogeneity of the adjective classes (N, NQ, Q, R) in four clustering solutions: semantic parameters; semantic parameters and context; contextual attributes; customized values]

questions

• which is the best clustering solution?

• which attributes are actually descriptive of adjective behaviour?

• which are noisy?

• which classes receive empirical support?

discussion

• contextual and semantic features yield quite similar results, although
– semantic features seem to be more adequate
– contextual features are stronger!

• the most discriminating attribute is the position of the adjective with respect to the noun
– why are some others not discriminating? (modelling)

• noisy attributes:
– a preposition follows
– punctuation follows

discussion

• clustering is a useful technique for the inductive investigation of adjective classes
– which had not been done before

• the theoretically biased results are supported by distributional properties

discussion

• the following classes of adjectives emerge from the results:
– nonpredicative (with few elements)
– relational: consistent behaviour, similar to part of the qualitative class

• this similarity could reflect a diachronic process or class shift, or poor modelling of the adjectives

discussion

• qualitative adjectives as described in the literature are not homogeneous:
– predicativity, gradability and comparativity are not distributed uniformly among these adjectives
– distributional properties are not uniform either

unexpected?

future work

• further linguistic investigation of results

• other clustering solutions

• evaluation

references

Bally, C. (1944) Linguistique générale et linguistique française.
Bohnet, B., S. Klatt and L. Wanner (2002) An Approach to Automatic Annotation of Functional Information to Adjectives with an Application to German.
GDLE: Bosque, I. and V. Demonte, eds. (1999) Gramática Descriptiva de la Lengua Española.
Engel, U. (1988) Deutsche Grammatik. Heidelberg: Julius Groos Verlag.
Levi, J. N. (1978) The Syntax and Semantics of Complex Nominals.
Montague, R. (1974) Formal Philosophy. Selected Papers of Richard Montague.
Raskin, V. and S. Nirenburg (1995) Lexical Semantics of Adjectives. A Microtheory of Adjectival Meaning.
Schulte im Walde, S. and C. Brew (2002) Inducing German Semantic Verb Classes from Purely Syntactic Subcategorisation Information.
GCC: Solà, J. et al., eds. (2002) Gramàtica del Català Contemporani.

a vector

verd 181 serestarsemblarpredicatiu 0.0386740331491713 comparativitat 0 gradabilitat 0.0165745856353591 modificador_dreta 0.0220994475138122 modificador_esquerra 0.895027624309392 menys2_Adj 0.0497237569060773 menys2_Adv 0.00828729281767956 menys2_Conj 0.00552486187845304 menys2_Det 0.281767955801105 menys2_Esp 0.0441988950276243 menys2_Nom 0.0911602209944751 menys2_Num 0 menys2_PT 0.0607734806629834 menys2_Prep 0.187845303867403 menys2_Pron 0.0110497237569061

menys2_Verb 0.25414364640884 menys2_no 0.00552486187845304 menys1_Adj 0.00276243093922652 menys1_Adv 0.0110497237569061 menys1_Conj 0.0276243093922652 menys1_Det 0.0276243093922652 menys1_Esp 0 menys1_Nom 0.81767955801105 menys1_Num 0 menys1_PT 0.0110497237569061 menys1_Prep 0.00552486187845304 menys1_Verb 0.0524861878453039 menys1_no 0.0331491712707182 mes1_Adj 0.00828729281767956 mes1_Adv 0.0441988950276243 mes1_Conj 0.0718232044198895 mes1_Det 0.0386740331491713

mes1_Esp 0.0331491712707182 mes1_Nom 0.0267034990791897 mes1_PT 0.320441988950276 mes1_Prep 0.366482504604052 mes1_Pron 0.0220994475138122 mes1_Verb 0.0460405156537753 mes1_no 0.0220994475138122 mes2_Adj 0.0607734806629834 mes2_Adv 0.00552486187845304 mes2_Conj 0.0552486187845304 mes2_Det 0.276243093922652 mes2_Esp 0.069060773480663 mes2_Nom 0.160220994475138 mes2_PT 0.0718232044198895 mes2_Prep 0.124309392265193 mes2_Pron 0.0303867403314917 mes2_Verb 0.140883977900552

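verd ‘green’, 181 occurrences; attribute values are proportions of these occurrences. serestarsemblarpredicatiu: predicativity (after ser, estar or semblar); comparativitat: comparativity; gradabilitat: gradability; modificador_dreta: right modifier; modificador_esquerra: left modifier; menys2_, menys1_, mes1_, mes2_: part of speech of the word at positions -2, -1, +1, +2 relative to the adjective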
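The vector above appears to be stored as a whitespace-separated line: the lemma, its number of occurrences, then attribute-value pairs (for instance, 0.0386740331491713 × 181 ≈ 7 predicative occurrences of verd). The sketch below parses such a line into a dictionary. The format is inferred from this single example, and the function name is hypothetical.

```python
from typing import Dict, Tuple


def parse_vector(line: str) -> Tuple[str, int, Dict[str, float]]:
    """Parse 'lemma count attr1 val1 attr2 val2 ...' into (lemma, count, attributes)."""
    fields = line.split()
    lemma, count = fields[0], int(fields[1])
    rest = fields[2:]
    if len(rest) % 2:
        raise ValueError("attribute names and values must come in pairs")
    # Even positions are attribute names, odd positions their values.
    attrs = {name: float(value) for name, value in zip(rest[0::2], rest[1::2])}
    return lemma, count, attrs
```

Multiplying a value by the count recovers the raw number of occurrences, which is a quick sanity check on the extracted proportions.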

the matrix

[excerpt of the adjective-by-attribute matrix: one row of attribute values between 0 and 1 per adjective]

CLUTO (v. 1.5.1, Karypis 2002)

• handles high-dimensional datasets
• analysis of cluster features
• partitional or agglomerative algorithms
• various criterion functions, taking into account the similarity among the objects in a cluster (internal criterion) and/or the differences between objects in different clusters (external criterion)

settings used here: partitional algorithm, combination of internal and external criteria
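To make the internal/external distinction concrete, the sketch below computes one criterion that combines both, H2 = I2 / E1, as we read its definition in the CLUTO manual for cosine similarity. I2 sums, over clusters, the norm of each cluster's composite vector (the square root of its summed pairwise similarities); E1 sums each cluster's size-weighted similarity to the whole collection. The slide does not say whether this exact function was the combination used in the experiments, and the code is a pure-Python re-implementation, not a call into CLUTO.

```python
import math
from typing import List


def _unit(v: List[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def _composite(vectors: List[List[float]]) -> List[float]:
    # Composite vector D: element-wise sum of the vectors.
    return [sum(col) for col in zip(*vectors)]


def h2_criterion(matrix: List[List[float]], labels: List[int]) -> float:
    """H2 = I2 / E1 for cosine similarity; a partition with a higher value is preferred."""
    data = [_unit(row) for row in matrix]
    d_all = _composite(data)
    i2 = e1 = 0.0
    for c in set(labels):
        members = [row for row, lab in zip(data, labels) if lab == c]
        d_r = _composite(members)
        # ||D_r|| = square root of the summed pairwise cosines inside cluster c.
        norm_r = math.sqrt(sum(x * x for x in d_r))
        if norm_r == 0:
            continue
        i2 += norm_r  # internal part: within-cluster similarity
        # External part: size-weighted similarity of the cluster to the whole collection.
        e1 += len(members) * sum(a * b for a, b in zip(d_r, d_all)) / norm_r
    return i2 / e1 if e1 else 0.0
```

A partitional algorithm driven by such a criterion moves objects between clusters as long as the ratio improves, rewarding tight clusters that are also far from the collection as a whole.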

human gold standard: inter-judge agreement

[figure: cross-tabulation of the classes assigned by judge 1 against those assigned by judge 2 (E, NQ, Q, R)]
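Inter-judge agreement on a cross-tabulation like this one is commonly summarised with Cohen's kappa, which corrects the observed agreement for the agreement expected by chance given each judge's class distribution. The slide does not say which measure was used, so the sketch below is only an illustration; the class labels follow the slide (E, NQ, Q, R) and the inputs are hypothetical parallel lists of the classes each judge assigned.

```python
from collections import Counter
from typing import List


def cohen_kappa(judge1: List[str], judge2: List[str]) -> float:
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(judge1)
    observed = sum(a == b for a, b in zip(judge1, judge2)) / n
    c1, c2 = Counter(judge1), Counter(judge2)
    # Chance agreement: probability that both judges pick the same class independently.
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    if expected == 1:  # both judges used one and the same class throughout
        return 1.0
    return (observed - expected) / (1 - expected)
```

A kappa of 1 means perfect agreement and 0 means agreement at chance level, which makes results comparable across class inventories of different sizes.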

contextual attributes vs. gold standard

[figure: distribution of the gold-standard classes (R, Q, NQ, N) over the contextual clusters 0–4; panels labelled “human” and “agreement with semantic attributes”]

cluster descriptions (descriptive features):
– preceding common noun (7%), following specifier (7%)
– preceding punctuation (40%), following adverb or verb
– not preceding punctuation, following common noun (50%)
– following common noun (50%), following specifier (34%)
– preceding preposition (30%), preceding specifier (25%)

customized values: gradability and comparativity normalized to binary

[figure: distribution of the gold-standard classes (R, Q, NQ, N) over clusters 0–4]

cluster descriptions (descriptive features):
– gradability (65%), comparativity (12%)
– after common noun (11%), comparativity (13%), gradability (61%)
– comparativity (14%), gradability (61%)
– gradability (61%), after common noun (10%), followed by common noun (2%)
– comparativity (12%), followed by common noun (2%), after common noun (11%)
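The slide states that gradability and comparativity were normalized to binary values but not how. The sketch below assumes a hypothetical threshold: a value above `threshold` (by default 0, i.e. the adjective was seen at least once as gradable or comparable) becomes 1, otherwise 0, while all other attributes keep their relative frequencies. The attribute names are hypothetical.

```python
from typing import Dict, Iterable


def binarize(vector: Dict[str, float],
             attributes: Iterable[str] = ("gradability", "comparativity"),
             threshold: float = 0.0) -> Dict[str, float]:
    """Return a copy of `vector` with the given attributes mapped to 0/1 (hypothetical threshold)."""
    out = dict(vector)  # leave the original vector untouched
    for attr in attributes:
        out[attr] = 1.0 if vector.get(attr, 0.0) > threshold else 0.0
    return out
```

Binarizing removes the influence of how often an adjective is graded, keeping only whether it can be; this is one way to counter the very small raw values these attributes tend to have.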

customized values: gradability and comparativity normalized to binary

[figure: homogeneity of the gold-standard classes (N, NQ, Q, R) across clusters 0–4]

interpretation of results: quality of the clustering solution

• tightness of the obtained clusters
– objects within a cluster are very similar to each other
– objects are very dissimilar to objects in different clusters

• attribute distribution: attributes whose values differ across clusters are discriminating
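Tightness can be measured for each cluster as the average similarity among its members (internal) and between its members and all other objects (external); CLUTO reports these as ISim and ESim. The sketch below computes both with cosine similarity. Averaging over distinct pairs and treating a singleton cluster's internal similarity as 1 are our assumptions, and CLUTO's exact computation may differ.

```python
import math
from itertools import combinations
from typing import List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def tightness(matrix: List[List[float]], labels: List[int], cluster: int) -> Tuple[float, float]:
    """(internal, external) average cosine similarity for one cluster."""
    inside = [r for r, lab in zip(matrix, labels) if lab == cluster]
    outside = [r for r, lab in zip(matrix, labels) if lab != cluster]
    pairs = list(combinations(inside, 2))
    # Internal: mean over distinct pairs of members; 1.0 for a singleton (assumption).
    isim = sum(cosine(u, v) for u, v in pairs) / len(pairs) if pairs else 1.0
    # External: mean over all member/non-member pairs.
    cross = [cosine(u, v) for u in inside for v in outside]
    esim = sum(cross) / len(cross) if cross else 0.0
    return isim, esim
```

A good solution shows high internal and low external values; a cluster whose two values are close is not well separated from the rest.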

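tightness of clustering solutions

[figure: internal similarity (ISim, with standard deviation ISdev) and external similarity (ESim, with standard deviation ESdev) of the clustering solutions; y-axis from 0 to 0.9]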

attribute distribution across clusters

[figure: values of comparativity, before adjective, after adjective, gradability, after noun, before noun and predicativity in each cluster]

attribute distribution across clusters

[figure: values of after noun, preposition after, -2 specifier, noun after and predicativity in each cluster]
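The attribute-distribution criterion can be checked by averaging each attribute within each cluster: an attribute whose cluster means differ widely helps separate the clusters, while one with similar means does not. The sketch below computes the means and uses the range of the means as a crude spread measure. The range is our simplification, not necessarily how these figures were produced, and the attribute names in the test are hypothetical.

```python
from typing import Dict, List


def attribute_means(matrix: List[List[float]], labels: List[int],
                    columns: List[str]) -> Dict[int, Dict[str, float]]:
    """Mean value of every attribute within each cluster."""
    means: Dict[int, Dict[str, float]] = {}
    for c in sorted(set(labels)):
        rows = [r for r, lab in zip(matrix, labels) if lab == c]
        means[c] = {name: sum(col) / len(rows) for name, col in zip(columns, zip(*rows))}
    return means


def spread(means: Dict[int, Dict[str, float]], attribute: str) -> float:
    """Range of an attribute's cluster means: a crude indicator of how discriminating it is."""
    values = [m[attribute] for m in means.values()]
    return max(values) - min(values)
```

Sorting attributes by their spread gives a quick ranking of which ones drive the partition, e.g. position with respect to the noun.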

decision list

• a gold standard annotated by human judges

• a gold standard built with a decision list
– deductive classification: some of the attributes in the vectors are used to classify adjectives into predefined classes:
• predicativity
• position with respect to the head noun
• gradability and comparativity

fully automatic: inexpensive but unsupervised
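A decision list applies ordered rules and assigns the class of the first rule that fires. The sketch below illustrates the idea with the three kinds of attributes listed above. The attribute names, the rule order and every threshold are hypothetical, loosely inspired by the theoretical descriptions earlier in the talk (prenominal position for nonpredicative adjectives, lack of predicativity and gradability for relational ones); they are not the rules actually used.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]

# Hypothetical rules and thresholds; the first rule whose test succeeds decides the class.
RULES: List[Tuple[str, Callable[[Features], bool]]] = [
    # Mostly prenominal and never predicative: nonpredicative.
    ("N", lambda f: f.get("before_noun", 0) >= 0.5 and f.get("predicativity", 0) < 0.02),
    # Noticeable prenominal use: mixed qualitative/nonpredicative behaviour.
    ("NQ", lambda f: f.get("before_noun", 0) >= 0.1),
    # Not predicative, not gradable, not comparable: relational.
    ("R", lambda f: f.get("predicativity", 0) < 0.02
          and f.get("gradability", 0) == 0 and f.get("comparativity", 0) == 0),
]
DEFAULT = "Q"  # everything else is treated as qualitative


def classify(features: Features) -> str:
    for label, test in RULES:
        if test(features):
            return label
    return DEFAULT
```

Because the rules fire in order, their sequence matters as much as the thresholds: an adjective such as antic, with some prenominal use, is caught by the NQ rule before the relational test is ever reached.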

decision list vs. human gold standard

[figure: cross-tabulation of the classes assigned by the decision list (N, NQ, Q, R) against the human gold standard (R, Q, NQ, N)]

a deductive approach does not provide a good solution