![Page 1: An Approach to Catalan Adjective Classes by Clustering Laura Alonso Alemany Universitat de Barcelona lalonso@fil.ub.es Gemma Boleda Torrent Universitat](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d2b5503460f949ffcc6/html5/thumbnails/1.jpg)
An Approach to Catalan Adjective Classes by Clustering
Laura Alonso Alemany, Universitat de Barcelona
Gemma Boleda Torrent, Universitat Pompeu Fabra
motivation
• to search for empirical (corpus-based) support for theories of adjective classification via data-driven methods
• to enhance a lexicon with information on adjective classes in an inexpensive and reliable way
contents
• introduction
• previous theoretical work
• a preliminary hypothesis
• experiments on clustering adjectives
• results and discussion
introduction
hypothesis 0: a single class of adjectives
BUT: heterogeneous behaviour of adjectives:
La noia és (molt) alta [the girl is (very) tall]
*La bandera és nacional [*the flag is national]
*L’assassí és presumpte [*the murderer is alleged]
this hypothesis is problematic: it does not describe the data properly
why clustering
• clustering has been used for inferring knowledge in not-so-well-known domains:
– verbal subcategorization and selectional restrictions (Schulte im Walde & Brew 2002)
– inference of POS tags for unknown languages
• it introduces little bias into the final results
– there are no pre-defined classes (as opposed to classification methods; see Bohnet et al. 2002)
– ... but bias in modelling the data
problems with clustering
• it is a data-driven technique, but the appropriate degree of abstraction must be chosen
• completely data-driven approaches are possible, but
– the search space becomes far too big
– they are very sensitive to data sparseness
two traditions
• two main scholarly traditions regarding the study of adjectives:
– descriptive grammar
  • morphology (derivational processes) and syntax (ordering among adjectives and with respect to head)
  • denotational semantics
– formal semantics
  • semantic type (modifier or predicate)
classifications
| example | formal semantics (Montague 1974) | descriptive grammar (GDLE, GCC) |
|---|---|---|
| red | <e,t> | qualitative |
| political | ? | relational |
| alleged | <<e,t>, <e,t>> | adverbial (+others) |
qualitative / <e,t>
• predicative (syntactic version; Levi 1978)
– red house / this house is red
  • national flag / *this flag is national
  • alleged murderer / *this murderer is alleged
• gradable / comparable
– very red / redder, reddish
• scalar (Raskin & Nirenburg 1995)
– red/green/blue, big/small
• in Catalan, typically following the head noun
“adverbial” / <<e,t>, <e,t>>
• nonpredicative
– alleged murderer / *this murderer is alleged
• nongradable, noncomparable
– *very/more alleged murderer
• nonscalar
– and no antonym
• in Catalan, only preceding the head noun
these parameters seem to be relevant; we'll use them in the experiments
on adjective position
• the position of the adjective in Catalan and in other Romance languages is related to reference restriction (GCC, GDLE)
– prenominal: nonrestricting
– postnominal: restricting
• very few strict nonpredicative adjectives
• usual case: mixed behaviour, with shift in meaning (potential problem!)
– antic president ‘former president’: nonpredicative reading
– armari antic ‘antique wardrobe’: qualitative reading
a gap: relational
morphology: denominal or deverbal
syntax:
• only occur in NP
• adjacent to noun, nearer than qualitative (relative ordering)
• no coordination with qualitative adjectives
semantics: relate noun to external entity (not primitive property)
• a.o.: Bally 1944, GDLE, GCC, Engel 1988, Levi 1978
*la màquina és agrícola THE MACHINE IS AGRICULTURAL
*una màquina gran agrícola vs. una màquina agrícola gran A MACHINE AGRICULTURAL BIG
*una màquina agrícola i gran A MACHINE AGRICULTURAL AND BIG
a gap: relational
• predicativity: mixed behaviour
– El congrés és internacional <e,t> THE CONFERENCE IS INTERNATIONAL
– La Joana és corresponsal internacional [THE] JOANA IS INTERNATIONAL CORRESPONDENT
– *La Joana és internacional / <e,t>
– *La corresponsal és internacional / <e,t>
• ambiguity / class shift or property?
a gap: relational
• gradability and comparativity: said to be nongradable and noncomparable, but very easy “qualitativization”
– *un tractor molt agrícola A TRACTOR VERY AGRICULTURAL
– una noia molt internacional A GIRL VERY INTERNATIONAL
(“has travelled a lot”, “knows many people from abroad”)
• could reflect diachronic processes
these facts could explain the results (at least in part)
adjective classes
hypothesis: three classes of adjectives
– qualitative: vermell ‘red’, alt ‘tall’
– nonpredicative: presumpte ‘alleged’
– relational: agrícola ‘agricultural’
challenges
• does this classification have empirical (corpus-based) support?
• can adjectives be automatically classified using the features reviewed?
• which are the most relevant features for adjective classification?
modelling adjectives
• find a textual correlate of theoretical parameters that describe semantic classes
– in terms of morphosyntactic data
– retrievable from an annotated corpus
• it is not always possible
– and careful with redundant features!
• values are difficult to set adequately
the set of attributes
theoretical parameters:
• predicativity
• gradability, scalability
• comparativity, scalability
• reference restriction
• ref. restr. / relative ordering
distributional properties:
• follows a verb
• cooccurs with molt ‘very’ and the like
• form inflected by size morphemes
• cooccurs with més/menys ‘more/less’
• form inflected by superlative morpheme ‘íssim’
• precedes or follows a noun
• precedes or follows an adjective
• POS of surrounding words (five word window)
corpus: fragment of CTILC
• collected by the Institute for Catalan Studies (IEC)
• 8.5 million words
• Catalan texts from 1970 onwards
– only written, quite formal register
• manually revised tagging (but there are errors!!)
– lemma, part-of-speech, morphological info (EAGLES standard)
– no syntactic information
• 571365 adjective occurrences (tokens)
• 17325 adjective lemmata (types)
data and tools
1. each adjective is described as a vector
• where each dimension is one of the features relevant for characterising the adjective
• the value of each feature is a real number between 0 and 1
2. a matrix is built with all the vectors
3. the clustering is performed with CLUTO (Karypis 2002)
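The first two steps can be sketched roughly as follows. This is only an illustration, assuming a tagged corpus given as lists of (lemma, POS) pairs and using just the POS of the word immediately before and after each adjective; the function name `build_vectors`, the tag names, and the `no` tag for a missing neighbour are hypothetical choices, not the authors' actual feature extraction.

```python
from collections import Counter, defaultdict

def build_vectors(sentences, offsets=(-1, 1)):
    """Describe each adjective lemma by the relative frequency of the
    POS tags found at the given offsets around its occurrences."""
    counts = defaultdict(Counter)   # lemma -> Counter of "offset_POS" features
    totals = Counter()              # lemma -> number of occurrences
    for sent in sentences:
        for i, (lemma, pos) in enumerate(sent):
            if pos != "Adj":
                continue
            totals[lemma] += 1
            for off in offsets:
                j = i + off
                # "no" marks a position outside the sentence (assumption)
                tag = sent[j][1] if 0 <= j < len(sent) else "no"
                counts[lemma][f"{off:+d}_{tag}"] += 1
    features = sorted({f for c in counts.values() for f in c})
    lemmas = sorted(totals)
    # one row per adjective, one column per feature, values in [0, 1]
    matrix = [[counts[l][f] / totals[l] for f in features] for l in lemmas]
    return lemmas, features, matrix

# Toy corpus (hypothetical data, for illustration only)
sentences = [
    [("el", "Det"), ("armari", "Nom"), ("antic", "Adj")],
    [("l'", "Det"), ("antic", "Adj"), ("president", "Nom")],
    [("una", "Det"), ("bandera", "Nom"), ("nacional", "Adj")],
]
lemmas, features, matrix = build_vectors(sentences)
```

Each row of `matrix` is one adjective vector; a clustering tool would take the rows as its input objects.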
experiment setting
• set of objects: only frequent adjectives (4859 objects, +10 occurrences)
• set of attributes
1. only textual correlates of semantic properties
2. only context of occurrence
3. combination of 1, 2 / with customized values
• attribute values: true percentages
• number of clusters: 2, 3, 4, 5, 6, 7
• clustering parameters
– combination of external/internal (E/I) criteria, partitional algorithm
gold standard
• annotated by human judges
– 76 adjectives chosen randomly from the corpus
– classified by human judges into 4+1 classes
  • qualitative: calent ‘hot’, actiu ‘active/lively’
  • relational: científic, digital
  • qualitative/non-predicative: antic
  • non-predicative: presumpte ‘alleged’, mer ‘mere’
  • errors: artista
costly process, only a small number of adjectives can be considered
nonpredicative: very few, not represented; added manually
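Comparing a clustering solution with such a gold standard amounts to a cross-tabulation of gold classes against cluster numbers. A minimal sketch, assuming the class labels Q, R, NQ and N for the four classes above; the cluster assignments in `cluster_of` are invented for illustration and are not results from the experiments.

```python
from collections import Counter

def cross_tabulate(cluster_of, gold_class_of):
    """Count how many gold-standard adjectives of each class fall in each cluster."""
    table = Counter()
    for lemma, gold in gold_class_of.items():
        if lemma in cluster_of:            # skip adjectives that were not clustered
            table[(gold, cluster_of[lemma])] += 1
    return table

# Gold classes follow the examples on the slide
gold_class_of = {
    "calent": "Q", "actiu": "Q",
    "científic": "R", "digital": "R",
    "antic": "NQ",
    "presumpte": "N", "mer": "N",
}
# Hypothetical cluster numbers, for illustration only
cluster_of = {
    "calent": 1, "actiu": 1,
    "científic": 0, "digital": 0,
    "antic": 3, "presumpte": 3, "mer": 3,
}
table = cross_tabulate(cluster_of, gold_class_of)
```

A class whose members concentrate in a single cluster is well recovered by the solution; a class spread over several clusters is not.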
semantic parameters vs. gold standard
[figure: gold-standard classes (R, Q, NQ, N) across clusters 0 to 4; cluster sizes 467, 3040, 229, 593, 787]
semantic parameters vs. gold standard
[figure: gold-standard classes (R, Q, NQ, N) across clusters 0 to 4; cluster sizes 467, 3040, 229, 593, 787]
descriptive features and example adjectives of the clusters:
• predicativity 0.1, comparativity 0, after Adj 0.03: possible, necessari
• after noun 0.54, comparativity 0: alemany ‘German’, internacional
• gradability 0.07, comparativity 0: millor ‘best’, eixerit ‘nice, lively’
• preceding common noun 0.06, after common noun 0.49: presumpte ‘alleged’, antic ‘former/old/antique’
• after common noun 0.49, comparativity 0: important, subversiu ‘subversive’
contextual vs. semantic attributes
[figure: two panels (contextual, semantic), each showing gold-standard classes (R, Q, NQ, N) across clusters 0 to 4]
cluster sizes: 467, 3040, 229, 336, 787
cluster sizes: 2107, 697, 290, 593, 1172
contextual vs. semantic attributes
[figure: two panels (contextual, semantic), each showing gold-standard classes (R, Q, NQ, N) across clusters 0 to 4]
cluster sizes: 2107, 697, 290, 593, 1172
cluster sizes: 467, 3040, 229, 336, 787
descriptive features and example adjectives of the clusters:
• +1 Prep 0.3, +2 determiner 0.25: important, necessari, diagonal
• +1 punctuation 0, -1 Noun 0.5: preescolar, subversiu
• -1 common noun 0.5, -2 determiner 0.34: general, negre ‘black’, alemany ‘German’, internacional
• +1 punctuation, -1 adv: possible, hot
• +1 noun, -1 determiner: mer ‘mere’, antic
agreement between solutions
[figure: agreement between the clusters of the contextual solution (0 to 4) and the clusters of the semantic solution (0 to 4)]
homogeneity of adjective classes vs. gold standard
[figure: four panels (semantic parameters; contextual attributes; semantic parameters and context; customized values), each over the gold-standard classes N, NQ, Q, R]
questions
• which is the best clustering solution?
• which attributes are actually descriptive of adjective behaviour?
• which are noisy?
• which classes receive empirical support?
discussion
• contextual and semantic features yield quite similar results, although
– semantic features seem to be more adequate
– contextual are stronger!
• the most discriminating attribute is position of the adjective with respect to the noun
– why are some others not discriminating? (modelling)
• noisy:
– preposition follows
– punctuation follows
discussion
• clustering is a useful technique for inductive investigation on adjective classes
– which hadn’t been done before
• theoretically biased results are supported by distributional properties
discussion
• the following classes of adjectives emerge from the results:
– nonpredicative (with few elements)
– relational
  • consistent behaviour
  • similar to a part of the qualitative
    – could reflect a diachronic process or class shift
    – or a bad modelling of the adjectives
discussion
• qualitative adjectives as described in the literature are not homogeneous:
– predicativity, gradability and comparativity are not distributed uniformly in these adjectives
– distributional properties are not uniform either
unexpected?
future work
• further linguistic investigation of results
• other clustering solutions
• evaluation
references
Bally, C. (1944) Linguistique générale et linguistique française
Bohnet, B., S. Klatt and L. Wanner (2002) An Approach to Automatic Annotation of Functional Information to Adjectives with an Application to German
GDLE: Bosque, I. and V. Demonte, eds. (1999) Gramática Descriptiva de la Lengua Española
Engel, U. (1988) Deutsche Grammatik, Heidelberg: Julius Groos Verlag
Levi, J. N. (1978) The Syntax and Semantics of Complex Nominals
Montague, R. (1974) Formal Philosophy. Selected Papers of Richard Montague
Raskin, V. and S. Nirenburg (1995) Lexical Semantics of Adjectives. A Microtheory of Adjectival Meaning
Schulte im Walde, S. and C. Brew (2002) Inducing German Semantic Verb Classes from Purely Syntactic Subcategorisation Information
GCC: Solà, J. et al., eds. (2002) Gramàtica del Català Contemporani
a vector
verd 181 serestarsemblarpredicatiu 0.0386740331491713 comparativitat 0 gradabilitat 0.0165745856353591 modificador_dreta 0.0220994475138122 modificador_esquerra 0.895027624309392 menys2_Adj 0.0497237569060773 menys2_Adv 0.00828729281767956 menys2_Conj 0.00552486187845304 menys2_Det 0.281767955801105 menys2_Esp 0.0441988950276243 menys2_Nom 0.0911602209944751 menys2_Num 0 menys2_PT 0.0607734806629834 menys2_Prep 0.187845303867403 menys2_Pron 0.0110497237569061
menys2_Verb 0.25414364640884 menys2_no 0.00552486187845304 menys1_Adj 0.00276243093922652 menys1_Adv 0.0110497237569061 menys1_Conj 0.0276243093922652 menys1_Det 0.0276243093922652 menys1_Esp 0 menys1_Nom 0.81767955801105 menys1_Num 0 menys1_PT 0.0110497237569061 menys1_Prep 0.00552486187845304 menys1_Verb 0.0524861878453039 menys1_no 0.0331491712707182 mes1_Adj 0.00828729281767956 mes1_Adv 0.0441988950276243 mes1_Conj 0.0718232044198895 mes1_Det 0.0386740331491713
mes1_Esp 0.0331491712707182 mes1_Nom 0.0267034990791897 mes1_PT 0.320441988950276 mes1_Prep 0.366482504604052 mes1_Pron 0.0220994475138122 mes1_Verb 0.0460405156537753 mes1_no 0.0220994475138122 mes2_Adj 0.0607734806629834 mes2_Adv 0.00552486187845304 mes2_Conj 0.0552486187845304 mes2_Det 0.276243093922652 mes2_Esp 0.069060773480663 mes2_Nom 0.160220994475138 mes2_PT 0.0718232044198895 mes2_Prep 0.124309392265193 mes2_Pron 0.0303867403314917 mes2_Verb 0.140883977900552
first five attributes: predicativity, comparativity, gradability, right modifier, left modifier
the matrix
[excerpt of the matrix: one row per adjective, one value between 0 and 1 per attribute; the rows are truncated on the slide]
CLUTO (v. 1.5.1, Karypis 2002)
• high dimensional datasets
• analysis of cluster features
• partitional or agglomerative algorithms
• various criterion functions, taking into account similarity within the objects in a cluster (internal criterion) and/or the differences between objects of different clusters (external criterion)
settings used: partitional algorithm; combination of internal and external criteria
human gold standard: inter-judge agreement
[figure: classes assigned by judge 1 (E, NQ, Q, R) against classes assigned by judge 2 (E, NQ, Q, R)]
contextual attributes vs. gold standard
[figure: gold-standard classes (R, Q, NQ, N) across clusters 0 to 4]
agreement with semantic attributes
contextual attributes vs. gold standard
[figure: gold-standard classes (R, Q, NQ, N; human judgements) across clusters 0 to 4]
descriptive features of the clusters:
• preceding common noun (7%), following specifier (7%)
• preceding punctuation (40%), following adverb or verb
• not preceding punctuation, following common noun (50%)
• following common noun (50%), following specifier (34%)
• preceding preposition (30%), preceding specifier (25%)
agreement with semantic attributes
customized values: gradability and comparativity normalized to binary
[figure: gold-standard classes (R, Q, NQ, N) across clusters 0 to 4]
descriptive features of the clusters:
• gradability (65%), comparativity (12%)
• after common noun (11%), comparativity (13%), gradability (61%)
• comparativity (14%), gradability (61%)
• gradability (61%), after common noun (10%), followed by common noun (2%)
• comparativity (12%), followed by common noun (2%), after common noun (11%)
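Normalizing gradability and comparativity to binary values could look like the sketch below. The attribute names `gradabilitat` and `comparativitat` and the sample values are taken from the example vector for verd; the cut-off (any value above 0 becomes 1) and the function name `binarize` are assumptions, since the slides do not state how the values were customized.

```python
def binarize(vector, names=("gradabilitat", "comparativitat")):
    """Return a copy of the vector in which the named attributes are
    mapped to 1.0 if they were ever observed (value > 0) and to 0.0 otherwise."""
    result = {}
    for attribute, value in vector.items():
        if attribute in names:
            result[attribute] = 1.0 if value > 0 else 0.0   # assumed cut-off
        else:
            result[attribute] = value                        # left unchanged
    return result

# A few attributes of the vector for verd, as given on the "a vector" slide
verd = {
    "comparativitat": 0,
    "gradabilitat": 0.0165745856353591,
    "modificador_esquerra": 0.895027624309392,
}
verd_binary = binarize(verd)
```

With this cut-off a rare but attested property such as the gradability of verd weighs as much as a frequent one, which is one way to counter low raw frequencies.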
customized values: gradability and comparativity normalized to binary
[figure: clusters 0 to 4 across the gold-standard classes N, NQ, Q, R]
interpretation of results: quality of cluster solution
• tightness of obtained clusters
– objects within a cluster are very similar to each other
– objects are very dissimilar to objects in different clusters
• attribute distribution: different values across clusters are evidence of the discriminating function of attributes
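Tightness can be illustrated with average pairwise similarities. The sketch below assumes cosine similarity and plain averages over pairs of objects; the figures reported by the clustering tool may be defined differently in detail, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def tightness(vectors, labels, cluster):
    """Return (internal, external) average similarity for one cluster:
    internal = between pairs of objects inside the cluster,
    external = between objects inside the cluster and all other objects."""
    inside = [v for v, l in zip(vectors, labels) if l == cluster]
    outside = [v for v, l in zip(vectors, labels) if l != cluster]
    internal = [cosine(a, b) for i, a in enumerate(inside) for b in inside[i + 1:]]
    external = [cosine(a, b) for a in inside for b in outside]

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return mean(internal), mean(external)

# Toy data: two perfectly separated clusters
vectors = [[1, 0], [1, 0], [0, 1], [0, 1]]
labels = [0, 0, 1, 1]
internal, external = tightness(vectors, labels, cluster=0)
```

A tight cluster has a high internal value and a low external one; values that are close to each other suggest that the cluster is not well separated from the rest.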
tightness of clustering solutions
[figure: Isim, ISdev, Esim and Esdev values; vertical axis from 0 to 0.9]
attribute distribution across clusters
[figure: distribution across clusters of the attributes comparativity, before adjective, after adjective, gradability, after noun, before noun, predicativity]
attribute distribution across clusters
[figure: distribution across clusters of the attributes comparativity, before adjective, after adjective, gradability, after noun, before noun, predicativity]
attribute distribution across clusters
[figure: distribution across clusters of the attributes after noun, prepo. after, -2 spec., noun after, predicativity]
decision list
• a gold standard annotated by human judges
• a gold standard built with a decision list
– deductive classification: using some of the attributes in the vectors for classifying adjectives into pre-defined classes:
  • predicativity
  • position with respect to the head noun
  • gradability and comparativity
fully automatic: inexpensive but unsupervised
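A decision list of this kind applies ordered rules, and the first rule that matches assigns the class. The rules, thresholds and attribute names below are hypothetical and only show the mechanism; the slides do not give the actual rules used.

```python
def decision_list(vec, mostly_before=0.9, often_before=0.3, min_share=0.05):
    """Assign one of the classes N, NQ, Q, R with ordered rules (first match wins).
    All thresholds are illustrative assumptions."""
    # position with respect to the head noun
    if vec["before_noun"] >= mostly_before:
        return "N"      # almost always prenominal: nonpredicative
    if vec["before_noun"] >= often_before:
        return "NQ"     # both positions are common: qualitative/non-predicative
    # predicativity, gradability and comparativity
    graded = max(vec["gradability"], vec["comparativity"])
    if vec["predicativity"] >= min_share or graded >= min_share:
        return "Q"      # qualitative
    return "R"          # relational: postnominal, not predicative, not graded

# Hypothetical attribute values, for illustration only
presumpte_like = {"before_noun": 0.95, "predicativity": 0.0,
                  "gradability": 0.0, "comparativity": 0.0}
agricola_like = {"before_noun": 0.01, "predicativity": 0.01,
                 "gradability": 0.0, "comparativity": 0.0}
alt_like = {"before_noun": 0.10, "predicativity": 0.20,
            "gradability": 0.10, "comparativity": 0.02}

classes = [decision_list(v) for v in (presumpte_like, agricola_like, alt_like)]
```

The outcome depends entirely on the order of the rules and on the hand-set thresholds, which is the kind of fixed, pre-defined choice that distinguishes this deductive approach from clustering.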
decision list vs. human gold standard
[figure: classes assigned by the decision list (N, NQ, Q, R) against the gold-standard classes (R, Q, NQ, N)]
a deductive approach does not provide a good solution