Posted on 19-Dec-2015
An Approach to Catalan Adjective Classes by Clustering
Laura Alonso Alemany, Universitat de Barcelona
Gemma Boleda Torrent, Universitat Pompeu Fabra
motivation
• to search for empirical (corpus-based) support for theories of adjective classification via data-driven methods
• to enhance a lexicon with information on adjective classes in an inexpensive and reliable way
contents
• introduction
• previous theoretical work
• a preliminary hypothesis
• experiments on clustering adjectives
• results and discussion
introduction
hypothesis 0: a single class of adjectives
BUT: heterogeneous behaviour of adjectives:
La noia és (molt) alta [the girl is (very) tall]
*La bandera és nacional [*the flag is national]
*L’assassí és presumpte [*the murderer is alleged]
this hypothesis is problematic: it does not describe the data properly
why clustering
• clustering has been used for inferring knowledge in not-so-well-known domains:
– verbal subcategorization and selectional restrictions (Schulte im Walde & Brew 2002)
– inference of POS tags for unknown languages
• it introduces little bias into the final results
– there are no predefined classes (as opposed to classification methods; see Bohnet et al. 2002)
– ... but there is bias in modelling the data
problems with clustering
• it is a data-driven technique, but the appropriate degree of abstraction must be chosen
• completely data-driven approaches are possible, but
– the search space becomes far too big
– they are very sensitive to data sparseness
two traditions
• two main scholarly traditions regarding the study of adjectives:
– descriptive grammar
• morphology (derivational processes) and syntax (ordering among adjectives and with respect to the head)
• denotational semantics
– formal semantics
• semantic type (modifier or predicate)
classifications
example | formal semantics (Montague 1974) | descriptive grammar (GDLE, GCC)
red | <e,t> | qualitative
political | ? | relational
alleged | <<e,t>, <e,t>> | adverbial (+others)
qualitative / <e,t>
• predicative (syntactic version; Levi 1978)
– red house / this house is red
– national flag / *this flag is national
– alleged murderer / *this murderer is alleged
• gradable / comparable
– very red / redder, reddish
• scalar (Raskin & Nirenburg 1995)
– red/green/blue, big/small
• in Catalan, typically following the head noun
“adverbial” / <<e,t>, <e,t>>
• nonpredicative
– alleged murderer / *this murderer is alleged
• nongradable, noncomparable
– *very/more alleged murderer
• nonscalar
– no antonym
• in Catalan, only preceding the head noun
these parameters seem to be relevant; we'll use them in the experiments
on adjective position
• the position of the adjective in Catalan and in other Romance languages is related to reference restriction (GCC, GDLE)
– prenominal: nonrestricting
– postnominal: restricting
• very few strictly nonpredicative adjectives
• usual case: mixed behaviour, with a shift in meaning (potential problem!)
– antic president ‘former president’: nonpredicative reading
– armari antic ‘antique wardrobe’: qualitative reading
a gap: relational
• morphology: denominal or deverbal
• syntax:
– only occur in NP
– adjacent to the noun, nearer to it than qualitative adjectives (relative ordering)
– no coordination with qualitative adjectives
• semantics: relate the noun to an external entity (not a primitive property)
• among others: Bally 1944, GDLE, GCC, Engel 1988, Levi 1978
*la màquina és agrícola   THE MACHINE IS AGRICULTURAL
*una màquina gran agrícola vs. una màquina agrícola gran   A MACHINE AGRICULTURAL BIG
*una màquina agrícola i gran   A MACHINE AGRICULTURAL AND BIG
a gap: relational
• predicativity: mixed behaviour
– El congrés és internacional   <e,t>   THE CONFERENCE IS INTERNATIONAL
– La Joana és corresponsal internacional   [THE] JOANA IS INTERNATIONAL CORRESPONDENT
– *La Joana és internacional / <e,t>
– *La corresponsal és internacional / <e,t>
• ambiguity / class shift or property?
a gap: relational
• gradability and comparativity: said to be nongradable and noncomparable, but “qualitativization” is very easy
– *un tractor molt agrícola   A TRACTOR VERY AGRICULTURAL
– una noia molt internacional   A GIRL VERY INTERNATIONAL (“has travelled a lot”, “knows many people from abroad”)
• could reflect diachronic processes
these facts could explain the results, at least in part
adjective classes
hypothesis: three classes of adjectives
– qualitative: vermell ‘red’, alt ‘tall’
– nonpredicative: presumpte ‘alleged’
– relational: agrícola ‘agricultural’
challenges
• does this classification have empirical (corpus-based) support?
• can adjectives be automatically classified using the features reviewed?
• which are the most relevant features for adjective classification?
modelling adjectives
• find textual correlates of the theoretical parameters that describe semantic classes
– in terms of morphosyntactic data
– retrievable from an annotated corpus
• this is not always possible
– and beware of redundant features!
• values are difficult to set adequately
• distributional properties
the set of attributes
textual correlates:
• follows a verb
• cooccurs with molt ‘very’ and the like
• form inflected by size morphemes
• cooccurs with més/menys ‘more/less’
• form inflected by the superlative morpheme -íssim
• precedes or follows a noun
• precedes or follows an adjective
semantic parameters they stand for:
• predicativity
• gradability, scalability
• comparativity, scalability
• reference restriction
• reference restriction / relative ordering
context of occurrence:
• POS of surrounding words (five-word window)
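The textual correlates above reduce to proportions over an adjective's occurrences. The following is a minimal sketch, not the authors' extraction code: it assumes a hypothetical token format of (lemma, POS) pairs with hypothetical coarse tags ("Adj", "Nom", "V"), inspects only the immediately neighbouring tokens, and uses illustrative marker lists. The copular lemmas ser/estar/semblar follow the feature name serestarsemblarpredicatiu in the sample vector.

```python
from collections import Counter, defaultdict

# Illustrative marker lists (assumptions, not the study's exact lists).
COPULAS = {"ser", "estar", "semblar"}   # predicativity: follows a copular verb
DEGREE = {"molt"}                        # gradability: cooccurs with molt 'very'
COMPARATIVE = {"més", "menys"}           # comparativity: cooccurs with més/menys
FEATURES = ("predicativity", "gradability", "comparativity", "after_noun", "before_noun")


def semantic_features(tokens):
    """Map each adjective lemma to the proportion of its occurrences showing each correlate.

    tokens: list of (lemma, pos) pairs in corpus order; "Adj" and "Nom" are
    hypothetical tags for adjectives and nouns. Only the immediately preceding
    and following tokens are inspected, which is a simplification.
    """
    counts = defaultdict(Counter)
    for i, (lemma, pos) in enumerate(tokens):
        if pos != "Adj":
            continue
        prev_lemma, prev_pos = tokens[i - 1] if i > 0 else (None, None)
        next_pos = tokens[i + 1][1] if i + 1 < len(tokens) else None
        c = counts[lemma]
        c["total"] += 1
        # Booleans add as 0/1, so each counter tallies matching occurrences.
        c["predicativity"] += prev_lemma in COPULAS
        c["gradability"] += prev_lemma in DEGREE
        c["comparativity"] += prev_lemma in COMPARATIVE
        c["after_noun"] += prev_pos == "Nom"
        c["before_noun"] += next_pos == "Nom"
    return {lemma: {f: c[f] / c["total"] for f in FEATURES} for lemma, c in counts.items()}
```

For example, the tokens `[("ser", "V"), ("alt", "Adj")]` give alt a predicativity of 1.0. In practice the lemma and POS would come from the corpus annotation.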
corpus: fragment of CTILC
• collected by the Institute for Catalan Studies (IEC)
• 8.5 million words
• Catalan texts from 1970 onwards
– only written, quite formal register
• manually revised tagging (but there are errors!)
– lemma, part of speech, morphological information (EAGLES standard)
– no syntactic information
• 571365 adjective occurrences (tokens)
• 17325 adjective lemmata (types)
data and tools
1. each adjective is described as a vector
• each dimension is one of the features relevant for characterising the adjective
• the value of each feature is a real number between 0 and 1
2. a matrix is built with all the vectors
3. the clustering is performed with CLUTO (Karypis 2002)
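Steps 1 and 2 amount to fixing one column order and stacking one row per adjective. A minimal sketch, assuming the per-adjective features are already available as dictionaries of proportions; setting absent features to 0 is an assumption of this sketch, not something stated on the slide.

```python
def build_matrix(features, columns=None):
    """Turn {adjective: {feature: value}} into (row labels, column labels, rows).

    Every value is expected to be a proportion in [0, 1]; absent features become 0.0.
    """
    if columns is None:
        # Fixed, reproducible column order: all feature names, sorted.
        columns = sorted({name for feats in features.values() for name in feats})
    labels = sorted(features)
    rows = []
    for adj in labels:
        row = [float(features[adj].get(col, 0.0)) for col in columns]
        if any(not 0.0 <= v <= 1.0 for v in row):
            raise ValueError(f"feature values of {adj!r} must lie in [0, 1]")
        rows.append(row)
    return labels, columns, rows
```

Sorting both rows and columns keeps the matrix identical across runs, so cluster labels can be traced back to adjectives.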
experiment setting
• set of objects: only frequent adjectives (4859 objects, +10 occurrences)
• set of attributes:
1. only textual correlates of semantic properties
2. only context of occurrence
3. combination of 1 and 2 / with customized values
• attribute values: true percentages
• number of clusters: 2, 3, 4, 5, 6, 7
• clustering parameters:
– combination of E/I (external/internal) criteria, partitional algorithm
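CLUTO is a standalone clustering toolkit, and its criterion functions are not reproduced here. As a rough stand-in, the sketch below runs spherical k-means (cosine similarity with unit-length centroids) once for each number of clusters in the experiment setting. It is a different, simpler partitional method than the E/I-criterion setting used in the experiments, and seeding with the first k rows is an arbitrary choice made only to keep the example deterministic.

```python
import math


def normalise(v):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def spherical_kmeans(rows, k, iterations=20):
    """Partition rows into k clusters by cosine similarity to unit-length centroids.

    A simple stand-in for illustration; it is not CLUTO's algorithm or criterion.
    """
    if not 1 <= k <= len(rows):
        raise ValueError("k must be between 1 and the number of rows")
    data = [normalise(r) for r in rows]
    centroids = [list(data[i]) for i in range(k)]  # deterministic seeds: the first k rows
    assignment = None
    for _ in range(iterations):
        # Assignment step: the dot product of unit vectors is their cosine similarity.
        new = [max(range(k), key=lambda c: sum(a * b for a, b in zip(x, centroids[c])))
               for x in data]
        if new == assignment:
            break  # converged
        assignment = new
        # Update step: normalised sum of the members (an empty cluster keeps its centroid).
        for c in range(k):
            members = [x for x, a in zip(data, assignment) if a == c]
            if members:
                centroids[c] = normalise([sum(col) for col in zip(*members)])
    return assignment


def cluster_solutions(rows, ks=range(2, 8)):
    """One solution per number of clusters, mirroring the 2 to 7 clusters of the experiments."""
    return {k: spherical_kmeans(rows, k) for k in ks}
```

Cosine similarity is used because it compares the shape of two attribute profiles rather than their magnitude.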
gold standard
• annotated by human judges
– 76 adjectives chosen randomly from the corpus
– classified by human judges into 4+1 classes
• qualitative: calent ‘hot’, actiu ‘active/lively’
• relational: científic ‘scientific’, digital
• qualitative/nonpredicative: antic
• nonpredicative: presumpte ‘alleged’, mer ‘mere’
• errors: artista
costly process: only a small number of adjectives can be considered
nonpredicative adjectives: very few, not represented in the random sample, added manually
semantic parameters vs. gold standard
[figure: distribution of the gold-standard classes N, NQ, Q, R across clusters 0 to 4; values shown under the clusters: 467, 3040, 229, 593, 787]
– predicativity 0.1, comparativity 0, after adjective 0.03: possible, necessari
– after noun 0.54, comparativity 0: alemany ‘German’, internacional
– gradability 0.07, comparativity 0: millor ‘best’, eixerit ‘nice, lively’
– preceding common noun 0.06, after common noun 0.49: presumpte ‘alleged’, antic ‘former/old/antique’
– after common noun 0.49, comparativity 0: important, subversiu ‘subversive’
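contextual vs. semantic attributes
[figure: distribution of the gold-standard classes N, NQ, Q, R across clusters 0 to 4, for the contextual and for the semantic solution; values shown under the clusters: 467, 3040, 229, 336, 787 and 2107, 697, 290, 593, 1172]
contextual clusters (characteristic attributes: examples):
– +1 preposition 0.3, +2 determiner 0.25: important, necessari, diagonal
– +1 punctuation 0, -1 noun 0.5: preescolar, subversiu
– -1 common noun 0.5, -2 determiner 0.34: general, negre ‘black’, alemany ‘German’, internacional
– +1 punctuation, -1 adverb: possible, hot
– +1 noun, -1 determiner: mer ‘mere’, antic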
agreement between solutions
[figure: cross-tabulation of the contextual clusters 0 to 4 against the semantic clusters 0 to 4]
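A comparison like this one is a cross-tabulation of two labelings of the same adjectives. A minimal sketch: `crosstab` counts adjectives per (contextual cluster, semantic cluster) pair, and `purity` gives the share of adjectives that fall in the majority semantic cluster of their contextual cluster. Purity is a standard overlap score used here only for illustration; the slide does not say how agreement was summarised.

```python
from collections import Counter


def crosstab(labels_a, labels_b):
    """Count how many objects receive label a in one solution and label b in the other."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both solutions must label the same objects")
    return Counter(zip(labels_a, labels_b))


def purity(labels_a, labels_b):
    """Share of objects falling in the majority labels_b value within their labels_a group."""
    table = crosstab(labels_a, labels_b)
    best = Counter()
    for (a, _b), n in table.items():
        best[a] = max(best[a], n)
    return sum(best.values()) / len(labels_a)
```

Purity is asymmetric: `purity(semantic, contextual)` generally differs from `purity(contextual, semantic)`, so both directions are worth reporting.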
homogeneity of adjective classes vs. gold standard
[figure: distribution of the gold-standard classes N, NQ, Q, R across clusters, in four panels: semantic parameters; semantic parameters and context; contextual attributes; customized values]
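One way to quantify how homogeneous each gold-standard class is in a solution is to look at how its members spread over the clusters. The sketch below returns, per class, that distribution and its entropy normalised to [0, 1]. The entropy measure is an illustrative choice, not necessarily what the figure plots.

```python
import math
from collections import Counter, defaultdict


def class_homogeneity(gold, clusters):
    """For each gold class: (share of its members in each cluster, normalised entropy).

    The entropy is 0 when all members of a class fall in a single cluster and 1 when
    they are spread evenly over all clusters of the solution.
    """
    if len(gold) != len(clusters):
        raise ValueError("gold labels and clusters must cover the same adjectives")
    n_clusters = len(set(clusters))
    members = defaultdict(Counter)
    for g, c in zip(gold, clusters):
        members[g][c] += 1
    result = {}
    for g, counts in members.items():
        total = sum(counts.values())
        dist = {c: n / total for c, n in counts.items()}
        entropy = sum(-p * math.log(p) for p in dist.values())
        result[g] = (dist, entropy / math.log(n_clusters) if n_clusters > 1 else 0.0)
    return result
```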
questions
• which is the best clustering solution?
• which attributes are actually descriptive of adjective behaviour?
• which are noisy?
• which classes receive empirical support?
discussion
• contextual and semantic features yield quite similar results, although
– semantic features seem to be more adequate
– contextual features are stronger!
• the most discriminating attribute is the position of the adjective with respect to the noun
– why are some others not discriminating? (modelling)
• noisy:
– preposition follows
– punctuation follows
discussion
• clustering is a useful technique for the inductive investigation of adjective classes
– which had not been done before
• theoretically biased results are supported by distributional properties
discussion
• the following classes of adjectives emerge from the results:
– nonpredicative (with few elements)
– relational
• consistent behaviour
• similar to a part of the qualitative adjectives
– could reflect a diachronic process or class shift
– or poor modelling of the adjectives
discussion
• qualitative adjectives as described in the literature are not homogeneous:
– predicativity, gradability and comparativity are not distributed uniformly among these adjectives
– distributional properties are not uniform either
unexpected?
future work
• further linguistic investigation of results
• other clustering solutions
• evaluation
references
Bally, C. (1944) Linguistique générale et linguistique française
Bohnet, B., S. Klatt and L. Wanner (2002) An Approach to Automatic Annotation of Functional Information to Adjectives with an Application to German
GDLE: Bosque, I. and V. Demonte, eds. (1999) Gramática Descriptiva de la Lengua Española
Engel, U. (1988) Deutsche Grammatik, Heidelberg: Julius Groos Verlag
Levi, J. N. (1978) The Syntax and Semantics of Complex Nominals
Montague, R. (1974) Formal Philosophy. Selected Papers of Richard Montague
Raskin, V. and S. Nirenburg (1995) Lexical Semantics of Adjectives. A Microtheory of Adjectival Meaning
Schulte im Walde, S. and C. Brew (2002) Inducing German Semantic Verb Classes from Purely Syntactic Subcategorisation Information
GCC: Solà, J. et al., eds. (2002) Gramàtica del Català Contemporani
a vector
verd 181 serestarsemblarpredicatiu 0.0386740331491713 comparativitat 0 gradabilitat 0.0165745856353591 modificador_dreta 0.0220994475138122 modificador_esquerra 0.895027624309392 menys2_Adj 0.0497237569060773 menys2_Adv 0.00828729281767956 menys2_Conj 0.00552486187845304 menys2_Det 0.281767955801105 menys2_Esp 0.0441988950276243 menys2_Nom 0.0911602209944751 menys2_Num 0 menys2_PT 0.0607734806629834 menys2_Prep 0.187845303867403 menys2_Pron 0.0110497237569061
menys2_Verb 0.25414364640884 menys2_no 0.00552486187845304 menys1_Adj 0.00276243093922652 menys1_Adv 0.0110497237569061 menys1_Conj 0.0276243093922652 menys1_Det 0.0276243093922652 menys1_Esp 0 menys1_Nom 0.81767955801105 menys1_Num 0 menys1_PT 0.0110497237569061 menys1_Prep 0.00552486187845304 menys1_Verb 0.0524861878453039 menys1_no 0.0331491712707182 mes1_Adj 0.00828729281767956 mes1_Adv 0.0441988950276243 mes1_Conj 0.0718232044198895 mes1_Det 0.0386740331491713
mes1_Esp 0.0331491712707182 mes1_Nom 0.0267034990791897 mes1_PT 0.320441988950276 mes1_Prep 0.366482504604052 mes1_Pron 0.0220994475138122 mes1_Verb 0.0460405156537753 mes1_no 0.0220994475138122 mes2_Adj 0.0607734806629834 mes2_Adv 0.00552486187845304 mes2_Conj 0.0552486187845304 mes2_Det 0.276243093922652 mes2_Esp 0.069060773480663 mes2_Nom 0.160220994475138 mes2_PT 0.0718232044198895 mes2_Prep 0.124309392265193 mes2_Pron 0.0303867403314917 mes2_Verb 0.140883977900552
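(181 = number of occurrences of verd; feature names: serestarsemblarpredicatiu = predicativity, comparativitat = comparativity, gradabilitat = gradability, modificador_dreta = right modifier, modificador_esquerra = left modifier; menys2, menys1, mes1, mes2 = part of speech at positions -2, -1, +1, +2)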
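The menys2/menys1/mes1/mes2 features of the vector can be produced by counting part-of-speech tags around each occurrence. A minimal sketch, assuming (lemma, POS) tokens carrying the tag names seen in the vector (Adj, Nom, Det, PT, ...) and reading the `_no` value as "no token at that position"; both the tag mapping from the corpus annotation and that reading are assumptions.

```python
from collections import Counter, defaultdict

# Offsets of the five-word window around the adjective, named as in the sample vector
# (menys = 'minus', mes = 'plus'); the adjective's own position is skipped.
OFFSETS = {-2: "menys2", -1: "menys1", 1: "mes1", 2: "mes2"}


def context_features(tokens, boundary="no"):
    """Per adjective lemma, the proportion of occurrences with each POS at each offset.

    tokens: list of (lemma, pos) pairs; "Adj" marks adjectives.
    Positions outside the token list get the tag `boundary` (assumed meaning of "_no").
    """
    counts = defaultdict(Counter)
    totals = Counter()
    for i, (lemma, pos) in enumerate(tokens):
        if pos != "Adj":
            continue
        totals[lemma] += 1
        for offset, prefix in OFFSETS.items():
            j = i + offset
            # The explicit range check avoids Python's negative indexing at the start.
            tag = tokens[j][1] if 0 <= j < len(tokens) else boundary
            counts[lemma][f"{prefix}_{tag}"] += 1
    return {lemma: {name: n / totals[lemma] for name, n in c.items()}
            for lemma, c in counts.items()}
```

For each adjective, the values at each position sum to 1, as they do (up to rounding) for the menys2 and mes1 features of the verd vector.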
the matrix
[excerpt of the adjective × attribute matrix: one row per adjective, one column per attribute, values between 0 and 1]
CLUTO (v. 1.5.1, Karypis 2002)
• high-dimensional datasets
• analysis of cluster features
• partitional or agglomerative algorithms
• various criterion functions, taking into account the similarity among the objects in a cluster (internal criterion) and/or the differences between objects of different clusters (external criterion)
settings used: partitional algorithm, combination of internal and external criteria
human gold standard: inter-judge agreement
[figure: confusion matrix of judge 1 vs. judge 2 over the classes E (error), NQ, Q, R]
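The slide presents agreement only as a confusion matrix. A standard chance-corrected summary of such a matrix is Cohen's kappa; the sketch below computes it from two judges' labels. Whether kappa was used in this study is not stated, and the labels in the example are invented.

```python
from collections import Counter


def cohen_kappa(judge1, judge2):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(judge1) != len(judge2) or not judge1:
        raise ValueError("both judges must label the same, non-empty list of items")
    n = len(judge1)
    observed = sum(a == b for a, b in zip(judge1, judge2)) / n
    freq1, freq2 = Counter(judge1), Counter(judge2)
    # Chance agreement: probability that both judges pick the same class independently.
    expected = sum(freq1[c] * freq2[c] for c in freq1) / (n * n)
    if expected == 1:
        return 1.0  # both judges used one and the same class for every item
    return (observed - expected) / (1 - expected)
```

For example, judges labelling four adjectives as Q, Q, R, R and Q, R, R, R agree on 75% of items, chance agreement is 50%, and kappa is 0.5.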
contextual attributes vs. gold standard
[figure: distribution of the gold-standard (human) classes N, NQ, Q, R across clusters 0 to 4; annotation: agreement with semantic attributes]
– preceding common noun (7%), following specifier (7%)
– preceding punctuation (40%), following adverb or verb
– not preceding punctuation, following common noun (50%)
– following common noun (50%), following specifier (34%)
– preceding preposition (30%), preceding specifier (25%)
customized values: gradability and comparativity normalized to binary
[figure: distribution of the gold-standard classes N, NQ, Q, R across clusters 0 to 4, with each cluster's most characteristic attributes]
– gradability (65%), comparativity (12%)
– after common noun (11%), comparativity (13%), gradability (61%)
– comparativity (14%), gradability (61%)
– gradability (61%), after common noun (10%), followed by common noun (2%)
– comparativity (12%), followed by common noun (2%), after common noun (11%)
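"Normalized to binary" can be read as replacing each proportion by a 0/1 flag. A minimal sketch, assuming that any value above a threshold (0 by default) counts as gradable or comparable; the cut-off actually used is not given on the slide, so it is left as a parameter, and the English feature names are hypothetical.

```python
def customise(vector, binary=("gradability", "comparativity"), threshold=0.0):
    """Return a copy of a feature dict with the selected features turned into 0/1 flags.

    A feature becomes 1.0 when its proportion exceeds `threshold` (an assumed cut-off);
    all other features are copied unchanged.
    """
    out = dict(vector)
    for name in binary:
        if name in out:
            out[name] = 1.0 if out[name] > threshold else 0.0
    return out
```

With the gradability and comparativity values of the verd vector (0.0166 and 0), the default threshold yields gradability 1.0 and comparativity 0.0. The percentages on the slide would then be shares of adjectives in a cluster whose flag is 1.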
customized values: gradability and comparativity normalized to binary
[figure: distribution of clusters 0 to 4 across the gold-standard classes N, NQ, Q, R]
interpretation of results: quality of the clustering solution
• tightness of the clusters obtained
– objects within a cluster are very similar to each other
– objects are very dissimilar to objects in other clusters
• attribute distribution: different values across clusters show that an attribute discriminates between clusters
tightness of clustering solutions
[figure: Isim, ISdev, Esim and Esdev of the clustering solutions, on a scale from 0 to 0.9]
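Internal and external similarity can be computed directly from the vectors. The sketch below follows one common reading of these statistics: Isim as the average cosine similarity between pairs of objects in the same cluster, and Esim as the average cosine similarity between a cluster's objects and all objects outside it. CLUTO's own report may compute them slightly differently, and scoring a singleton cluster's Isim as 1.0 is a convention of this sketch.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def tightness(rows, labels):
    """Per cluster: (Isim, Esim) as average within-cluster and cross-cluster cosine."""
    stats = {}
    for c in sorted(set(labels)):
        inside = [r for r, l in zip(rows, labels) if l == c]
        outside = [r for r, l in zip(rows, labels) if l != c]
        pairs = list(combinations(inside, 2))
        isim = sum(cosine(u, v) for u, v in pairs) / len(pairs) if pairs else 1.0
        cross = [cosine(u, v) for u in inside for v in outside]
        esim = sum(cross) / len(cross) if cross else 0.0
        stats[c] = (isim, esim)
    return stats
```

A tight solution has a high Isim and a low Esim for every cluster.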
attribute distribution across clusters
[figure: value of each attribute per cluster: comparativity, before adjective, after adjective, gradability, after noun, before noun, predicativity]
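A figure of this kind plots, for each cluster, the average value of each attribute; attributes whose averages differ strongly between clusters are the discriminating ones. A minimal sketch, where the spread (highest minus lowest cluster mean) is an illustrative discrimination score, not one taken from the study:

```python
def attribute_means(rows, labels, columns):
    """Average value of each attribute within each cluster: {cluster: {attribute: mean}}."""
    means = {}
    for c in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == c]
        means[c] = {col: sum(r[j] for r in members) / len(members)
                    for j, col in enumerate(columns)}
    return means


def spread(means, column):
    """Difference between the highest and lowest cluster mean of one attribute."""
    values = [m[column] for m in means.values()]
    return max(values) - min(values)
```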
attribute distribution across clusters
[figure: value of each attribute per cluster: after noun, preposition after, -2 specifier, noun after, predicativity]
decision list
• a gold standard annotated by human judges
• a gold standard built with a decision list
– deductive classification: using some of the attributes in the vectors to classify adjectives into predefined classes:
• predicativity
• position with respect to the head noun
• gradability and comparativity
fully automatic: inexpensive but unsupervised
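A decision list checks a few attributes in a fixed order and assigns the class of the first rule that fires. The sketch below uses the three groups of attributes listed above; the order of the rules and every threshold are hypothetical, chosen only to mirror the theoretical descriptions (nonpredicative: prenominal and never predicative; relational: postnominal, not predicative, not graded; otherwise qualitative). The feature values in the example are invented.

```python
# Hypothetical rules: (class, condition), tried in order; the first match wins.
RULES = [
    ("N", lambda f: f["before_noun"] >= 0.5 and f["predicativity"] < 0.01),
    ("NQ", lambda f: f["before_noun"] >= 0.1),
    ("R", lambda f: f["after_noun"] >= 0.5 and f["predicativity"] < 0.05
     and f["gradability"] + f["comparativity"] < 0.01),
]


def decision_list(features, rules=RULES, default="Q"):
    """Classify one adjective's feature dict into a predefined class."""
    for label, condition in rules:
        if condition(features):
            return label
    return default
```

Because each threshold is fixed in advance, misclassifications cannot be corrected by the data; this illustrates why a deductive approach depends heavily on how well the thresholds match real usage.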
decision list vs. human gold standard
[figure: confusion matrix of the decision-list classes N, NQ, Q, R against the human gold-standard classes N, NQ, Q, R]
a deductive approach does not provide a good solution