Automatic Paraphrasing of Human Intransitive Adjectives in Portuguese
Paraphrasing Human Intransitive Adjective
Constructions in Port4NooJ
CRISTINA MOTA (1)
PAULA CARVALHO (1,2)
FRANCISCO RAPOSO (1)
ANABELA BARREIRO (1)
(1) INESC-ID, Lisbon  (2) Universidade Europeia | Laureate International Universities
International NooJ 2015 Conference – Minsk, 13 June
eSPERTo – System for Paraphrasing in Editing and Revision of Texts
• Main objective
  – Design and development of a linguistically enhanced paraphrase generator
    • Semantico-syntactic and multiword units
    • Sensitive to context
• Method
  – Hybrid system, combining statistics and linguistic knowledge to identify and generate new and more complex paraphrases
  – Exploitation of existing paraphrasing resources
• Web platform
  – Interactive application to help Portuguese language learners in producing and revising their texts
  – Text-editing mechanisms which provide a variety of alternatives for each expression
  – Users can choose or suggest expressions that can be immediately applied to their text
  – Support for writing optimization, understandability and translatability
Introduction to the eSPERTo Project
Linguistic Resources
• Linguistic knowledge databases
  – Port4NooJ & Eng4NooJ
    • Originally OpenLogos (English-Portuguese) resources (http://logos-os.dfki.de/)
    • Converted into NooJ format
    • Enhanced with new properties, including derivational, morpho-syntactic and semantic relations
Earlier versions
• Phrasal verbs into equivalent expressions
  – to clear up (weather) = (weather) to become better/brighter
• Support verb constructions into single verbs
  – to make a decision = to decide
  – to make a presentation of = to present
  – to give support to N(AN) = to support N(AN)
  – to get into contact with = to contact
  – to become acid = to acidify
• Support verb constructions into their stylistic variants
  – to make an audit = to perform an audit
  – to make an impression = to create an impression
• Aspectual constructions into single verbs
  – to launch an attack = to attack
Earlier versions
• Adverbs (compounds into single adverbs)
  – in a constructive way = constructively
  – on purpose > purposely = deliberately
• Relatives into participial adjectives
  – the president that was elected = the president elect
• Relatives into possessives
  – the role that Europe plays/has = the role of Europe
  – the position that the Church defends = the position of the Church
• Relatives into compound nouns (and vice-versa)
  – a container for the milk = a milk container
  – a bottle made of plastic = a plastic bottle
• Agentive passives into actives
  – the young man is released by the police officer = the police officer releases the young man
eSPERTo Architecture
[Figure: eSPERTo architecture. Components shown: eSPERTo online (input text + resource selection; combine text and suggestions); noojapply + STRING; paraphrase suggestions; Port4NooJ dictionaries and grammars, alongside Eng4NooJ, Ital4NooJ, Fren4NooJ, Ger4NooJ, Spa4NooJ, ...; linguist validation; hybrid paraphrase acquisition; user feedback.]
eSPERTo: noojapply Integration
noojapply pt result.ind lr.no(d|m)* sr.nog* REESCREVE.nog text.txt
eSPERTo Web Interface User configuration
eSPERTo Web Interface Result presentation
teste.txt:0,17,O homem que é americano
teste.txt:0,17,O homem da América
teste.txt:0,17,O homem de nacionalidade americana
teste.txt:0,17,O homem de naturalidade americana
teste.txt:0,17,O homem de origem americana
teste.txt:0,39,o trabalho foi apresentado pelo homem americano
teste.txt:18,10,efectuar apresentação
teste.txt:18,10,fazer apresentação
teste.txt:18,10,realizar apresentação
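The result lines above appear to follow a `file:offset,length,suggestion` layout, with several suggestions sharing the same text span. A minimal Python sketch, assuming that layout holds for every line, groups the suggestions by span so that an interface could offer them as alternatives. The function names are hypothetical and are not part of eSPERTo or NooJ.

```python
from collections import defaultdict

def parse_result_line(line):
    """Split 'file:offset,length,suggestion' into its four fields (assumed layout)."""
    source, rest = line.split(":", 1)
    offset, length, suggestion = rest.split(",", 2)
    return source, int(offset), int(length), suggestion

def group_suggestions(lines):
    """Map each (offset, length) span to the list of suggestions proposed for it."""
    grouped = defaultdict(list)
    for line in lines:
        _, offset, length, suggestion = parse_result_line(line)
        grouped[(offset, length)].append(suggestion)
    return dict(grouped)

RESULT_LINES = [
    "teste.txt:0,17,O homem que é americano",
    "teste.txt:0,17,O homem da América",
    "teste.txt:0,17,O homem de nacionalidade americana",
    "teste.txt:0,17,O homem de naturalidade americana",
    "teste.txt:0,17,O homem de origem americana",
    "teste.txt:0,39,o trabalho foi apresentado pelo homem americano",
    "teste.txt:18,10,efectuar apresentação",
    "teste.txt:18,10,fazer apresentação",
    "teste.txt:18,10,realizar apresentação",
]

if __name__ == "__main__":
    for span, suggestions in group_suggestions(RESULT_LINES).items():
        print(span, suggestions)
```

Splitting on the first colon and the first two commas keeps any commas inside the suggestion text intact.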
eSPERTo: noojapply integration
The American man
– the man who is American
– the man from America
– the man with American nationality
– …
https://esperto.l2f.inesc-id.pt/esperto/esperto/demo.pl
LG of Portuguese Human Intransitive Adjectives
• eSPERTo was enhanced with new paraphrases, derived from 15 Lexicon-Grammar (LG) tables describing the distributional properties of 4,250 human intransitive adjectives (HIA) (Carvalho, 2008):
  – Syntactic and semantic nature of the subject modified by each adjective;
  – Copulative verbs selected by each adjective;
  – Constraints on the quantification of adjectives by an adverb or a degree morpheme;
  – Position of adjectives in adnominal context;
  – Optional adjective “complements”;
  – Generic NP and cross-constructions, where the adjective fills the head of a noun phrase;
  – Characterizing indefinite constructions, where the adjective occurs after an indefinite article;
  – Exclamative sentences expressing insult.
Adjective Selection
CETEMPublico Adj: 17,300 lemmas
Predicate Adj: 13,875 lemmas
Adj Intrans Hum: 4,250 lemmas
– Adj Doen: 187
– Adj Filo: 303
– Adj Nac: 651
– Adj Hum: 3,109
Lookup with LabEL lexical resources (LABEL-LEX) (Ranchhod et al. 2004)
Pre-selection and classification of Adj according to the linguistic criteria defined in Carvalho (2001)
Nhum Vcop Adj
Hum Adj Subclassification Criteria (ADJ HUM)
[Figure: decision tree. Copulative verb: SER; SER + ESTAR; ESTAR. Constructions: N0 ser Adj; N0 ser um Adj; N0 (ser+estar) Adj; N0 estar Adj. Subject types: N0=: QueF; N0=: Nhum; N0=: Nhum Nap de Nhum.]
Classes and example adjectives:
SAHP1 inteligente
SAHP2 atlético
SAHP3 culto
SAHC1 idiota
SAHC2 sedutor
SAHC3 inculto
EAHP2 abatido
EAHP3 zangado
SEAHP2 bonito
SEAHP3 velho
SEAHC2 gordo
SEAHC3 bêbado
• Adjective, noun and verb morphologically related constructions
  – está zangado (is angry) = zangou-se (got (self) angry) = esteve envolvido numa zanga (was involved in a quarrel)
• Adjective constructions supported by different copulative verbs
  – estar perdido (to be lost) = andar perdido (to walk around lost)
• Constructions involving nationality and other membership relations
  – de origem portuguesa (of Portuguese origin/roots) = portugueses (Portuguese) = de Portugal (from Portugal)
  – benfiquista (Benfica fan) = do Sport Lisboa e Benfica (a fan of Sport Lisboa e Benfica)
• Cross-constructions
  – o idiota do rapaz (the idiot of the boy) = o rapaz é um idiota (the boy is an idiot)
• Appropriate noun constructions
  – foi moderado nos seus comentários (he was moderate in his comments) = os seus comentários foram moderados (his comments were moderate) = foi moderado (he was moderate)
• Generic noun phrases
  – é um indivíduo estúpido (he is a stupid individual) = é um estúpido (he is a fool) = é estúpido (he is stupid)
New Transformations
– From LG tables to NooJ dictionaries
  • Mostly done automatically with different scripts
Integration of LG of Portuguese Human Intransitive Adjectives
Port4NooJ
LG tables
Adjectivos_IH
✓ If the adjective is already in Port4NooJ, merge the LG properties with its dictionary entry; otherwise create a new entry
✓ Create FLX and DRV codes and corresponding rules as needed
✓ Check for missing FLX and DRV codes
– From LG tables to NooJ dictionaries
  • Representation of LG table properties
Integration of LG of PT HIA
+Top=Abissínia +TopDET=a
+NclassPnacionalidade +NAdj
+Vcopser
+IH +Table=SAN
Determined automatically by consulting AC/DC corpora:
o homem abissínio ⇔ o homem da Abissínia
o homem açoriano ⇔ o homem dos Açores
o homem português ⇔ o homem de Portugal
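The `+Top` and `+TopDET` properties can be read as supplying the place name and its determiner for the toponym paraphrase. A minimal Python sketch, assuming the paraphrase is built by contracting the preposition "de" with the determiner, could look like this. Only the Abissínia values (`+Top=Abissínia`, `+TopDET=a`) come from the slides; the determiner values for Açores and Portugal and all function names are assumptions made for illustration.

```python
# Standard Portuguese contractions of the preposition "de" with the definite article.
DE_CONTRACTIONS = {"o": "do", "a": "da", "os": "dos", "as": "das"}

def toponym_phrase(top, top_det=None):
    """Build 'de/do/da/dos/das + toponym' from the +Top and +TopDET values."""
    preposition = DE_CONTRACTIONS.get(top_det, "de")  # bare "de" when no determiner
    return f"{preposition} {top}"

def paraphrase_nationality(head, top, top_det=None):
    """Replace the nationality adjective after `head` by the toponym phrase."""
    return f"{head} {toponym_phrase(top, top_det)}"

if __name__ == "__main__":
    print(paraphrase_nationality("o homem", "Abissínia", "a"))
    print(paraphrase_nationality("o homem", "Açores", "os"))  # assumed +TopDET=os
    print(paraphrase_nationality("o homem", "Portugal"))      # assumed: no determiner
```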
+IH +Table=SEAHP3
+Nome=alegria
+Verbo=alegrar-se
+Nnhum
+IH +Table=SEAHP3
+DRV=A2N143:CASA
+DRV=A2V6:FALAR +Reflexivo
+Nnhum
• DRV code is determined and formalized automatically by finding the radical shared by the adjective and the noun or verb
  alegr(ia) => A2N143 = <B1>ia/N
  alegr(ar) => A2V6 = <B1>ar/V
• FLX code is determined by consulting Port4NooJ
  alegria,N+FLX=CASA+AB+state+EN=joy+SYNN=contentamento
  alegrar,V+FLX=FALAR+Aux=1+PRECVagree-type+Subset=…
• If the derived form does not exist in Port4NooJ, its code is assigned automatically
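The radical-finding step can be sketched in Python as a longest-common-prefix computation: the number of characters to delete from the adjective gives the `<Bn>` operator, and the remainder of the derived form gives the suffix. This is only an illustrative sketch, not the authors' script. The function name is hypothetical, the adjective `alegre` is assumed to be the base form behind `alegr(ia)` and `alegr(ar)`, and omitting the operator when nothing is deleted is also an assumption.

```python
def drv_operation(adjective, derived, category):
    """Return a NooJ-style derivation operation such as '<B1>ia/N'.

    The shared radical is taken to be the longest common prefix of the
    adjective and the derived noun or verb.
    """
    radical_len = 0
    for a, d in zip(adjective, derived):
        if a != d:
            break
        radical_len += 1
    deletions = len(adjective) - radical_len      # characters to remove from the adjective
    suffix = derived[radical_len:]                # characters to append afterwards
    backspace = f"<B{deletions}>" if deletions else ""  # assumption: no operator if nothing is deleted
    return f"{backspace}{suffix}/{category}"

if __name__ == "__main__":
    print(drv_operation("alegre", "alegria", "N"))
    print(drv_operation("alegre", "alegrar", "V"))
```

A reflexive verb such as `alegrar-se` would need its clitic stripped before the comparison, with the `+Reflexivo` property recorded separately as on the slide.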
Port4NooJ entries:
velho,A+FLX=ALTO+AB+class+EN=vintage
velho,A+FLX=ALTO+AN+Hum+EN=elder
velho,A+FLX=ALTO+NAV+Apred+EN=old
LG properties:
+IH +Table=SEAHP3 +Nhum+Vcopser+Vcopestar+Vcopencontrarse+Vcopsentirse+Vcoptornarse +UMNclas +UmModif +AdvQuant +Superlativo +NAdj +DRV=A2N164:CASA+DRV=A2V67:AGRADECER
– From LG tables to NooJ dictionaries
  • Integration with eSPERTo dictionary entries
① Adjective exists in Port4NooJ
✓ Port4NooJ entries blindly receive the additional properties specified by the LG tables
  – In a second round, discard at least the entries marked with +AB
velho,A+FLX=ALTO+AB+class+EN=vintage+IH+Table=SEAHP3+Nhum+Vcopser+Vcopestar+Vcopencontrarse+Vcopsentirse+Vcoptornarse+UMNclas+UmModif+AdvQuant+Superlativo+NAdj+DRV=A2N164:CASA+DRV=A2V67:AGRADECER
velho,A+FLX=ALTO+AN+Hum+EN=elder+IH+Table=SEAHP3+Nhum+Vcopser+Vcopestar+Vcopencontrarse+Vcopsentirse+Vcoptornarse+UMNclas+UmModif+AdvQuant+Superlativo+NAdj+DRV=A2N164:CASA+DRV=A2V67:AGRADECER
velho,A+FLX=ALTO+NAV+Apred+EN=old+IH+Table=SEAHP3+Nhum+Vcopser+Vcopestar+Vcopencontrarse+Vcopsentirse+Vcoptornarse+UMNclas+UmModif+AdvQuant+Superlativo+NAdj+DRV=A2N164:CASA+DRV=A2V67:AGRADECER
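The two rounds described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' scripts: entries are handled as plain strings, the function names are hypothetical, and the second round is read here as dropping the LG-enriched copy of any entry that carries `+AB`, which is only one possible reading of the slide.

```python
def features(entry):
    """Return the '+'-separated features of a NooJ dictionary line (after 'lemma,CAT')."""
    return entry.split("+")[1:]

def merge_lg_properties(entries, lemma, lg_properties):
    """Round 1: blindly append the LG properties to every entry of the lemma."""
    merged = []
    for entry in entries:
        entry_lemma = entry.split(",", 1)[0]
        merged.append(entry + lg_properties if entry_lemma == lemma else entry)
    return merged

def discard_ab_entries(entries):
    """Round 2 (assumed reading): drop enriched entries that are marked +AB."""
    return [e for e in entries
            if not ("AB" in features(e) and "IH" in features(e))]

PORT4NOOJ = [
    "velho,A+FLX=ALTO+AB+class+EN=vintage",
    "velho,A+FLX=ALTO+AN+Hum+EN=elder",
    "velho,A+FLX=ALTO+NAV+Apred+EN=old",
]
LG_PROPERTIES = "+IH+Table=SEAHP3+Nhum+Vcopser+Vcopestar"  # shortened for the example

if __name__ == "__main__":
    merged = merge_lg_properties(PORT4NOOJ, "velho", LG_PROPERTIES)
    for entry in discard_ab_entries(merged):
        print(entry)
```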
② Adjective not in Port4NooJ (or in Port4NooJ but derived from another entry):
✓ FLX code is assigned automatically given the ending of the word
✓ Entries are checked for missing FLX codes and reviewed by a linguist
✓ All other properties come from the LG table
abissínio,A+FLX=ALTO+IH+Table=SAN+Nhum+Vcopser+Vcoptornarse+UMNclas+UmModif+NclassPserde+NclassPorigem+NclassPnacionalidade+NclassPnaturalidade+NAdj+Top=Abissínia+TopDET=a
(no entry in Port4NooJ)
arranhado,A+FLX=ALTO+IH+Table=EAHP2+Nhum+NapdeNhum+Npc+Vcopestar+AdvQuant+Superlativo+NAdj+NhumVopAPrepNap+deemEDefNap+DRV=A2N4:BALÃO+DRV=A2V2:FALAR+Reflexivo
(in Port4NooJ: arranhar,V+FLX=FALAR...)
solteiro,A+FLX=ALTO+IH+Table=SEAHP3+Nhum+Vcopser+Vcopestar+Vcopficar+Vcoppermanecer+Vcopencontrarse+UMNclas+UmModif+Superlativo+NAdj
(in Port4NooJ: solteiro,N+FLX=ANO+AN+des+EN=bachelor)
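Assigning an FLX code from the word ending, and flagging entries that still lack one, might be sketched as follows in Python. The only ending-to-paradigm pairing used is the one attested in the entries above (adjectives in -o with `FLX=ALTO`); the mapping table, the function names and the behaviour for other endings are assumptions for illustration.

```python
# Only pairing attested in the entries above; a real table would cover more endings.
ENDING_TO_FLX = {"o": "ALTO"}

def assign_flx(adjective):
    """Return the FLX code for the longest matching ending, or None if unknown."""
    for ending in sorted(ENDING_TO_FLX, key=len, reverse=True):
        if adjective.endswith(ending):
            return ENDING_TO_FLX[ending]
    return None

def build_entry(adjective, lg_properties):
    """Create a new adjective entry; all other properties come from the LG table."""
    flx = assign_flx(adjective)
    flx_part = f"+FLX={flx}" if flx else ""
    return f"{adjective},A{flx_part}{lg_properties}"

def missing_flx(entries):
    """List the entries that still need an FLX code and a linguist's review."""
    return [e for e in entries if "+FLX=" not in e]

if __name__ == "__main__":
    entries = [build_entry("abissínio", "+IH+Table=SAN"),
               build_entry("inteligente", "+IH+Table=SAHP1")]
    print(entries)
    print(missing_flx(entries))
```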
– From LG to NooJ grammars
  • Option 1: Syntactic parsing
Input1: o homem é tonto
Output1: <REESCREVE+TEXTO=é um tonto> é tonto </REESCREVE>
Input2: o homem é um tonto
Output2: <REESCREVE+TEXTO=é tonto> é um tonto </REESCREVE>
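The annotated output above carries the paraphrase in the `TEXTO` attribute and wraps the matched text in a `REESCREVE` element. A minimal Python sketch, assuming this exact tag layout, extracts the (matched text, paraphrase) pairs and applies them. The function names are hypothetical, and the sentence passed to `apply_rewrites` is a constructed example rather than actual parser output.

```python
import re

# Matches <REESCREVE+TEXTO=paraphrase> matched text </REESCREVE> (assumed layout).
ANNOTATION = re.compile(
    r"<REESCREVE\+TEXTO=(?P<paraphrase>[^>]+)>\s*(?P<matched>.*?)\s*</REESCREVE>"
)

def extract_rewrites(annotated):
    """Return (matched text, suggested paraphrase) pairs found in the annotation."""
    return [(m.group("matched"), m.group("paraphrase"))
            for m in ANNOTATION.finditer(annotated)]

def apply_rewrites(annotated):
    """Replace every annotated span by its suggested paraphrase."""
    return ANNOTATION.sub(lambda m: m.group("paraphrase"), annotated)

if __name__ == "__main__":
    output1 = "<REESCREVE+TEXTO=é um tonto> é tonto </REESCREVE>"
    print(extract_rewrites(output1))
    print(apply_rewrites("o homem " + output1))
```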
– From LG to NooJ grammars
  • Option 2: Transformational module
Input1: é tonto, REESCREVE+Cpred
⇔
Input2: é um tonto, REESCREVE+CCI
Preliminary Results
• 5 150 human intransitive adjectives
• 677 new derivational paradigms
• Example grammars for the syntactic parser and the transformational module
• 50% increase in Port4NooJ adjective entries
Table    Example      In Port4NooJ   New    % In
SAHP1    inteligente  303            247    55%
SAHP2    atlético     142            226    39%
SEAHP2   bonito       53             87     38%
SAHC1    idiota       115            229    33%
SAHP3    culto        97             263    27%
SEAHP3   velho        32             93     26%
SEAHC2   gordo        14             41     25%
SAF      anarquista   70             234    23%
SEAHC3   bêbado       15             53     22%
SEAD     leproso      39             149    21%
EAHP3    zangado      54             213    20%
SAHC2    sedutor      41             177    19%
EAHP2    abatido      18             87     17%
SAN      americano    108            544    17%
SAHC3    inculto      54             465    10%
Total                 1155           3108   26%
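The "% In" column above is consistent with the share of each table's adjectives that were already in Port4NooJ, that is In / (In + New). A short Python sketch, assuming that reading of the column, recomputes the per-table shares and the column sums from the counts in the table. The names are hypothetical.

```python
# (table, already in Port4NooJ, new) taken from the results table above.
ROWS = [
    ("SAHP1", 303, 247), ("SAHP2", 142, 226), ("SEAHP2", 53, 87),
    ("SAHC1", 115, 229), ("SAHP3", 97, 263), ("SEAHP3", 32, 93),
    ("SEAHC2", 14, 41), ("SAF", 70, 234), ("SEAHC3", 15, 53),
    ("SEAD", 39, 149), ("EAHP3", 54, 213), ("SAHC2", 41, 177),
    ("EAHP2", 18, 87), ("SAN", 108, 544), ("SAHC3", 54, 465),
]

def share_in(already, new):
    """Percentage of a table's adjectives already present in Port4NooJ (assumed formula)."""
    return round(100 * already / (already + new))

if __name__ == "__main__":
    for table, already, new in ROWS:
        print(f"{table:<7} {already:>4} {new:>4} {share_in(already, new):>3}%")
    print("In Port4NooJ:", sum(r[1] for r in ROWS), "New:", sum(r[2] for r in ROWS))
```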
• Complete the integration of the LG of human intransitive adjectives
  – By creating all grammars needed to process the constructions formalized in the LG
• Revise and evaluate the new resources
• Integrate and adapt additional LG grammars:
  – Constructions with Vsup ser de (Baptista, 2005)
  – Constructions with Vsup fazer (Chacoto, 2005)
• Use the paraphrase knowledge in the grammars to create a corpus of paraphrases for developing the eSPERTo hybrid paraphrase acquisition engine
  – Train a machine learning paraphrase acquisition system
  – Annotate semantico-syntactic and multiword paraphrases in corpora for use in training and evaluation
  – Merge in paraphrases collected statistically
Next Steps