Automatic Paraphrasing of Human Intransitive Adjectives in Portuguese

Paraphrasing Human Intransitive Adjective Constructions in Port4NooJ

Cristina Mota (1), Paula Carvalho (1,2), Francisco Raposo (1), Anabela Barreiro (1)

(1) INESC-ID, Lisbon; (2) Universidade Europeia | Laureate International Universities

International NooJ 2015 Conference, Minsk, 13 June


Introduction to the eSPERTo Project

eSPERTo – System for Paraphrasing in Editing and Revision of Texts

• Main objective
  – Design and development of a linguistically enhanced paraphrase generator
    • Semantico-syntactic and multiword units
    • Sensitive to context
• Method
  – Hybrid system, combining statistics and linguistic knowledge to identify and generate new and more complex paraphrases
  – Exploitation of existing paraphrasing resources
• Web platform
  – Interactive application to help Portuguese language learners in producing and revising their texts
  – Text-editing mechanisms which provide a variety of alternatives for each expression
  – Users can choose or suggest expressions that can be immediately applied to their text
  – Support to writing optimization, understandability and translatability

Linguistic Resources

• Linguistic knowledge databases: Port4NooJ & Eng4NooJ
  – Originally (English-Portuguese) OpenLogos resources (http://logos-os.dfki.de/)
  – Converted into NooJ format
  – Enhanced with new properties, including derivational, morpho-syntactic and semantic relations

 

Earlier versions

• Phrasal verbs into equivalent expressions
  – to clear up (weather) = (weather) to become better/brighter
• Support verb constructions into single verbs
  – to make a decision = to decide
  – to make a presentation of = to present
  – to give support to N(AN) = to support N(AN)
  – to get into contact with = to contact
  – to become acid = to acidify
• Support verb constructions into their stylistic variants
  – to make an audit = to perform an audit
  – to make an impression = to create an impression
• Aspectual constructions into single verbs
  – to launch an attack = to attack
• Adverbs (compounds into single adverbs)
  – in a constructive way = constructively
  – on purpose > purposely = deliberately
• Relatives into participial adjectives
  – the president that was elected = the president elect
• Relatives into possessives
  – the role that Europe plays/has = the role of Europe
  – the position that the Church defends = the position of the Church
• Relatives into compound nouns (and vice-versa)
  – a container for the milk = a milk container
  – a bottle made of plastic = a plastic bottle
• Agentive passives into actives
  – the young man is released by the police officer = the police officer releases the young man

 

eSPERTo Architecture

[Figure: eSPERTo architecture diagram. Labelled components: eSPERTo online (Input Text + Resource selection; Combine Text and suggestions); noojapply + STRING; Paraphrase suggestions; Port4NooJ Dictionaries and Grammars; Eng4NooJ, ..., Ital4NooJ, Fren4NooJ, Ger4NooJ, Spa4NooJ; Linguist Validation; Hybrid Paraphrase Acquisition; User feedback.]

eSPERTo: noojapply Integration

https://esperto.l2f.inesc-id.pt/esperto/esperto/demo.pl

eSPERTo Web Interface: User configuration

noojapply pt result.ind lr.no(d|m)* sr.nog* REESCREVE.nog text.txt

eSPERTo Web Interface: Result presentation

teste.txt:0,17,O homem que é americano
teste.txt:0,17,O homem da América
teste.txt:0,17,O homem de nacionalidade americana
teste.txt:0,17,O homem de naturalidade americana
teste.txt:0,17,O homem de origem americana
teste.txt:0,39,o trabalho foi apresentado pelo homem americano
teste.txt:18,10,efectuar apresentação
teste.txt:18,10,fazer apresentação
teste.txt:18,10,realizar apresentação
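The result lines above look like `file:offset,length,paraphrase` records. That layout is inferred from the sample, not taken from noojapply documentation. Offset 0 with length 17 would cover "O homem americano" in an input such as "O homem americano apresentou o trabalho", a sentence reconstructed here from the offsets and not shown on the slide. Under those assumptions, a minimal Python sketch of how a front end might parse one record and apply it to the text:

```python
from typing import NamedTuple


class Suggestion(NamedTuple):
    source: str      # file name reported on the line
    offset: int      # assumed: character offset of the matched span
    length: int      # assumed: length of the matched span
    paraphrase: str  # suggested replacement text


def parse_result_line(line: str) -> Suggestion:
    """Split 'file:offset,length,text'; the text itself may contain commas."""
    source, rest = line.split(":", 1)
    offset, length, paraphrase = rest.split(",", 2)
    return Suggestion(source, int(offset), int(length), paraphrase)


def apply_suggestion(text: str, s: Suggestion) -> str:
    """Replace the matched span of the input text with the paraphrase."""
    return text[: s.offset] + s.paraphrase + text[s.offset + s.length:]


if __name__ == "__main__":
    # Hypothetical input sentence, reconstructed from the offsets above.
    text = "O homem americano apresentou o trabalho"
    suggestion = parse_result_line("teste.txt:0,17,O homem da América")
    print(apply_suggestion(text, suggestion))
```

Keeping the record as a named tuple makes it easy to group suggestions by `(offset, length)` when several alternatives target the same span.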

The American man
– the man who is American
– the man from America
– the man with American nationality
– …

LG of Portuguese Human Intransitive Adjectives

• eSPERTo was enhanced with new paraphrases, derived from 15 Lexicon-Grammar (LG) tables describing the distributional properties of 4,250 human intransitive adjectives (HIA) (Carvalho, 2008):
  – Syntactic and semantic nature of the subject modified by each adjective;
  – Copulative verbs selected by each adjective;
  – Constraints on the quantification of adjectives by an adverb or a degree morpheme;
  – Position of adjectives in adnominal context;
  – Optional adjective “complements”;
  – Generic NP and cross-constructions, where the adjective fills the head of a noun phrase;
  – Characterizing indefinite constructions, where the adjective occurs after an indefinite article;
  – Exclamative sentences expressing insult.

Adjective Selection

• CETEMPublico Adj: 17,300 lemmas
• Predicate Adj: 13,875 lemmas
• Adj Intrans Hum (Nhum Vcop Adj): 4,250 lemmas
  – Adj Doen: 187
  – Adj Filo: 303
  – Adj Nac: 651
  – Adj Hum: 3,109

Selection steps:
– Lookup with LabEL lexical resources (LABEL-LEX) (Ranchhod et al. 2004)
– Pre-selection and classification of Adj according to the linguistic criteria defined in Carvalho (2001)

Hum Adj Subclassification Criteria

[Figure: decision tree for ADJ HUM. Branches by copula: SER, SER + ESTAR, ESTAR. Sentence structures tested: N0 ser Adj, N0 ser um Adj, N0 (ser+estar) Adj, N0 estar Adj. Subject types at the leaves: N0 =: Nhum; N0 =: Nhum, Nap de Nhum; QueF.]

Resulting classes, each with an example adjective:

SAHP1   inteligente
SAHP2   atlético
SAHP3   culto
SAHC1   idiota
SAHC2   sedutor
SAHC3   inculto
EAHP2   abatido
EAHP3   zangado
SEAHP2  bonito
SEAHP3  velho
SEAHC2  gordo
SEAHC3  bêbado

LG Tables (EAHP3)

New Transformations

• Adjective, noun and verb morphologically related constructions
  – está zangado (is angry) = zangou-se (got (self) angry) = esteve envolvido numa zanga (was involved in a quarrel)
• Adjective constructions supported by different copulative verbs
  – estar perdido (to be lost) = andar perdido (to walk around lost)
• Constructions involving nationality and other membership relations
  – de origem portuguesa (of Portuguese origin/roots) = portugueses (Portuguese) = de Portugal (from Portugal)
  – benfiquista (Benfica fan) = do Sport Lisboa e Benfica (a fan of Sport Lisboa e Benfica)
• Cross-constructions
  – o idiota do rapaz (the idiot of the boy) = o rapaz é um idiota (the boy is an idiot)
• Appropriate noun constructions
  – foi moderado nos seus comentários (he was moderate in his comments) = os seus comentários foram moderados (his comments were moderate) = foi moderado (he was moderate)
• Generic noun phrases
  – é um indivíduo estúpido (he is a fool) = é um estúpido (he is a fool) = é estúpido (he is a fool)

Integration of LG of Portuguese Human Intransitive Adjectives

– From LG tables to NooJ dictionaries
  • Mostly done automatically with different scripts

[Figure labels: LG tables, Port4NooJ, Adjectivos_IH]

✓ If the adjective is in Port4NooJ, merge the LG properties with the dictionary entry; otherwise create a new entry
✓ Create FLX and DRV codes and corresponding rules as needed
✓ Check for missing FLX and DRV codes
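The merge-or-create step can be sketched in Python as below. This is only an illustration, not the authors' scripts: the function names are hypothetical, entries are treated as plain `lemma,CATEGORY+feature+...` strings as in the deck's dictionary samples, and the FLX code for a new entry is simply passed in by the caller.

```python
def lemma_of(entry: str) -> str:
    """Text before the first comma of a dictionary line."""
    return entry.split(",", 1)[0]


def category_of(entry: str) -> str:
    """First field after the comma, e.g. 'A' or 'N'."""
    return entry.split(",", 1)[1].split("+", 1)[0]


def integrate_adjective(entries: list[str], lemma: str,
                        lg_properties: str, flx: str) -> list[str]:
    """Merge LG properties into existing adjective entries, or create one.

    `lg_properties` is a ready-made feature string such as '+IH+Table=SAN'.
    """
    existing = [e for e in entries
                if lemma_of(e) == lemma and category_of(e) == "A"]
    if existing:
        # Adjective already in the dictionary: every entry gets the properties.
        return [e + lg_properties for e in existing]
    # No adjective entry: build a new one from the LG table alone.
    return [f"{lemma},A+FLX={flx}{lg_properties}"]


if __name__ == "__main__":
    dictionary = [
        "velho,A+FLX=ALTO+AN+Hum+EN=elder",
        "solteiro,N+FLX=ANO+AN+des+EN=bachelor",
    ]
    print(integrate_adjective(dictionary, "velho", "+IH+Table=SEAHP3", "ALTO"))
    print(integrate_adjective(dictionary, "solteiro", "+IH+Table=SEAHP3", "ALTO"))
```

Checking the category as well as the lemma means a noun homograph does not count as an existing adjective entry.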

Integration of LG of PT HIA

– From LG tables to NooJ dictionaries
  • Representation of LG table properties

+Top=Abissínia +TopDET=a
+NclassPnacionalidade +NAdj
+Vcopser
+IH +Table=SAN

Determined automatically by consulting AC/DC corpora:
o homem abissínio ⇔ o homem da Abissínia
o homem açoriano ⇔ o homem dos Açores
o homem português ⇔ o homem de Portugal
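The three pairs above differ only in how the preposition "de" combines with the toponym's determiner (da, dos, de). A small Python sketch of how +Top and +TopDET could drive that paraphrase follows. The function name is hypothetical, and the assumption that an absent +TopDET means a bare "de" is read off the Portugal example, not stated in the deck.

```python
# Standard Portuguese contractions of "de" + definite article.
DE_CONTRACTIONS = {"": "de", "o": "do", "a": "da", "os": "dos", "as": "das"}


def nationality_paraphrase(head: str, top: str, top_det: str = "") -> str:
    """Build 'head de(+DET) Toponym' from the +Top / +TopDET properties."""
    return f"{head} {DE_CONTRACTIONS[top_det]} {top}"


if __name__ == "__main__":
    print(nationality_paraphrase("o homem", "Abissínia", "a"))
    print(nationality_paraphrase("o homem", "Açores", "os"))
    print(nationality_paraphrase("o homem", "Portugal"))
```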

Properties as listed in the LG table:

+IH +Table=SEAHP3
+Nome=alegria
+Verbo=alegrar-se
+Nnhum

The same properties after the noun and verb are encoded as derivations:

+IH +Table=SEAHP3
+DRV=A2N143:CASA
+DRV=A2V6:FALAR +Reflexivo
+Nnhum

• DRV code is determined and formalized automatically by finding the radical shared by the adjective and the noun or verb
  alegr(ia) => A2N143 = <B1>ia/N
  alegr(ar) => A2V6 = <B1>ar/V
• FLX code is determined by consulting Port4NooJ
  alegria,N+FLX=CASA+AB+state+EN=joy+SYNN=contentamento
  alegrar,V+FLX=FALAR+Aux=1+PRECVagree-type+Subset=…
  If the derived form does not exist, then its code is assigned automatically
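A rule such as `<B1>ia/N` can be obtained by comparing the adjective with the derived form: strip the characters that follow the shared radical (`<B1>` deletes one character) and append the remaining suffix. The Python sketch below illustrates that idea. It assumes the source adjective is "alegre", which is not named on the slide, and uses hypothetical function names; the numbering of paradigm names such as A2N143 is not modelled.

```python
import re


def derivation_rule(adjective: str, derived: str, category: str) -> str:
    """Return a '<Bn>suffix/CAT' rule turning `adjective` into `derived`."""
    # Length of the radical (longest common prefix) shared by both forms.
    i = 0
    while i < min(len(adjective), len(derived)) and adjective[i] == derived[i]:
        i += 1
    backspaces = len(adjective) - i          # characters to delete
    suffix = derived[i:]                     # characters to append
    delete_op = f"<B{backspaces}>" if backspaces else ""
    return f"{delete_op}{suffix}/{category}"


def apply_rule(adjective: str, rule: str) -> str:
    """Apply a rule produced by derivation_rule to an adjective."""
    operations, _category = rule.rsplit("/", 1)
    match = re.fullmatch(r"(?:<B(\d+)>)?(.*)", operations)
    deletions = int(match.group(1) or 0)
    return adjective[: len(adjective) - deletions] + match.group(2)


if __name__ == "__main__":
    # "alegre" is an assumed source adjective for the alegria/alegrar example.
    print(derivation_rule("alegre", "alegria", "N"))
    print(derivation_rule("alegre", "alegrar", "V"))
    print(apply_rule("alegre", "<B1>ia/N"))
```

The reflexive clitic of alegrar-se is left out here on purpose, since the deck records it separately as +Reflexivo.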

Integration of LG of PT HIA

– From LG tables to NooJ dictionaries
  • Integration with eSPERTo dictionary entries

① Adjective exists in Port4NooJ
  ✓ Port4NooJ entries blindly receive the additional properties as specified by the LG tables
  ❑ In a second round, discard at least entries marked with +AB

Port4NooJ entries:
velho,A+FLX=ALTO+AB+class+EN=vintage
velho,A+FLX=ALTO+AN+Hum+EN=elder
velho,A+FLX=ALTO+NAV+Apred+EN=old

LG table properties:
+IH+Table=SEAHP3+Nhum+Vcopser+Vcopestar+Vcopencontrarse+Vcopsentirse+Vcoptornarse+UMNclas+UmModif+AdvQuant+Superlativo+NAdj+DRV=A2N164:CASA+DRV=A2V67:AGRADECER

Merged entries:
velho,A+FLX=ALTO+AB+class+EN=vintage+IH+Table=SEAHP3+Nhum+Vcopser+Vcopestar+Vcopencontrarse+Vcopsentirse+Vcoptornarse+UMNclas+UmModif+AdvQuant+Superlativo+NAdj+DRV=A2N164:CASA+DRV=A2V67:AGRADECER
velho,A+FLX=ALTO+AN+Hum+EN=elder+IH+Table=SEAHP3+Nhum+Vcopser+Vcopestar+Vcopencontrarse+Vcopsentirse+Vcoptornarse+UMNclas+UmModif+AdvQuant+Superlativo+NAdj+DRV=A2N164:CASA+DRV=A2V67:AGRADECER
velho,A+FLX=ALTO+NAV+Apred+EN=old+IH+Table=SEAHP3+Nhum+Vcopser+Vcopestar+Vcopencontrarse+Vcopsentirse+Vcoptornarse+UMNclas+UmModif+AdvQuant+Superlativo+NAdj+DRV=A2N164:CASA+DRV=A2V67:AGRADECER
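The planned second round (discarding merged entries marked with +AB) could be as simple as the following Python filter. It is a sketch under assumptions: entries are handled as plain strings, features are taken to be "+"-separated, and the function names are hypothetical. The sample entries are shortened versions of the velho entries.

```python
def features_of(entry: str) -> list[str]:
    """Category and features after the lemma, split on '+'."""
    return entry.split(",", 1)[1].split("+")


def second_round(entries: list[str]) -> list[str]:
    """Drop entries that received +IH in the blind merge but carry +AB."""
    kept = []
    for entry in entries:
        features = features_of(entry)
        if "IH" in features and "AB" in features:
            continue  # human intransitive reading does not fit this entry
        kept.append(entry)
    return kept


if __name__ == "__main__":
    merged = [
        "velho,A+FLX=ALTO+AB+class+EN=vintage+IH+Table=SEAHP3",
        "velho,A+FLX=ALTO+AN+Hum+EN=elder+IH+Table=SEAHP3",
        "velho,A+FLX=ALTO+NAV+Apred+EN=old+IH+Table=SEAHP3",
    ]
    for entry in second_round(merged):
        print(entry)
```

Matching on whole features, not on substrings, avoids accidental hits on values that merely contain "AB".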

② Adjective not in Port4NooJ (or present only through a related entry, such as the verb it derives from or a noun homograph):
  ✓ FLX code is assigned automatically given the ending of the word
  ✓ Entries are checked for missing FLX codes and reviewed by a linguist
  ✓ All other properties come from the LG table

abissínio,A+FLX=ALTO+IH+Table=SAN+Nhum+Vcopser+Vcoptornarse+UMNclas+UmModif+NclassPserde+NclassPorigem+NclassPnacionalidade+NclassPnaturalidade+NAdj+Top=Abissínia+TopDET=a
(no entry in Port4NooJ)

arranhado,A+FLX=ALTO+IH+Table=EAHP2+Nhum+NapdeNhum+Npc+Vcopestar+AdvQuant+Superlativo+NAdj+NhumVopAPrepNap+deemEDefNap+DRV=A2N4:BALÃO+DRV=A2V2:FALAR+Reflexivo
(in Port4NooJ: arranhar,V+FLX=FALAR...)

solteiro,A+FLX=ALTO+IH+Table=SEAHP3+Nhum+Vcopser+Vcopestar+Vcopficar+Vcoppermanecer+Vcopencontrarse+UMNclas+UmModif+Superlativo+NAdj
(in Port4NooJ: solteiro,N+FLX=ANO+AN+des+EN=bachelor)
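Assigning FLX from the word ending, with a fallback to linguist review, might look like the Python sketch below. The ending-to-paradigm table is hypothetical: only the pairing of adjectives in "-o" with ALTO is supported by the entries shown in this deck (abissínio, arranhado, solteiro, velho), and the actual rules used by the authors are not given.

```python
from typing import Optional

# Hypothetical mapping; only "-o" -> ALTO is backed by examples in the deck.
ENDING_TO_FLX = {"o": "ALTO"}


def assign_flx(lemma: str) -> Optional[str]:
    """Return the FLX code for the longest matching ending, if any."""
    for ending in sorted(ENDING_TO_FLX, key=len, reverse=True):
        if lemma.endswith(ending):
            return ENDING_TO_FLX[ending]
    return None


def missing_flx(lemmas: list[str]) -> list[str]:
    """Lemmas with no automatic FLX code, to be reviewed by a linguist."""
    return [lemma for lemma in lemmas if assign_flx(lemma) is None]


if __name__ == "__main__":
    print(assign_flx("abissínio"))
    print(missing_flx(["solteiro", "arranhado", "alegre"]))
```

Trying longer endings first lets more specific rules override shorter ones once the table grows.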

Integration of LG of PT HIA

– From LG to NooJ grammars
  • Option 1: Syntactic Parsing

Input1: o homem é tonto
Output1: <REESCREVE+TEXTO=é um tonto> é tonto </REESCREVE>

Input2: o homem é um tonto
Output2: <REESCREVE+TEXTO=é tonto> é um tonto </REESCREVE>
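The REESCREVE annotations above pair a matched span with its suggested rewrite in the TEXTO attribute. A downstream component could extract both with a regular expression, as in this Python sketch. It assumes the annotation syntax is exactly as shown in the two outputs, and the function name is hypothetical.

```python
import re

# <REESCREVE+TEXTO=paraphrase> original span </REESCREVE>
ANNOTATION = re.compile(
    r"<REESCREVE\+TEXTO=(?P<paraphrase>[^>]*)>\s*(?P<original>.*?)\s*</REESCREVE>"
)


def extract_rewrites(annotated: str) -> list[tuple[str, str]]:
    """Return (original span, suggested paraphrase) pairs."""
    return [(m["original"], m["paraphrase"])
            for m in ANNOTATION.finditer(annotated)]


if __name__ == "__main__":
    output1 = "<REESCREVE+TEXTO=é um tonto> é tonto </REESCREVE>"
    output2 = "<REESCREVE+TEXTO=é tonto> é um tonto </REESCREVE>"
    print(extract_rewrites(output1))
    print(extract_rewrites(output2))
```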

– From LG to NooJ grammars
  • Option 2: Transformational module

Input1 ⇔ Input2

é tonto, REESCREVE+Cpred
é um tonto, REESCREVE+CCI

Preliminary Results

• 5,150 human intransitive adjectives
• 677 new derivational paradigms
• Example grammars for the syntactic parser and the transformational module
• 50% increase in Port4NooJ adjective entries

Table    Example      In Port4NooJ  New   % In
SAHP1    inteligente  303           247   55%
SAHP2    atlético     142           226   39%
SEAHP2   bonito       53            87    38%
SAHC1    idiota       115           229   33%
SAHP3    culto        97            263   27%
SEAHP3   velho        32            93    26%
SEAHC2   gordo        14            41    25%
SAF      anarquista   70            234   23%
SEAHC3   bêbado       15            53    22%
SEAD     leproso      39            149   21%
EAHP3    zangado      54            213   20%
SAHC2    sedutor      41            177   19%
EAHP2    abatido      18            87    17%
SAN      americano    108           544   17%
SAHC3    inculto      54            465   10%
Total                 1155          3108  26%
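The "% In" column is consistent with In / (In + New) for every row, which the Python sketch below recomputes from the table. For the Total row, pooling the counts gives about 27%, while the reported 26% matches the unweighted mean of the row percentages. This is an observation from the arithmetic, not a statement about how the authors computed it.

```python
# (In Port4NooJ, New) per LG table, copied from the results table.
ROWS = {
    "SAHP1": (303, 247), "SAHP2": (142, 226), "SEAHP2": (53, 87),
    "SAHC1": (115, 229), "SAHP3": (97, 263), "SEAHP3": (32, 93),
    "SEAHC2": (14, 41), "SAF": (70, 234), "SEAHC3": (15, 53),
    "SEAD": (39, 149), "EAHP3": (54, 213), "SAHC2": (41, 177),
    "EAHP2": (18, 87), "SAN": (108, 544), "SAHC3": (54, 465),
}


def share_in(in_count: int, new_count: int) -> int:
    """Percentage of a table's adjectives already in Port4NooJ, rounded."""
    return round(100 * in_count / (in_count + new_count))


if __name__ == "__main__":
    shares = {table: share_in(*counts) for table, counts in ROWS.items()}
    total_in = sum(i for i, _ in ROWS.values())
    total_new = sum(n for _, n in ROWS.values())
    pooled = share_in(total_in, total_new)
    mean_of_rows = round(sum(shares.values()) / len(shares))
    print(shares)
    print(total_in, total_new, pooled, mean_of_rows)
```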

Next Steps

• Complete the integration of the LG of human intransitive adjectives
  – By creating all grammars to process the constructions formalized in LG
• Revise and evaluate the new resources
• Integrate and adapt additional LG grammars:
  – Constructions with Vsup ser de (Baptista, 2005)
  – Constructions with Vsup fazer (Chacoto, 2005)
• Use the grammar paraphrase knowledge to create a corpus of paraphrases to develop the eSPERTo hybrid paraphrase acquisition engine
  – Train machine learning paraphrase acquisition system
  – Annotate semantico-syntactic and multiword paraphrases in corpora to use in training and evaluation
  – Merge of paraphrases collected statistically

Thank you! дзякуй!

cmota|[email protected]  pcc|anabela.barreiro@inesc-id.pt