clue-aligner: an alignment tool to annotate pairs of paraphrastic and translation units

LREC - Portorož, May 2 th 2016
ANABELA BARREIRO, INESC-ID
FRANCISCO RAPOSO, INESC-ID / UTL
TIAGO LUÍS, VOICEINTERACTION


Page 1: CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Translation Units

CLUE-Aligner

An Alignment Tool to Annotate Pairs of Paraphrastic and Translation Units

LREC - Portorož, May 2 th 2016

ANABELA BARREIRO, INESC-ID
FRANCISCO RAPOSO, INESC-ID / UTL
TIAGO LUÍS, VOICEINTERACTION

Page 2: Introduction

Alignment
• Set of correspondences or relationships between linguistic units which are semantico-syntactically related
  – Paraphrases (found within the same language = monolingual)
    • EN: to make a distinction between | EN: to distinguish between
  – Translations (found in different languages = bilingual)
    • EN: to keep it simple | PT: simplificar

Alignment task
• NLP task that consists of the identification of translation or paraphrastic relationships among those linguistic units (words, MWU or expressions) in sentence pairs that have been identified as paraphrases or translations of each other

Page 3: Sure and Possible Alignments

• Sure alignments correspond to expressions/translations that satisfy the criteria for optimum/full equivalence
  – They are reciprocal: it is possible to translate the expression from the source to the target language and vice versa
  – Optimum equivalence refers to the highest level of translation equivalence on both linguistic and extra-linguistic levels (Bayar, 2007)
  – venture capital markets | mercados de capital de risco (S)
• Possible alignments correspond to expressions/translations that satisfy the criteria for approximate equivalence
  – They do not meet all of the requirements for absolute equivalence. They are not reciprocal with respect to source/target language
  – began | a vu le jour (P)
    (literal gloss of the French: "has seen the day")
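The sure/possible distinction above can be sketched as a small data model. This is an illustration only, not the tool's internal representation: the names `Confidence`, `UnitPair` and `reversed` are hypothetical, and the rule that only sure pairs may be reversed is one reading of "reciprocal".

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    SURE = "S"        # optimum/full equivalence, reciprocal
    POSSIBLE = "P"    # approximate equivalence, not reciprocal


@dataclass(frozen=True)
class UnitPair:
    """One aligned pair of units (hypothetical structure for illustration)."""
    source: str
    target: str
    confidence: Confidence

    def reversed(self):
        """Swap source and target; only sure pairs are treated as reciprocal."""
        if self.confidence is not Confidence.SURE:
            raise ValueError("possible alignments are not reciprocal")
        return UnitPair(self.target, self.source, self.confidence)


# The two examples given on the slide.
sure = UnitPair("venture capital markets", "mercados de capital de risco",
                Confidence.SURE)
possible = UnitPair("began", "a vu le jour", Confidence.POSSIBLE)
```

Calling `sure.reversed()` yields the Portuguese-to-English pair, while `possible.reversed()` raises an error, which mirrors the non-reciprocity of possible alignments.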

Page 4: Background

• Supervised learning uses high-quality alignments, hand-made by linguists (Blunsom & Cohn, 2006; Ambati et al., 2010)
  – Supervised methods take into consideration context, syntax and other grammatical and semantic information
• Guidelines for manual alignment:
  – English–French: Blinker project (Melamed, 1998)
  – Czech–English (Kruijff-Korbayová et al., 2006; Bojar & Prokopová, 2006)
  – Spanish–English (Lambert et al., 2005)
  – Paraphrase alignment guidelines (Callison-Burch et al., 2008)

Page 5: Current Shortcomings

1. Lack of multilingual datasets
   – Publicly available alignments are mostly bilingual, with the exception of 6 multilingual sets (Graça et al., 2008)
2. Lack of linguistically-motivated alignment guidelines
   – Previously proposed guidelines cover cross-linguistic phenomena superficially, excluding important alignment challenges presented by discontiguous MWU (DMWU) and other non-adjacent linguistic phenomena or syntactic discontinuity (e.g., extraposition, topicalization, etc.)
3. Lack of tools
   – Tools are inefficient with DMWU and phrasal expressions that are complex to align and require representation as non-contiguous block alignments

Page 6: Related Alignment Tools

– Alpaco - Blinker project (Rassier & Pedersen, 2003)
– ICA - Interactive Clue Aligner (Tiedemann, 2003; 2004; 2011)
  * The "clue alignment approach" is based mainly on word-level alignment clues. Our approach is based on manual alignments of cross-language MWU and phrasal expressions, which allows representing semantically equivalent non-adjacent structures, such as DMWU, in translation and paraphrasing
– Yawat (Germann, 2008)
– SWIFT (Gilmanov et al., 2014)
– among others

Page 7: CLUE-Aligner

• Interactive web alignment tool inspired by Linear-B (Callison-Burch & Bannard, 2004; Callison-Burch, 2007)
• Allows the block alignment of contiguous MWU and DMWU
• Uses a matrix visualization and a coloring scheme that help distinguish between sure and possible alignments
• Allows storage of pairs of paraphrastic units, with indication of the place of insertions, represented by "[ ]"
  – I urge [ ] to | Exorto [ ] a
  – This feature is valuable in the construction of translation rules or grammars and syntactic parsers that use those paraphrastic pairs, for which precision is important
  – It is also important in ML to help learning constituents

* CLUE = Cross-Language Unit Elicitation
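The "[ ]" insertion marker can be illustrated with a short sketch that turns a set of selected token positions into a stored unit. The function name `unit_with_gaps` and both example sentences are invented for illustration; how CLUE-Aligner itself serializes units is not described in the slides.

```python
def unit_with_gaps(tokens, indices, gap="[ ]"):
    """Join the selected tokens, writing `gap` wherever the selection skips tokens."""
    parts = []
    previous = None
    for i in sorted(indices):
        if previous is not None and i > previous + 1:
            parts.append(gap)          # one marker per skipped stretch
        parts.append(tokens[i])
        previous = i
    return " ".join(parts)


# Invented sentence pair; the selected positions form one discontiguous block.
source = "I urge the Commission to act".split()
target = "Exorto a Comissão a agir".split()

source_unit = unit_with_gaps(source, {0, 1, 4})
target_unit = unit_with_gaps(target, {0, 3})
```

With these selections the stored pair matches the slide's example, `I urge [ ] to | Exorto [ ] a`.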

Page 8: [Figure: alignment matrix with two cell groups labeled "insertion"]

Black cells represent full/optimal semantic correspondence
Grey cells represent approximate semantic correspondence
Light orange cell groups represent unaligned P-insertions
Dark orange cell groups represent unaligned S-insertions
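A text-only approximation of the matrix view may help make the legend concrete. This is a sketch under stated assumptions: rows are target tokens and columns are source tokens (the slides do not say which axis is which), "#" stands in for black cells, "+" for grey cells, and the French sentence is an invented example built around the "began | a vu le jour" pair.

```python
def render_matrix(source, target, sure, possible):
    """Render links as text: '#' = sure, '+' = possible, '.' = unaligned.

    `sure` and `possible` are sets of (source_index, target_index) pairs.
    """
    width = max(len(token) for token in target)
    lines = []
    for j, token in enumerate(target):
        cells = []
        for i in range(len(source)):
            if (i, j) in sure:
                cells.append("#")
            elif (i, j) in possible:
                cells.append("+")
            else:
                cells.append(".")
        lines.append(f"{token:<{width}} " + " ".join(cells))
    return "\n".join(lines)


source = ["it", "began"]
target = ["il", "a", "vu", "le", "jour"]
sure = {(0, 0)}                                   # it | il
possible = {(1, j) for j in range(1, 5)}          # began | a vu le jour (block)

matrix = render_matrix(source, target, sure, possible)
print(matrix)
```

The block alignment shows up as a vertical run of "+" cells in the "began" column, the text analogue of a grey cell group.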

Page 9: CLUE-Aligner Interface

[Figure: interface screenshots. Panels: "Single Word Alignments and Block Alignments"; "Discontiguous Multiwords and Insertions". Annotations: "pre-processing of contracted forms"; "still", "ainda"]

Light green cell / cell groups represent aligned P-insertions
Dark green cell / cell groups represent aligned S-insertions

Page 10: Alignment Guidelines

• Inspired by the Logos Model (Scott, 2003; Barreiro et al., 2011), which relies on deep semantico-syntactic analysis to translate contiguous MWU and DMWU, often mistranslated by MT systems; this analysis has proven successful in commercial MT systems
  – to draw a distinction between
  – to bring [INSERTION] to a conclusion
    • I would urge the European Commission to bring the process of adopting the directive on additional pensions to a conclusion
• Supported by the Lexicon-Grammar theoretical framework and transformational grammar (Gross, 1968; 1975)
• Aligning the pairs of translation units resulted in a gold collection, made possible by CLUE-Aligner
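The "to bring [INSERTION] to a conclusion" example can be reproduced with a surface-level pattern match. This is a deliberate simplification: the Logos Model relies on deep semantico-syntactic analysis, whereas the sketch below uses a plain regular expression, and the helper name `compile_unit` is hypothetical.

```python
import re


def compile_unit(pattern, slot="[INSERTION]"):
    """Turn a unit such as 'to bring [INSERTION] to a conclusion' into a regex."""
    fixed_parts = [re.escape(part.strip()) for part in pattern.split(slot)]
    # Each slot becomes a non-greedy capture of the inserted words.
    return re.compile(r"\s+(.+?)\s+".join(fixed_parts), re.IGNORECASE)


unit = compile_unit("to bring [INSERTION] to a conclusion")
sentence = ("I would urge the European Commission to bring the process of "
            "adopting the directive on additional pensions to a conclusion")

match = unit.search(sentence)
insertion = match.group(1) if match else None
print(insertion)
```

The captured group is the material sitting inside the discontiguous unit, which is what the "[ ]" marker abstracts away when the pair is stored.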

Page 11: CLUE-Aligner

• Allows visualization of automatic phrase alignments and can be used for correcting inaccurate alignments
  – Can load previously (and, possibly, automatically) generated alignments (segments) for the parallel sentences
• Allows alignment of smaller individual words or MWU inside DMWU
• Useful in human and machine translation evaluation
• Future development plans include automatic alignment
  – Alignments containing pairs of paraphrastic or translation units can be used to train ML systems
• Developed under the scope of the eSPERTo project
  https://esperto.l2f.inesc-id.pt/esperto/aligner/index.pl?
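Loading automatic alignments and then correcting them by hand might look like the sketch below. It assumes the automatic aligner writes links as space-separated "i-j" index pairs, a convention used by several word aligners; the slides do not state which input format CLUE-Aligner reads, and `parse_links` and `correct` are hypothetical helpers.

```python
def parse_links(line):
    """Parse '0-0 3-0 ...' into a set of (source_index, target_index) pairs."""
    links = set()
    for item in line.split():
        i, j = item.split("-")
        links.add((int(i), int(j)))
    return links


def correct(automatic, wrong=frozenset(), missing=frozenset()):
    """Apply an annotator's corrections: drop wrong links, add missing ones."""
    return (automatic - wrong) | missing


source = "to keep it simple".split()
target = ["simplificar"]

# Assumed aligner output: only "to" and "simple" were linked to "simplificar".
automatic = parse_links("0-0 3-0")

# The annotator completes the block so the whole expression maps to the verb.
corrected = correct(automatic, missing={(1, 0), (2, 0)})
```

After correction every source token is linked to "simplificar", giving the block alignment for "to keep it simple | simplificar" from the Introduction slide.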

Page 12: Use of Paraphrastic Units in eSPERTo

• the man who is American
• the man from America
• the man with American nationality
• …

The American man

https://esperto.l2f.inesc-id.pt/esperto/esperto/demo.pl

Page 13: Conclusions and Future Work

• Linguistic-based alignments extracted from quality corpora:
  – Contribute to increased precision and recall in SMT systems, with subsequent improvement of translation quality
  – Are a valuable asset for applications that require monolingual paraphrases
• We moved forward by creating a tool that handles non-adjacent structures, allowing the alignment of DMWU and phrasal expressions to improve translation applications
• Planned improvements to CLUE-Aligner include:
  – Feeding it with existing translation or paraphrastic knowledge previously aligned or generated with a linguistic processing tool
  – Enhancing it to automatically align and extract large amounts of alignment pairs to be applied to paraphrasing and MT case studies

Page 14: Thank you!

Acknowledgements
This research work was supported by Fundação para a Ciência e a Tecnologia (FCT), under project eSPERTo EXPL/MHC-LIN/2260/2013, UID/CEC/50021/2013, and post-doctoral grant SFRH/BPD/91446/2012