SPIDER: A SYSTEM FOR PARAPHRASING IN DOCUMENT EDITING AND REVISION. APPLICABILITY IN MACHINE TRANSLATION PRE-EDITING. Anabela Barreiro, [email protected]. CICLing 2011, February 20-26, 2011, Tokyo, Japan


DESCRIPTION

SPIDER is a system for paraphrasing in document editing and revision. It was designed to help with writing optimization, but its applicability extends to MT pre-editing.

TRANSCRIPT

Page 1: SPIDER: a System for Paraphrasing - Applicability in Machine Translation Pre-Editing - Anabela Barreiro

SPIDER: A SYSTEM FOR PARAPHRASING

IN DOCUMENT EDITING AND REVISION

APPLICABILITY IN MACHINE TRANSLATION PRE-EDITING

Anabela Barreiro

[email protected]

Page 2

OUTLINE

INTRODUCTION

PARAPHRASES IN NLP

PARAPHRASES IN PEDAGOGICAL AND PROFESSIONAL CONTEXTS

SPIDER

FIRST STEPS

IMPORTANT FEATURES

PARAPHRASES COVERED BY SPIDER

INTERFACE

LINGUISTIC RESOURCES

EVALUATION RESULTS

THE FUTURE

FUTURE APPLICATIONS?

FUTURE RESEARCH

Page 3

IMPORTANCE OF PARAPHRASES IN NLP TASKS

Question Answering [Ibrahim et al., 2003], [Paşca, 2003], [Duboué & Chu-Carroll, 2006]

Information Extraction and Text Mining [Ibrahim et al., 2003], [Shinyama et al., 2002], [Shinyama & Sekine, 2003], [Sekine, 2005], [Paşca, 2005], [Paşca & Dienes, 2005]

Summarization [McKeown et al., 2002], [Barzilay, 2001, 2003], [Hirao et al., 2004], [Zhou et al., 2006b]

Natural Language Generation [Iordanskaja et al., 1991]

Plagiarism Detection [Potthast et al., 2010], [Vila et al., 2010]

Machine Translation [Zhou et al., 2006], [Callison-Burch et al., 2006a, 2006b, 2007, 2008], [Barreiro, 2008, 2009, 2011]

Page 4

THE PRACTICAL NEED FOR PARAPHRASES

IN PEDAGOGICAL CONTEXTS

Text Processing and Authoring Aids

Writing and revision of original/creative/customized texts

Learning Tools

Native and second language learning

Creation of clear and understandable text content

e.g. students learning language and writing skills

Style Editors

Uniformization/consistency of style

Page 5

THE PRACTICAL NEED FOR PARAPHRASES

IN PROFESSIONAL CONTEXTS

Technical Writing

Professional high quality documentation and domain-specific texts

Controlled language

Linguistic Quality Assurance

Linguistic quality of generic texts and specialized documentation

Verification/validation of meaningful content

Text Optimization

Readable / publishable texts (business-oriented or purpose-oriented content)

Terminology

Search for the “exact” term or relevant keywords

Translation

Indispensable for human and machine translation (pre-editing and post-editing)

Page 6

OUTLINE

INTRODUCTION

PARAPHRASES IN NLP

PARAPHRASES IN PEDAGOGICAL AND PROFESSIONAL CONTEXTS

SPIDER

FIRST STEPS

IMPORTANT FEATURES

PARAPHRASES COVERED BY SPIDER

INTERFACE

LINGUISTIC RESOURCES

EVALUATION RESULTS

THE FUTURE

FUTURE APPLICATIONS?

FUTURE RESEARCH

Page 7

SPIDER PARAPHRASING SYSTEM

FIRST STEPS

Initially developed for Portuguese:

1st version: ReEscreve, a publicly available service at http://www.linguateca.pt/ReEscreve/

2nd version: eSPERTo (Portuguese for “the smart/clever one”; “expert”), currently being integrated into a cyber school project within the scope of an educational program

Writing exercises: students learning how to improve their writing skills in Portuguese

English SPIDER: prototype to assist the writing of domain-specific texts

Page 8

SPIDER: IMPORTANT FEATURES

Applies linguistic knowledge to recognize and generate paraphrases automatically, preserving the semantics and grammaticality (including inflectional features) of the source text in the suggestions provided, including transformations of multi-word units

Uses text-editing mechanisms that provide a variety of alternatives for each expression and let users choose among them (according to personal preferences, style, idiomaticity, etc.)

Allows users to suggest new expressions that can be immediately applied to their text, making the text editing process easier, more flexible, and upgradable

Designed to help with writing optimization, understandability, and translatability (improving the quality of the source text so that it has a positive impact on translation)

Page 9

PARAPHRASES COVERED BY SPIDER

Synonyms in context (e.g., phrasal verbs into equivalent expressions): to clear up (weather) = (weather) to become better/brighter

Support verb constructions into single verbs and stylistic variants: to make a decision = to decide; to make an audit = to perform an audit

Aspectual constructions into single verbs: to launch an attack = to attack

Adverbials (compounds into single adverbs): in a constructive way = constructively

Relatives into participial adjectives: the president that was elected = the president elect

Relatives into possessives: the role that Europe plays/has = the role of Europe

Relatives into compound nouns (and vice versa): a container for the milk = a milk container; a bottle made of plastic = a plastic bottle

Agentive passives into actives: the man was released by the police officer = the police officer released the man
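To make the support-verb transformation concrete, here is a minimal lookup-based sketch. It assumes a tiny hand-written lexicon (`SVC_LEXICON`) and inflection table (`SUPPORT_VERB_FORMS`), both hypothetical; SPIDER itself works with NooJ dictionaries and grammars, not with this code. The point it illustrates is that the tense of the support verb must be carried over to the single verb so the paraphrase stays grammatical.

```python
import re

# Hypothetical lexicon: (support verb lemma, predicate noun) -> single verb lemma.
SVC_LEXICON = {
    ("make", "decision"): "decide",
    ("launch", "attack"): "attack",
}

# Hypothetical inflection table: surface form -> (lemma, tense).
SUPPORT_VERB_FORMS = {
    "make": ("make", "base"), "makes": ("make", "3sg"), "made": ("make", "past"),
    "launch": ("launch", "base"), "launches": ("launch", "3sg"), "launched": ("launch", "past"),
}

# support verb, determiner, predicate noun
SVC_PATTERN = re.compile(r"\b(\w+) (?:a|an|the) (\w+)\b")


def inflect(lemma: str, tense: str) -> str:
    """Very rough regular inflection, enough for the toy lexicon above."""
    if tense == "past":
        return lemma + "d" if lemma.endswith("e") else lemma + "ed"
    if tense == "3sg":
        return lemma + "s"
    return lemma


def paraphrase_svc(sentence: str) -> str:
    """Replace each known support verb construction with a single verb,
    carrying the support verb's tense over to the new verb."""
    def replace(match: re.Match) -> str:
        verb_form, noun = match.group(1), match.group(2)
        if verb_form.lower() not in SUPPORT_VERB_FORMS:
            return match.group(0)  # not a support verb: leave text unchanged
        lemma, tense = SUPPORT_VERB_FORMS[verb_form.lower()]
        target = SVC_LEXICON.get((lemma, noun.lower()))
        if target is None:
            return match.group(0)  # support verb, but not a known construction
        return inflect(target, tense)

    return SVC_PATTERN.sub(replace, sentence)
```

For example, `paraphrase_svc("The board made a decision yesterday.")` yields "The board decided yesterday.", while "She made a cake." is left untouched because the pair is not in the lexicon.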

Page 10

INTERFACE

SUGGESTIONS FOR EXAMPLE SENTENCES

Suggestions for general-language linguistic phenomena:

Compound adverbs > single adverbs

Support verb constructions > single verbs

Relatives > participial adjectives

Page 11

INTERFACE

SELECTION OF PARAPHRASING GRAMMARS FOR SPECIFIC LINGUISTIC PHENOMENA

Users can select among general and technical dictionaries (more than one selection allowed) and among grammars for specific linguistic transformations (one, several, or all grammars can be selected). The interface provides sample texts for testing.
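One way to picture this selection mechanism is a registry of independent rewriting grammars from which the user enables any subset. The sketch below is hypothetical: the grammar names and the naive regex rules stand in for SPIDER's NooJ grammars and are not taken from the system.

```python
import re
from typing import Callable, Dict, Iterable

# Hypothetical registry: grammar name -> rewriting function over a text string.
GRAMMARS: Dict[str, Callable[[str], str]] = {
    # compound adverbs > single adverbs (naive: only "in a/an X way" -> "Xly")
    "compound_adverbs": lambda text: re.sub(r"\bin an? (\w+) way\b", r"\1ly", text),
    # relatives > compound nouns (naive: "a N1 made of N2" -> "a N2 N1")
    "relatives_to_compounds": lambda text: re.sub(
        r"\b(a|an|the) (\w+) made of (\w+)\b", r"\1 \3 \2", text
    ),
}


def apply_selected(text: str, selected: Iterable[str]) -> str:
    """Apply only the grammars the user selected, in registry order."""
    wanted = set(selected)
    unknown = wanted - set(GRAMMARS)
    if unknown:
        raise KeyError(f"unknown grammars: {sorted(unknown)}")
    for name, grammar in GRAMMARS.items():
        if name in wanted:
            text = grammar(text)
    return text
```

Selecting one, several, or all grammars (`apply_selected(text, GRAMMARS)`) then mirrors the options offered by the interface.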

Sample LEGAL text

Informative details about the linguistic resources selected

Page 12

INTERFACE

SELECTION OF A DOMAIN DICTIONARY

Identification of legal terms in the text

Suggestions for the term “breach of law”

Users can select one term from the list of suggestions or provide a new suggestion

Page 13

INTERFACE

SUGGESTIONS PROVIDED AND USER’S CAPABILITY TO ADD NEW REWRITING OPTIONS

Text rewritten:

• In red, the expressions in the source text

• In green, suggestions provided by SPIDER and selected by the user

The user can suggest new words or expressions (synonyms or paraphrases)

It is possible to go back and change the user’s choice as many times as necessary
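The interaction just described (pick one of several suggestions, type a new one, or go back and change the choice) maps naturally onto a small editing-session structure. The class and method names below are hypothetical and only illustrate the workflow; they are not SPIDER's code, and the suggestion list in the usage note is invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Suggestion:
    """One recognized expression in the source text and its rewriting options."""
    start: int                     # character offset of the expression in the source
    end: int
    source: str                    # expression as it appears in the text (shown in red)
    options: List[str]             # alternatives proposed by the system
    choice: Optional[str] = None   # current choice (shown in green); None keeps the source

    def choose(self, option: str) -> None:
        """Select a proposed option, or add and select a user-supplied one."""
        if option not in self.options:
            self.options.append(option)  # user's own paraphrase becomes a new option
        self.choice = option

    def reset(self) -> None:
        """Return to the original expression; may be repeated any number of times."""
        self.choice = None


@dataclass
class EditingSession:
    text: str
    suggestions: List[Suggestion] = field(default_factory=list)

    def rewritten(self) -> str:
        """Rebuild the text from current choices, right to left so offsets stay valid."""
        result = self.text
        for sug in sorted(self.suggestions, key=lambda item: item.start, reverse=True):
            if sug.choice is not None:
                result = result[:sug.start] + sug.choice + result[sug.end:]
        return result
```

With `text = "He was accused of a breach of law."` and a suggestion spanning "breach of law", choosing "violation of the law" yields "He was accused of a violation of the law.", and `reset()` restores the original sentence.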

Page 14

LINGUISTIC RESOURCES

Eng4NooJ – linguistic knowledge system

• OpenLogos dictionary (http://logos-os.dfki.de/), converted into NooJ format and enhanced with new properties, including derivational, morpho-syntactic, and semantic relations

• Morphological system

• Contextual rules and grammars

• Domain-specific dictionary (sample “legal terms”)
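The dictionary entries on page 15 (for example `impress,V+FLX=POLISH+SAL=PVPCpleasetype+PT=impressionar+DRV=NDRV01:BOOK+VSUP=make+VSUP=cause+NPREP=on`) follow NooJ's `lemma,CATEGORY+property` layout. Below is a minimal parser sketch, assuming that `+` separates properties, that `NAME=value` properties may repeat (as `VSUP` does), and that a bare `+Name` is a boolean feature. It ignores NooJ details such as escaped characters or entries that list an inflected form before the lemma, and the readings of property names like `VSUP` or `PT` are inferred from the sample, not documented here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Entry:
    lemma: str
    category: str
    properties: Dict[str, List[str]] = field(default_factory=dict)  # NAME=value, may repeat
    features: Set[str] = field(default_factory=set)                 # bare +Feature flags


def parse_entry(line: str) -> Entry:
    """Parse a simplified NooJ-style 'lemma,CAT+NAME=value+Feature' line."""
    lemma, _, rest = line.strip().partition(",")
    category, *props = rest.split("+")
    entry = Entry(lemma=lemma, category=category)
    for prop in props:
        name, sep, value = prop.partition("=")
        if sep:
            entry.properties.setdefault(name, []).append(value)
        else:
            entry.features.add(name)
    return entry
```

Keeping repeated properties as lists matters here: `impress` lists two support verbs (`make`, `cause`), and both are needed to recognize "make an impression" and "cause an impression".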

Page 15

LINGUISTIC RESOURCES

General language dictionary entries:

impress,V+FLX=POLISH+SAL=PVPCpleasetype+PT=impressionar+DRV=NDRV01:BOOK+VSUP=make+VSUP=cause+NPREP=on

aesthetic,A+FLX=NATURAL+SAL=AVstate+PT=aesthetically+DRV=AVDRV03

skepticism,N+FLX=BOOK+SAL=ABcause+PT=cepticismo+DRV=NAVDRV02

Morpho-syntactic and semantic relations: rules to transform morpho-syntactically and semantically related words of different parts of speech

NDRV04 = <B>ion/Npred+Nom

ADRV02 = <B>icable

AVDRV01 = <E>ly/ADV

AVDRV04 = <B>tically/ADV

Grammar to recognize adverbial compounds and transform them into equivalent single adverbs

Contextual rules: rules to improve precision in specific contexts [bring(vt) N(charge; action) > present(vt) N(idem)]
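The derivational paradigms (`AVDRV01 = <E>ly/ADV`, `ADRV02 = <B>icable`, and so on) are written with NooJ morphological operators. Assuming the usual NooJ reading, `<B>` deletes the last character of the current form and `<E>` is the empty string. The sketch below implements only those two operators plus literal suffixes; the example words are illustrative and not taken from the Eng4NooJ dictionary.

```python
import re
from typing import Tuple

# Only two operators are modelled: <B> (delete last character) and <E> (empty string).
TOKEN = re.compile(r"<[A-Z]>|[^<]+")


def apply_paradigm(lemma: str, rule: str) -> Tuple[str, str]:
    """Apply a derivation rule such as '<E>ly/ADV' to a lemma.

    Returns (derived form, target category); the category is '' if the rule has none.
    """
    spec, _, category = rule.partition("/")
    form = lemma
    for token in TOKEN.findall(spec):
        if token == "<B>":
            form = form[:-1]   # backspace: drop the last character
        elif token == "<E>":
            continue           # empty string: nothing to add
        elif token.startswith("<"):
            raise ValueError(f"operator not supported in this sketch: {token}")
        else:
            form += token      # literal suffix
    return form, category
```

For instance, `apply_paradigm("constructive", "<E>ly/ADV")` gives `("constructively", "ADV")`, the single adverb that the compound-adverb grammar proposes for "in a constructive way", and `apply_paradigm("apply", "<B>icable")` gives `("applicable", "")`.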

Page 16

LINGUISTIC RESOURCES

Sample of terms classified as Information + Instructional/legal

Page 17

EVALUATION RESULTS: PARAPHRASING PRECISION

Evaluation of recognition and paraphrasing of support verb constructions (SVCs)

Corpus: 500 sentences, 100 sentences for each of 5 elementary support verbs

Support verb | SVC recognition precision | SVC recognition recall | SVC paraphrasing precision
Pôr (“put”) | 73/73 (100%) | 73/100 (73%) | 72/73 (98.6%)
Tomar (“take”) | 75/75 (100%) | 75/100 (75%) | 68/73 (93.1%)
Ter (“have”) | 65/65 (100%) | 65/100 (65%) | 59/65 (90.7%)
Dar (“give”) | 57/60 (95%) | 57/100 (57%) | 46/51 (90.1%)
Fazer (“make/do”) | 43/45 (95.5%) | 43/100 (43%) | 40/45 (88.8%)
Average | 62.6/63.6 (98.4%) | 62.6/100 (62.6%) | 57/61 (93.4%)
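The fractions in the table can be recomputed from the raw counts. The sketch below reads them with the standard definitions (an assumption about how the counts were obtained): precision is correct items over items produced, and recall is correct recognitions over the 100 test sentences per verb. It also reproduces the "Average" row, which appears to average numerators and denominators separately. Two observations follow from the arithmetic: the per-verb percentages on the slide look truncated rather than rounded (68/73 = 93.15%, shown as 93.1%), and averaging the paraphrasing counts the same way gives 57/61.4 ≈ 92.8%, so the reported 57/61 (93.4%) seems to round the mean denominator first.

```python
from statistics import mean
from typing import Dict, Tuple

# (correct, total) pairs copied from the table, per support verb.
RECOGNITION: Dict[str, Tuple[int, int]] = {
    "Pôr": (73, 73), "Tomar": (75, 75), "Ter": (65, 65), "Dar": (57, 60), "Fazer": (43, 45),
}
PARAPHRASING: Dict[str, Tuple[int, int]] = {
    "Pôr": (72, 73), "Tomar": (68, 73), "Ter": (59, 65), "Dar": (46, 51), "Fazer": (40, 45),
}
SENTENCES_PER_VERB = 100  # recall denominator: 100 test sentences per verb


def percent(correct: float, total: float) -> float:
    return 100.0 * correct / total


def average_row(counts: Dict[str, Tuple[int, int]]) -> Tuple[float, float, float]:
    """Average numerators and denominators separately, then take their ratio."""
    num = mean(c for c, _ in counts.values())
    den = mean(t for _, t in counts.values())
    return num, den, percent(num, den)


if __name__ == "__main__":
    for verb, (rc, rt) in RECOGNITION.items():
        pc, pt = PARAPHRASING[verb]
        print(f"{verb}: recognition P={percent(rc, rt):.1f}%, "
              f"R={percent(rc, SENTENCES_PER_VERB):.1f}%, "
              f"paraphrasing P={percent(pc, pt):.1f}%")
    print("recognition average:", average_row(RECOGNITION))
    print("paraphrasing average:", average_row(PARAPHRASING))
```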

Page 18

EVALUATION RESULTS: IMPACT ON TRANSLATABILITY (MT)

Same corpus, 50 sentences selected randomly

(i) Support verb constructions were automatically pre-processed with SPIDER and converted into equivalent single verbs

(ii) The pre-processed sentences (automatically generated paraphrases) and the original sentences were submitted to MT, and the output translations for both were compared

• 29 (58%) of the best translations were of automatically generated paraphrases

• 9 (18%) were of the original support verb constructions

• 12 (24%) were equally good or equally bad

CONCLUSION: The experiment indicates that paraphrases such as those generated by SPIDER help improve translation quality

• In this experiment, the automated paraphrasing of support verb constructions through SPIDER clearly improved the quality of the MT results (29 vs. 9 of 50 sentences)
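The comparison protocol can be summarized as a small harness: paraphrase each sentence, translate both versions, and let a judge decide which translation is better. `paraphrase`, `translate`, and `judge` are hypothetical callables (the slides do not name the MT system or say how translations were judged); only the tallying logic is shown.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def compare_pre_editing(
    sentences: Iterable[str],
    paraphrase: Callable[[str], str],
    translate: Callable[[str], str],
    judge: Callable[[str, str], int],
) -> Counter:
    """Count which translation wins for each sentence.

    judge(original_translation, paraphrase_translation) returns 1 if the
    paraphrase's translation is better, -1 if the original's is, 0 for a tie.
    """
    outcome = {1: "paraphrase", -1: "original", 0: "tie"}
    tally: Counter = Counter()
    for sentence in sentences:
        original_mt = translate(sentence)
        edited_mt = translate(paraphrase(sentence))
        tally[outcome[judge(original_mt, edited_mt)]] += 1
    return tally


def percentages(tally: Counter) -> Dict[str, float]:
    total = sum(tally.values())
    return {label: 100.0 * count / total for label, count in tally.items()}
```

Feeding in the counts reported above, `percentages(Counter(paraphrase=29, original=9, tie=12))` gives 58%, 18%, and 24%.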

Page 19

OUTLINE

INTRODUCTION

PARAPHRASES IN NLP

PARAPHRASES IN PEDAGOGICAL AND PROFESSIONAL CONTEXTS

SPIDER

FIRST STEPS

IMPORTANT FEATURES

PARAPHRASES COVERED BY SPIDER

INTERFACE

LINGUISTIC RESOURCES

EVALUATION RESULTS

THE FUTURE

FUTURE APPLICATIONS?

FUTURE RESEARCH

Page 20

FUTURE APPLICATIONS?

• Writing/authoring aid (word processing applications)

• Language composition tool: general and technical language (e.g., student texts or legal texts)

• Text production and style editor

• Terminology verification tool: professional use of terminology in technical domains (elimination of informal, idiomatic, and slang use of language)

• Empirical testbed for linguistic quality assurance (source and target texts)

• Text editing (machine translation pre-editing and post-editing) and translation aid

• Controlled language tool
  • Consistent, direct, and simple language
  • Restricted grammar (avoid certain types of construction)
  • Avoid complex reasoning, figures of speech, metaphors, etc.
  • Elimination of wordiness

• “Revision memory” tool (≈ “translation memory”): recycling of validated, reviewed sentences, structures, or phrases
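A “revision memory” would work like a translation memory, but store validated source-to-revision pairs within one language. Below is a minimal exact-match sketch, assuming whitespace and case normalization as the only matching criterion (real translation memories usually add fuzzy matching); the class name `RevisionMemory` is hypothetical.

```python
from typing import Dict, Optional


class RevisionMemory:
    """Stores reviewed rewrites and recycles them for identical (normalized) input."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    @staticmethod
    def _key(text: str) -> str:
        # Collapse runs of whitespace and ignore case so trivial variants share an entry.
        return " ".join(text.split()).lower()

    def add(self, original: str, validated_revision: str) -> None:
        self._store[self._key(original)] = validated_revision

    def lookup(self, text: str) -> Optional[str]:
        return self._store.get(self._key(text))
```

Once a reviewer has validated "The committee decided." as the revision of "The committee made a decision.", the same sentence typed later with different spacing or capitalization is rewritten without re-running the grammars.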

Page 21


FUTURE RESEARCH: FROM SPIDER TO MACHINE TRANSLATION

a fazer um estágio para dar aulas de / tutor Religião (“doing an internship to teach Religion”)

a fazer um estágio para dar aulas de / lecture Religião

a fazer um estágio para dar aulas de / teach Religião

começa a dar exemplos / exemplify : (“begins to give examples:”)

sentia-se capaz de dar um murro em / punch quem quisesse detê-lo (“felt able to punch whoever tried to stop him”)

gostávamos de lhe dar uma palavrinha / speak . (“we would like to have a word with you”)
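These examples pair Portuguese constructions with the support verb *dar* with single English verbs. A toy lookup sketch follows, assuming a hand-made table built only from the pairs on this slide (`DAR_SVC` is hypothetical). It matches only the infinitive *dar*; a real system would also need the inflected forms of the verb and the intervening clitic in cases like “lhe dar”.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical bilingual table built from the slide's examples:
# Portuguese support verb construction -> candidate English single verbs.
DAR_SVC: Dict[str, List[str]] = {
    "dar aulas de": ["tutor", "lecture", "teach"],
    "dar exemplos": ["exemplify"],
    "dar um murro em": ["punch"],
    "dar uma palavrinha": ["speak"],
}


def find_svcs(sentence: str) -> List[Tuple[str, List[str]]]:
    """Return each known construction found in the sentence with its English candidates."""
    found = []
    for svc, candidates in DAR_SVC.items():
        if re.search(r"\b" + re.escape(svc) + r"\b", sentence, flags=re.IGNORECASE):
            found.append((svc, candidates))
    return found
```

Offering several candidates (tutor, lecture, teach) rather than one mirrors SPIDER's editing model: the choice among them is left to the user or to a later contextual rule.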
