esr6 varvara logacheva - expert summer school - malaga 2015

Artificial training data for quality estimation of machine translation

Varvara Logacheva

University of Sheffield

June 27, 2015


Motivation

The use of human feedback in Machine Translation

Human data is too scarce to be incorporated into MT directly

Can be used to train a Quality Estimation system

Not enough to train a good QE system


Quality Estimation

Source: Le ciel est bleu

Automatic translation: The sky is green

Tagging: OK OK OK BAD




Injection of errors into well-formed sentences

Take a well-formed sentence

Decide which words should be:

  Deleted

  Shifted

  Substituted with other words

Perform changes:

  Delete words

  Shift words to other positions

  Replace words with others

  Insert new words

Used before by Raybaud et al. (2011), but words were chosen randomly






Error Generation: First stage

Input: (well-formed) sentence

w = w_1 w_2 w_3 ... w_n

Output: Error tag for each word of the sentence

C = C_1 C_2 C_3 ... C_n

bigramEG: bigram error model P(C_i | C_{i-1}) (Raybaud et al., 2011)

wordprobEG: probability of an error given a word, P(C_i | w_i)

crfEG: P(C_1, ..., C_n | w_1, ..., w_n)
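A minimal sketch of the first two error generators, to make the difference between them concrete. The tag set, the probability values, and the function names below are illustrative assumptions and do not come from the slides; in practice the distributions would be estimated from human-annotated data. crfEG is not sketched because it needs a trained sequence model.

```python
import random

# Hypothetical tag set: the slides only say each word receives an error tag.
TAGS = ["OK", "DEL", "SHIFT", "SUB"]

# Hypothetical bigram error model P(C_i | C_{i-1}); "<s>" marks sentence start.
BIGRAM = {
    "<s>":   {"OK": 0.85, "DEL": 0.05, "SHIFT": 0.03, "SUB": 0.07},
    "OK":    {"OK": 0.85, "DEL": 0.05, "SHIFT": 0.03, "SUB": 0.07},
    "DEL":   {"OK": 0.60, "DEL": 0.20, "SHIFT": 0.05, "SUB": 0.15},
    "SHIFT": {"OK": 0.60, "DEL": 0.05, "SHIFT": 0.25, "SUB": 0.10},
    "SUB":   {"OK": 0.60, "DEL": 0.10, "SHIFT": 0.05, "SUB": 0.25},
}


def sample(dist, rng):
    """Draw one key from a {outcome: probability} dictionary."""
    return rng.choices(list(dist), weights=list(dist.values()))[0]


def bigram_eg(words, model=BIGRAM, rng=None):
    """bigramEG: sample tags left to right from P(C_i | C_{i-1})."""
    rng = rng or random.Random()
    tags, prev = [], "<s>"
    for _ in words:
        prev = sample(model[prev], rng)
        tags.append(prev)
    return tags


def wordprob_eg(words, word_model, default, rng=None):
    """wordprobEG: sample each tag independently from P(C_i | w_i)."""
    rng = rng or random.Random()
    return [sample(word_model.get(w, default), rng) for w in words]
```

Note the design difference: bigram_eg ignores the words and only models how errors cluster, while wordprob_eg ignores the neighbouring tags and only models which words tend to be wrong.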


Word Insertion: Second stage

Input: well-formed sentence with a sequence of error tags

w = w_1 w_2 w_3 ... w_n

C = C_1 C_2 C_3 ... C_n

Output: sentence with errors

w′ = w′_1 w′_2 w′_3 ... w′_m


Word Insertion: Second stage

Deletion is trivial

Shift requires a distribution of shift distances

We need to choose new words to insert and substitute:

unigramWI: P(w_i)

paraphraseWI: P(w′_i | w_i), where (s_i, w_i) and (s_i, w′_i) are entries of a translation table

lexprobWI: P(w_i | s_i), where s_i is a source word
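The second stage can be sketched as follows, assuming hypothetical tag names "DEL", "SUB" and "SHIFT" (anything else is treated as correct) and a shift-distance distribution passed in as a dictionary. The word chooser is pluggable; unigram_wi is one possible reading of unigramWI. Insertion of new words is left out because the slides do not say how insertion positions are chosen.

```python
import random
from collections import Counter


def unigram_wi(corpus_tokens, rng):
    """Return a chooser that samples from the unigram distribution P(w)."""
    counts = Counter(corpus_tokens)
    vocab, weights = list(counts), list(counts.values())
    return lambda original: rng.choices(vocab, weights=weights)[0]


def apply_errors(words, tags, choose_word, shift_dist, rng):
    """Turn a well-formed sentence plus error tags into a sentence with errors."""
    # Deletion and substitution: drop DEL words, replace SUB words.
    tokens = [[choose_word(w) if t == "SUB" else w, t]
              for w, t in zip(words, tags) if t != "DEL"]
    # Shift: move each SHIFT word by a distance sampled from shift_dist.
    for tok in [tok for tok in tokens if tok[1] == "SHIFT"]:
        pos = next(i for i, x in enumerate(tokens) if x is tok)
        tokens.pop(pos)
        d = rng.choices(list(shift_dist), weights=list(shift_dist.values()))[0]
        new_pos = min(max(pos + d, 0), len(tokens))  # stay inside the sentence
        tokens.insert(new_pos, tok)
    return [w for w, _ in tokens]


rng = random.Random(0)
corrupted = apply_errors(
    "the sky is blue".split(),
    ["OK", "SHIFT", "OK", "SUB"],
    choose_word=lambda w: "green",   # fixed chooser, for a reproducible example
    shift_dist={1: 1.0},             # always shift one position to the right
    rng=rng,
)
print(corrupted)  # ['the', 'is', 'sky', 'green']
```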


ParaphraseWI

[Diagram: a target-to-source translation table is matched against a source-to-target translation table to produce a paraphrase list.]
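A sketch of how a paraphrase list might be built from a translation table: two target words count as paraphrases when they share a source word. The table layout and the pivot-style score (summing P(s | w) * P(w′ | s) over shared source words) are assumptions; the slides only state that (s_i, w_i) and (s_i, w′_i) must both be entries of the table.

```python
from collections import defaultdict


def build_paraphrase_list(translation_table):
    """Return {w: {w_prime: score}} for target words that share a source word.

    translation_table: iterable of tuples
    (source, target, P(target | source), P(source | target)),
    a hypothetical layout chosen for this sketch.
    """
    by_source = defaultdict(list)
    for s, t, p_t_given_s, p_s_given_t in translation_table:
        by_source[s].append((t, p_t_given_s, p_s_given_t))

    paraphrases = defaultdict(lambda: defaultdict(float))
    for entries in by_source.values():
        for w, _, p_s_given_w in entries:
            for w2, p_w2_given_s, _ in entries:
                if w2 != w:
                    # Pivot through the shared source word.
                    paraphrases[w][w2] += p_s_given_w * p_w2_given_s
    return {w: dict(cands) for w, cands in paraphrases.items()}


table = [
    ("bleu", "blue", 0.9, 1.0),
    ("bleu", "azure", 0.1, 1.0),
    ("vert", "green", 1.0, 1.0),
]
plist = build_paraphrase_list(table)
```

With this toy table, "blue" and "azure" become paraphrases of each other, and "green" gets no entry because no other target word shares its source.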


LexprobWI

[Diagram: a source-to-target translation table yields a translations list.]
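A small sketch of lexprobWI under two assumptions not stated on the slides: that a word alignment already tells us which source word s_i corresponds to the word being replaced, and that the original word is excluded from the candidates so the substitution actually introduces an error. The table format is hypothetical.

```python
import random


def lexprob_wi(source_word, lex_table, original, rng):
    """Sample a replacement from P(w | s), skipping the original word."""
    candidates = {w: p for w, p in lex_table.get(source_word, {}).items()
                  if w != original}
    if not candidates:
        return original  # no alternative translation is known
    return rng.choices(list(candidates), weights=list(candidates.values()))[0]


# Hypothetical lexical table: {source word: {target word: P(target | source)}}
lex = {"bleu": {"blue": 0.9, "azure": 0.1}}
replacement = lexprob_wi("bleu", lex, original="blue", rng=random.Random(0))
```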


Experiments

Three Quality Estimation systems trained on the artificial data:

Sentence-level: “Good” / “Almost good” / “Bad”

Sentence-level: HTER score prediction

Word-level: “Good” / “Bad”
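For reference, HTER is the number of edits needed to turn the MT output into its post-edited version, divided by the length of the post-edited version. The sketch below approximates it with a plain word-level edit distance; unlike full TER it does not count a block shift as a single edit. Treating the original well-formed sentence as the "post-edit" of an artificial example is an assumption here, not something the slides state.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over words (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1,              # delete h
                           cur[j - 1] + 1,           # insert r
                           prev[j - 1] + (h != r)))  # substitute or match
        prev = cur
    return prev[-1]


def approx_hter(hyp, ref):
    """Edits per reference word; 0.0 means no post-editing was needed."""
    return word_edit_distance(hyp, ref) / max(len(ref), 1)


score = approx_hter("the sky is green".split(), "the sky is blue".split())
print(score)  # 0.25
```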


Experiments: Sentence classification (Good / Almost good / Bad)


Experiments: HTER prediction


Experiments: word-level binary classification


Conclusions

9 new methods of artificial data generation:

  bad examples are based on well-formed sentences

  each example has a corresponding source sentence

  the number of errors can be varied

Artificial data improves the result of sentence-level QE:

  Error Generators: CRF-based EG works better

  Word Inserters: random word selection works better

Artificial data doesn’t improve word-level QE


Thank you


Experiments: generated data properties

Percentage of errors in generated sentences:

bigramEG — 23%

wordprobEG — 17%

crfEG — 5%

Error generator   Unigram WI   Paraphrase WI
Bigram            699.9        888.64
Wordprob          538.84       673.61
CRF               165.36       172.97

Table: Perplexities of the artificial datasets w.r.t. Europarl
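As a reminder of what the table measures, perplexity is the exponential of the average negative log-probability per word: the lower it is, the closer the text is to what the language model expects. The slides do not say which language model was used; the sketch below uses an add-one-smoothed unigram model over a closed vocabulary purely for illustration, and real figures such as those above would come from a stronger model.

```python
import math
from collections import Counter


def unigram_perplexity(train_tokens, test_tokens):
    """Perplexity of test_tokens under an add-one-smoothed unigram model."""
    counts = Counter(train_tokens)
    # Closed-vocabulary assumption: test words are added to the vocabulary.
    vocab = set(train_tokens) | set(test_tokens)
    total = len(train_tokens) + len(vocab)
    log_prob = sum(math.log((counts[w] + 1) / total) for w in test_tokens)
    return math.exp(-log_prob / len(test_tokens))


ppl = unigram_perplexity(["a", "b"], ["a", "b"])
```

In this toy case each word has smoothed probability 2/4, so the perplexity is 2.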

