ESR6 Varvara Logacheva - EXPERT Summer School - Malaga 2015
Artificial training data for quality estimation of machine translation
Varvara Logacheva
University of Sheffield
June 27, 2015
Varvara Logacheva Artificial data for Quality Estimation 1/18
Motivation
The use of human feedback in Machine Translation (MT)
Human data is too scarce to be incorporated into MT directly
It can be used to train a Quality Estimation (QE) system
But it is not enough to train a good QE system
Quality Estimation
Source: Le ciel est bleu
Automatic translation: The sky is green
Tagging: OK OK OK BAD
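Word-level tags like these are usually obtained by comparing the MT output with a human post-edit of it. The sketch below is only an illustration under that assumption: the post-edit is hypothetical, and Python's `difflib.SequenceMatcher` stands in for the TER alignment commonly used for this purpose. Every MT word that survives unchanged in the post-edit is tagged OK, every other word BAD.

```python
from difflib import SequenceMatcher


def tag_words(mt_tokens, post_edit_tokens):
    """Tag each MT token OK if it is kept unchanged in the post-edit, else BAD.

    difflib's matching blocks are used as a simple stand-in for a TER
    alignment (an assumption of this sketch).
    """
    tags = ["BAD"] * len(mt_tokens)
    matcher = SequenceMatcher(None, mt_tokens, post_edit_tokens, autojunk=False)
    for block in matcher.get_matching_blocks():
        # A block covers block.size tokens starting at index block.a of the MT output.
        for i in range(block.a, block.a + block.size):
            tags[i] = "OK"
    return tags


mt = "The sky is green".split()
post_edit = "The sky is blue".split()  # hypothetical human post-edit
print(tag_words(mt, post_edit))  # ['OK', 'OK', 'OK', 'BAD']
```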
Injection of errors into well-formed sentences
Take a well-formed sentence
Decide which words should be deleted, shifted, or substituted with other words
Perform the changes: delete words, shift words to other positions, replace words with others, insert new words
Used before by Raybaud et al. (2011), but there the words were chosen randomly
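For comparison, the random word choice mentioned above can be sketched as follows. This is a minimal illustration, assuming each word is selected independently with probability `error_rate` and given one error type uniformly at random; the tag names (`DEL`, `SHIFT`, `SUB`, `INS`) and the rate are hypothetical, not the settings used by Raybaud et al. (2011).

```python
import random

# Hypothetical tag set: OK leaves the word untouched; the others mark it for
# deletion, a shift, a substitution, or insertion of a new word after it.
ERROR_TAGS = ["DEL", "SHIFT", "SUB", "INS"]


def random_error_tags(words, error_rate=0.2, rng=None):
    """Choose erroneous words uniformly at random, with no model of error positions."""
    rng = rng or random.Random()
    tags = []
    for _ in words:
        if rng.random() < error_rate:
            tags.append(rng.choice(ERROR_TAGS))
        else:
            tags.append("OK")
    return tags


sentence = "the sky is blue today".split()
print(random_error_tags(sentence, error_rate=0.3, rng=random.Random(0)))
```

The error generators on the following slides replace this uniform choice with models learned from data.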
Error Generation: first stage
Input: a (well-formed) sentence $w = w_1 w_2 w_3 \dots w_n$
Output: an error tag for each word of the sentence, $C = C_1 C_2 C_3 \dots C_n$
bigramEG: bigram error model, $P(C_i \mid C_{i-1})$ (Raybaud et al., 2011)
wordprobEG: probability of an error given a word, $P(C_i \mid w_i)$
crfEG: $P(C_1, \dots, C_n \mid w_1, \dots, w_n)$
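The two simpler error generators can be sketched as relative-frequency models that are then sampled to tag a new sentence. This is a minimal sketch, assuming the training data is MT output with one error tag per word; the tag names, the `<s>` start symbol and the back-off choices are assumptions. crfEG would need a CRF toolkit and is not sketched here.

```python
import random
from collections import Counter, defaultdict

START = "<s>"  # pseudo-tag standing in for C_0 before the first word


def normalise(counter):
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}


def sample(dist, rng):
    labels = list(dist)
    return rng.choices(labels, weights=[dist[label] for label in labels], k=1)[0]


def train_bigram_eg(tag_sequences):
    """bigramEG: relative-frequency estimate of P(C_i | C_{i-1})."""
    counts = defaultdict(Counter)
    for tags in tag_sequences:
        prev = START
        for tag in tags:
            counts[prev][tag] += 1
            prev = tag
    return {prev: normalise(c) for prev, c in counts.items()}


def generate_bigram(model, words, rng):
    tags, prev = [], START
    for _ in words:
        # Back off to the sentence-start distribution for unseen contexts.
        tag = sample(model.get(prev, model[START]), rng)
        tags.append(tag)
        prev = tag
    return tags


def train_wordprob_eg(tagged_sentences):
    """wordprobEG: relative-frequency estimate of P(C_i | w_i)."""
    per_word, overall = defaultdict(Counter), Counter()
    for words, tags in tagged_sentences:
        for word, tag in zip(words, tags):
            per_word[word][tag] += 1
            overall[tag] += 1
    return {word: normalise(c) for word, c in per_word.items()}, normalise(overall)


def generate_wordprob(model, backoff, words, rng):
    # Unseen words use the corpus-wide tag distribution (assumption of this sketch).
    return [sample(model.get(word, backoff), rng) for word in words]


data = [("the sky is green".split(), ["OK", "OK", "OK", "SUB"]),
        ("a house is red".split(), ["OK", "DEL", "OK", "OK"])]
rng = random.Random(0)
bigram = train_bigram_eg(tags for _, tags in data)
print(generate_bigram(bigram, "the cat is black".split(), rng))
wordprob, backoff = train_wordprob_eg(data)
print(generate_wordprob(wordprob, backoff, "the cat is black".split(), rng))
```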
Words Insertion: second stage
Input: a well-formed sentence with a sequence of error tags
$w = w_1 w_2 w_3 \dots w_n$
$C = C_1 C_2 C_3 \dots C_n$
Output: a sentence with errors
$w' = w'_1 w'_2 w'_3 \dots w'_m$ (in general $m \neq n$, since words are deleted and inserted)
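Applying a tag sequence $C$ to $w$ to obtain $w'$ can be sketched as below. The tag semantics are assumptions of this sketch (DEL drops the word, SUB replaces it, INS keeps it and inserts a new word after it, SHIFT moves it by a sampled distance), and `choose_word` is a hypothetical hook for whichever word inserter is used.

```python
import random


def apply_errors(words, tags, choose_word, shift_distances, rng):
    """Turn a well-formed sentence w and its tags C into a sentence with errors w'.

    choose_word(i, word) returns a new word (the word inserter);
    shift_distances maps a signed shift distance to its probability.
    """
    tokens, to_shift = [], []
    for i, (word, tag) in enumerate(zip(words, tags)):
        if tag == "DEL":
            continue  # deletion is trivial: the word is simply dropped
        if tag == "SUB":
            tokens.append((i, choose_word(i, word)))
        elif tag == "INS":
            tokens.append((i, word))
            tokens.append((None, choose_word(i, word)))  # new word, no original index
        else:
            tokens.append((i, word))
            if tag == "SHIFT":
                to_shift.append(i)
    distances = list(shift_distances)
    weights = [shift_distances[d] for d in distances]
    for i in to_shift:
        # Find the word by its original index, remove it, re-insert it elsewhere.
        pos = next(p for p, (idx, _) in enumerate(tokens) if idx == i)
        item = tokens.pop(pos)
        distance = rng.choices(distances, weights=weights, k=1)[0]
        tokens.insert(min(max(pos + distance, 0), len(tokens)), item)
    return [word for _, word in tokens]


noisy = apply_errors("the sky is blue today".split(),
                     ["OK", "OK", "SHIFT", "SUB", "OK"],
                     choose_word=lambda i, w: "green",  # placeholder word inserter
                     shift_distances={-2: 0.2, -1: 0.3, 1: 0.3, 2: 0.2},
                     rng=random.Random(0))
print(noisy)
```

Shifts are applied after the other operations and clipped to the sentence boundaries; both choices are simplifications.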
Words Insertion: second stage
Deletion is trivial
Shift requires a distribution of shift distances
New words must be chosen for insertion and substitution:
unigramWI: $P(w_i)$
paraphraseWI: $P(w'_i \mid w_i)$, where $(s_i, w_i)$ and $(s_i, w'_i)$ are entries of a translation table
lexprobWI: $P(w_i \mid s_i)$, where $s_i$ is a source word
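unigramWI is the simplest inserter: new words are drawn from the unigram distribution $P(w_i)$ of a target-language corpus. A minimal sketch, assuming maximum-likelihood unigram counts; the class name and the corpus are hypothetical. The call accepts the position and current word but ignores them, since unigramWI does not condition on context.

```python
import random
from collections import Counter


class UnigramWI:
    """unigramWI: draw inserted or substituted words from P(w) = count(w) / N."""

    def __init__(self, corpus_tokens, rng=None):
        counts = Counter(corpus_tokens)
        self.words = list(counts)
        self.weights = [counts[w] for w in self.words]
        self.rng = rng or random.Random()

    def __call__(self, position=None, word=None):
        # Context arguments are ignored: the choice depends only on P(w).
        return self.rng.choices(self.words, weights=self.weights, k=1)[0]


wi = UnigramWI("the sky is blue and the sea is green".split(), rng=random.Random(0))
print([wi() for _ in range(5)])
```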
ParaphraseWI
[Figure: translation-table entries (source, target) ⇒ paraphrase list]
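A paraphrase list of this kind can be sketched as follows: two target words are treated as paraphrases when they translate the same source word in the translation table. This is a simplified sketch, assuming table rows of the form (source, target, $P(\text{target} \mid \text{source})$) and scoring a candidate $w'$ for $w$ by summing $P(w' \mid s)$ over the shared source words $s$; a full pivot model would also weight by $P(s \mid w)$. The function names and table entries are hypothetical.

```python
import random
from collections import defaultdict


def build_paraphrase_lists(translation_table):
    """Collect paraphrase candidates w' for each target word w, with normalised scores."""
    by_source = defaultdict(dict)
    for source, target, prob in translation_table:
        by_source[source][target] = prob
    scores = defaultdict(lambda: defaultdict(float))
    for targets in by_source.values():
        for w in targets:
            for w_prime, prob in targets.items():
                if w_prime != w:
                    # Simplified pivot score: sum of P(w' | s) over shared sources s.
                    scores[w][w_prime] += prob
    lists = {}
    for w, cands in scores.items():
        total = sum(cands.values())
        lists[w] = {c: s / total for c, s in cands.items()}
    return lists


def paraphrase_wi(lists, word, rng):
    """paraphraseWI: draw w' ~ P(w' | w); returns None if w has no paraphrase."""
    cands = lists.get(word)
    if not cands:
        return None
    options = list(cands)
    return rng.choices(options, weights=[cands[c] for c in options], k=1)[0]


table = [("ciel", "sky", 0.7), ("ciel", "heaven", 0.3),
         ("cieux", "heaven", 0.8), ("cieux", "skies", 0.2)]
lists = build_paraphrase_lists(table)
print(lists["heaven"])  # 'sky' gets about 0.78, 'skies' about 0.22
print(paraphrase_wi(lists, "sky", random.Random(0)))  # heaven
```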
LexprobWI
[Figure: translation-table entries (source, target) ⇒ translations list]
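lexprobWI draws the new word from the translations of the source word $s_i$, i.e. from $P(w_i \mid s_i)$. A minimal sketch, assuming a lexical table of (source, target, $P(\text{target} \mid \text{source})$) entries and that the source word aligned to position $i$ is already known; the function names are hypothetical, and excluding the current word (so that a substitution really changes the sentence) is a design choice of this sketch.

```python
import random
from collections import defaultdict


def build_translation_lists(lexical_table):
    """Group lexical-table entries (source, target, P(target | source)) by source word."""
    lists = defaultdict(dict)
    for source, target, prob in lexical_table:
        lists[source][target] = prob
    return dict(lists)


def lexprob_wi(translation_lists, source_word, rng, exclude=None):
    """lexprobWI: draw w_i ~ P(w | s_i) for the source word aligned to position i.

    `exclude` (e.g. the current target word) is removed from the candidates;
    returns None if no candidate is left.
    """
    candidates = {w: p for w, p in translation_lists.get(source_word, {}).items()
                  if w != exclude}
    if not candidates:
        return None
    words = list(candidates)
    return rng.choices(words, weights=[candidates[w] for w in words], k=1)[0]


lists = build_translation_lists([("bleu", "blue", 0.9), ("bleu", "azure", 0.1)])
print(lexprob_wi(lists, "bleu", random.Random(0), exclude="blue"))  # azure
```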
Experiments
Three Quality Estimation systems were trained on the artificial data:
Sentence-level: “Good” / “Almost good” / “Bad” classification
Sentence-level: HTER score prediction
Word-level: “Good” / “Bad” classification
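HTER is the number of edits needed to turn the MT output into its human post-edit, divided by the length of the post-edit. The sketch below is a simplified illustration: it counts only word insertions, deletions and substitutions (standard Levenshtein distance), whereas real (H)TER also allows block shifts at cost 1, so this version can overestimate the score for reordered sentences.

```python
def hter_no_shifts(mt_tokens, post_edit_tokens):
    """Approximate HTER: word-level edit distance / post-edit length (no shifts)."""
    n, m = len(mt_tokens), len(post_edit_tokens)
    prev = list(range(m + 1))  # distances from an empty MT prefix
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if mt_tokens[i - 1] == post_edit_tokens[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete an MT word
                          curr[j - 1] + 1,     # insert a post-edit word
                          prev[j - 1] + cost)  # keep or substitute
        prev = curr
    return prev[m] / max(m, 1)


print(hter_no_shifts("The sky is green".split(), "The sky is blue".split()))  # 0.25
```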
Experiments: Sentence classification (Good / Almost good / Bad)
Experiments: HTER prediction
Experiments: word-level binary classification
Conclusions
9 new methods of artificial data generation (3 error generators × 3 word inserters):
bad examples are based on well-formed sentences
each example has a corresponding source sentence
the number of errors can be varied
Artificial data improves the results of sentence-level QE:
Error generators: the CRF-based EG works better
Word inserters: random word selection works better
Artificial data doesn’t improve word-level QE
Thank you
Experiments: generated data properties
Percentage of errors in generated sentences:
bigramEG: 23%
wordprobEG: 17%
crfEG: 5%

                   Word inserter
Error generator    Unigram    Paraphrase
Bigram             699.9      888.64
Wordprob           538.84     673.61
CRF                165.36     172.97
Table: Perplexities of the artificial datasets w.r.t. Europarl
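Both properties on this slide can be computed along the following lines. This is an illustrative sketch only: the error percentage counts every non-OK tag, and the perplexity uses an add-one-smoothed unigram model as a stand-in for the Europarl language model, whose actual order and smoothing are not given here. One extra vocabulary slot is reserved for unseen words so that the probabilities sum to one.

```python
import math
from collections import Counter


def error_percentage(tag_sequences):
    """Share of words (in %) that carry an error tag, i.e. any tag other than OK."""
    tags = [tag for seq in tag_sequences for tag in seq]
    return 100.0 * sum(tag != "OK" for tag in tags) / len(tags)


def unigram_perplexity(train_tokens, test_sentences):
    """Perplexity of tokenised test sentences under an add-one-smoothed unigram LM."""
    counts = Counter(train_tokens)
    total, vocab = sum(counts.values()), len(counts) + 1  # +1 slot for unseen words
    log_prob, n = 0.0, 0
    for sentence in test_sentences:
        for token in sentence:
            log_prob += math.log((counts[token] + 1) / (total + vocab))
            n += 1
    return math.exp(-log_prob / n)


print(error_percentage([["OK", "BAD"], ["OK", "OK"]]))  # 25.0
print(unigram_perplexity(["a", "b"], [["a"]]))
```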