1
Speed or Accuracy? A Study in Evaluation of Simultaneous Speech Translation
Takashi Mieno, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura
Nara Institute of Science and Technology (NAIST)
2
Speech Translation
Source: Microsoft Research, http://research.microsoft.com/en-us/news/features/translator-052714.aspx
Source: NICT, http://www.nict.go.jp/press/2010/06/29-1.html
3
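Problems with Traditional Systems
Input (Japanese): 本日は私の身近にあるとある難題について話しますが皆さんにも関連のある難題で数年前イギリスに渡った時に…
(English: "Today I will talk about a certain difficult problem that is close to me, a problem that is also relevant to all of you; a few years ago, when I went to the UK…")
System
Output: ...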
4
Simultaneous Speech Translation
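Input (Japanese, split at //): 本日は私の身近にあるとある難題について話しますが // 皆さんにも関連のある難題で // 数年前イギリスに渡った時に…
(The third segment means "a few years ago, when I went to the UK…")
System
Output: I want to talk today about a difficult topic that is close to me // and closer than you think to you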
5
Problems with Evaluation
● Given two systems of different speed and accuracy, which is better?
[Figure: accuracy (low to high) plotted against delay (short to long)]
Example input: もっと 手頃な ホテルは ありませんか (word-by-word gloss: more / cheap / hotel / is there)
Don’t split the sentence: do you have a more reasonable hotel ? /
Split the sentence: more / reasonable / is there a hotel ? /
6
Goal of Evaluation
[Figure: three plots of accuracy (low to high) against delay]
● An evaluation measure considering delay and accuracy for simultaneous speech translation.
7
Proposed Evaluation Method
8
How to Create an Evaluation Function? (Based on Data)
[Diagram: movies with various delays and accuracies are evaluated; the evaluated movie data, with accuracy and delay as features, form the training data; machine learning on the training data produces the evaluation function]
9
Evaluation Sheet Example
● (Separate window)
10
Data Format
• Rank-based evaluation
– Perform comparative evaluation of which output is “better”
– Allows for consideration of both speed and accuracy

The same input video is passed through each system, and evaluators rank the outputs:

System      Output      ☆Rank
System A    Output A    4
System B    Output B    1
System C    Output C    3
System D    Output D    2
System E    Output E    5
11
Learning an Evaluation Function
Define a linear function that takes a video as input and returns a score:
score(displayed video) = weight vector · features useful in evaluation (i.e., delay and accuracy)
This function can be learned from ranked data using “learning to rank”.
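As a rough illustration of the linear function described on this slide, the sketch below scores a video as the dot product of a weight vector and a two-element feature vector. The class name `VideoFeatures`, the function name `score`, and the weight values are all assumptions made for this example; they are not taken from the talk.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class VideoFeatures:
    """Features of one displayed video (illustrative names)."""
    accuracy: float  # translation accuracy, from an automatic or manual measure
    delay: float     # delay in seconds

    def as_vector(self) -> List[float]:
        return [self.accuracy, self.delay]


def score(weights: Sequence[float], video: VideoFeatures) -> float:
    """Linear evaluation function: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, video.as_vector()))


# Hypothetical weights: reward accuracy, penalize delay.
weights = [2.0, -0.05]
fast_rough = VideoFeatures(accuracy=0.30, delay=3.0)
slow_good = VideoFeatures(accuracy=0.65, delay=10.0)

print(score(weights, fast_rough))  # lower accuracy, shorter delay
print(score(weights, slow_good))   # higher accuracy, longer delay
```

With these made-up weights the slower, more accurate video scores higher; a larger delay penalty would reverse the order, which is the trade-off the learned weights are meant to capture.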
12
Learning to Rank
Training Data

Movie     Accuracy   Delay   Rank
Mov. 1    0.05       0       3
Mov. 1    0.65       10      2
Mov. 1    0.30       3       1
Mov. 2    0.70       3       1
Mov. 2    0.50       10      3
Mov. 2    0.35       5       2
Mov. 3    0.65       2       1
Mov. 3    0.45       7       2
Mov. 3    0.05       3       3

For all pairs of ratings for each movie, learn the order with an SVM. Each pair becomes one training example labeled +1 or -1.
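The pairing step can be sketched as follows, using the values from the training data on this slide. This is one common way to set up pairwise learning to rank and is an assumption about the details: each pair of ratings for the same movie becomes a feature-difference vector, labeled +1 if the first rating has the better (lower) rank and -1 otherwise. The function name `pairwise_examples` is hypothetical.

```python
from itertools import combinations

# (accuracy, delay in seconds, rank) for each rating; rank 1 is best.
TRAINING_DATA = {
    "Mov. 1": [(0.05, 0, 3), (0.65, 10, 2), (0.30, 3, 1)],
    "Mov. 2": [(0.70, 3, 1), (0.50, 10, 3), (0.35, 5, 2)],
    "Mov. 3": [(0.65, 2, 1), (0.45, 7, 2), (0.05, 3, 3)],
}


def pairwise_examples(data):
    """Turn per-movie rankings into (feature difference, label) pairs."""
    examples = []
    for ratings in data.values():
        # Only ratings of the same movie are compared with each other.
        for (acc_a, delay_a, rank_a), (acc_b, delay_b, rank_b) in combinations(ratings, 2):
            if rank_a == rank_b:
                continue  # ties carry no ordering information
            diff = (acc_a - acc_b, delay_a - delay_b)
            label = 1 if rank_a < rank_b else -1
            examples.append((diff, label))
    return examples


examples = pairwise_examples(TRAINING_DATA)
for diff, label in examples:
    print(f"{diff[0]:+.2f} {diff[1]:+d} -> {label:+d}")
```

The resulting difference vectors and labels can then be passed to any linear SVM trained without a bias term, so that the learned weight vector doubles as the scoring function's weights.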
13
Experiments
14
Experimental Setup
• Target video: TED Talks
• Gathered data
– Video: 20 types, 20-30 seconds each
– Delay: 7 types (0, 1, 2, 3, 5, 7, 10 seconds)
– Subjects: 15 native speakers
– Method: ranking 1-3
– Modalities: speech + subtitles
• Translation data (5 varieties), English → Japanese
① Real-time translation is important
② Often used in MT evaluation
– Translator
– Interpreter 1 (S Rank)
– Interpreter 2 (A Rank)
– Syntax-based MT
– Phrase-based MT
15
Training/Evaluation
Training
• Data format: ranking data for 19 movies
• Model: linear SVM
• Features:
– Acc. Eval: BLEU+1 (Auto), RIBES (Auto), Adequacy (Man.)
– Delay: 7 varieties
Eval
• Data format: ranking data of a held-out movie
• Eval. accuracy: correct ranking percentage (chance rate = 0.5)
(2-fold cross-validation)
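The "correct ranking percentage" can be read as the fraction of rating pairs in the held-out movie that the scoring function orders the same way as the human evaluators, which is why the chance rate is 0.5. The sketch below computes it under that assumption. The function name `pairwise_accuracy`, both weight vectors, and the held-out ratings are illustrative; the ratings are borrowed from the "Mov. 2" rows of the learning-to-rank example, not from the experiment.

```python
from itertools import combinations


def pairwise_accuracy(weights, ratings):
    """Fraction of rating pairs whose order the scoring function gets right."""
    def score(features):
        return sum(w * x for w, x in zip(weights, features))

    correct = 0
    total = 0
    for (feat_a, rank_a), (feat_b, rank_b) in combinations(ratings, 2):
        if rank_a == rank_b:
            continue  # skip ties in the human ranking
        total += 1
        predicted_a_better = score(feat_a) > score(feat_b)
        actual_a_better = rank_a < rank_b  # rank 1 is best
        if predicted_a_better == actual_a_better:
            correct += 1
    return correct / total if total else 0.0


# Illustrative held-out movie: ((accuracy, delay in seconds), human rank).
held_out = [((0.70, 3), 1), ((0.50, 10), 3), ((0.35, 5), 2)]

# Hypothetical weight vectors: accuracy only vs. accuracy plus a delay penalty.
accuracy_only = pairwise_accuracy((1.0, 0.0), held_out)
with_delay = pairwise_accuracy((18.0, -1.0), held_out)
print(accuracy_only, with_delay)
```

On this toy input the accuracy-only weights misorder the pair in which the more accurate output arrives much later, while the weights with a delay penalty order all three pairs as the evaluators did.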
16
Evaluation of Evaluation
[Figure: two bar charts, "Text Subtitles" and "Speech"; y-axis: accuracy (0 to 1); x-axis groups: Acc. and Delay+Acc.; series: None, BLEU+1, RIBES, Adeq.]
17
Q1: Is Delay Important in S2S Translation?
[Figure: two bar charts, "Text Subtitles" and "Speech"; y-axis: accuracy (0 to 1); x-axis groups: Acc. and Delay+Acc.; series: None, BLEU+1, RIBES, Adeq.]
A: Yes! In all cases, the scoring function considering delay did as well as or better than the one considering accuracy alone.
18
Q2: Does Importance Depend on Modality of Presentation?
[Figure: two bar charts, "Text Subtitles" and "Speech"; y-axis: accuracy (0 to 1); x-axis groups: Acc. and Delay+Acc.; series: None, BLEU+1, RIBES, Adeq.]
Average gain from considering delay: +7% (Text Subtitles), +3% (Speech)
A: Yes! Considering delay was more useful when presenting results through subtitles.
Why? Probably because when watching subtitles, it is possible to hear the original speech.
19
Q3: Does this Solve Evaluation?
[Figure: two bar charts, "Text Subtitles" and "Speech"; y-axis: accuracy (0 to 1); x-axis groups: Acc. and Delay+Acc.; series: None, BLEU+1, RIBES, Adeq.]
A: No! We still have a large gap between fully automatic eval and human annotation, and ranking accuracy is still not high.
20
Learned Evaluation Functions (for Adequacy)
[Figure: two plots, "Speech Output" and "Subtitle Output"; x-axis: Delay (s), 0 to 10; y-axis: 5 Level Acceptability, 1 to 5]

Learned weights:

                   Accuracy   Delay
Subtitle Output    1.40       -0.059
Speech Output      1.99       -0.018

1 point of adequacy =
– 8.0 sec. of delay (Subtitle Output)
– 28.5 sec. of delay (Speech Output)
21
Future Challenges
22
Future Challenges
● Current conception of delay is artificial
● Can we generalize in some way?
● Non-linear evaluation functions
23
Thank You!