to structure baseball live games as well as to improve the speech recognition accuracy


Upload: miya

Post on 15-Jan-2016


DESCRIPTION

Future Work. Research Purpose. Abstract. Proposed Method. Learning Stochastic Models. Prospect. Conclusion. Models. Experiments. Problems of Conventional Method. Situation Based Speech Recognition for Structuring Baseball Live Games. Atsushi SAKO, Tetsuya TAKIGUCHI and Yasuo ARIKI - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: To structure baseball live games as well as to improve the speech recognition accuracy

To structure live baseball games and to improve speech recognition accuracy, using baseball-dependent knowledge.

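Models

[Figure: model overview with the levels sentence, phoneme, signal and situation. Conventional Method: Acoustic Model and Language Model (Bi-gram). Proposed Method: Situation Dependent Acoustic Model, Situation Prediction Model and Situation Dependent Language Model.]

Formalization

[Figure: example of a situation sequence. Counts 3B1S, 3B2S and 0B0S shown with the words "Pitch", "Foul ball", "And", "Strikeout!" and "Next Batter"; variables O, W, S.]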

Estimate the word sequence and the situation sequence concurrently.

The following simplifications are made:
• A situation depends only on the previous situation and a word co-occurrence.
• A word depends only on the present situation and the previous word.

O : Sequence of observed feature vectors
W : Sequence of words
S : Sequence of situations
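The two simplifications can be illustrated with a small Python sketch. It is not the poster's decoder: instead of integrating the models into a Viterbi search, it rescores a hand-made list of word hypotheses by brute force over all situation sequences, and it conditions each situation only on the previous situation and the previous word (a single word rather than a co-occurrence). The probability tables, the `FLOOR` smoothing value and the names `joint_log_score` and `decode` are all assumptions made up for illustration.

```python
import math
from itertools import product

FLOOR = 1e-6  # assumed probability for events missing from the toy tables


def log_p(table, key):
    """Log-probability lookup with a floor for unseen events."""
    return math.log(table.get(key, FLOOR))


def joint_log_score(words, situations, acoustic_logp, sit_model, lm, s0, w0="<s>"):
    """log P(O|W,S) + sum_i [log P(s_i|s_{i-1},w_{i-1}) + log P(w_i|w_{i-1},s_i)]."""
    total = acoustic_logp
    s_prev, w_prev = s0, w0
    for w, s in zip(words, situations):
        total += log_p(sit_model, (s, s_prev, w_prev))  # situation prediction
        total += log_p(lm, (w, w_prev, s))              # situation dependent bi-gram
        s_prev, w_prev = s, w
    return total


def decode(hypotheses, situation_set, sit_model, lm, s0):
    """Brute-force argmax over (W, S); fine for a toy, not for real decoding."""
    best = None
    for words, acoustic_logp in hypotheses:
        for situations in product(situation_set, repeat=len(words)):
            score = joint_log_score(words, situations, acoustic_logp, sit_model, lm, s0)
            if best is None or score > best[0]:
                best = (score, words, situations)
    return best


# Toy tables (hypothetical numbers). Keys: (s_i, s_{i-1}, w_{i-1}) and (w_i, w_{i-1}, s_i).
sit_model = {
    ("3B1S", "3B1S", "<s>"): 1.0,
    ("3B2S", "3B1S", "foul_ball"): 1.0,
    ("0B0S", "3B1S", "four_ball"): 1.0,
}
lm = {
    ("foul_ball", "<s>", "3B1S"): 0.5,
    ("four_ball", "<s>", "3B1S"): 0.5,
    ("strikeout", "foul_ball", "3B2S"): 0.4,
}
# The acoustic scores slightly prefer the wrong hypothesis "four_ball".
hypotheses = [(("four_ball", "strikeout"), -10.0), (("foul_ball", "strikeout"), -10.5)]

score, words, situations = decode(
    hypotheses, ["3B1S", "3B2S", "0B0S"], sit_model, lm, s0="3B1S"
)
print(words, situations)
```

In this toy setup the situation terms outweigh the small acoustic preference for "four_ball", because a strikeout immediately after a walk is unseen in the tables.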

Situation Dependent Acoustic Model

Situation Prediction Model

Situation Dependent Bi-gram

Formalization

O : Sequence of observed feature vectors
W : Sequence of words

Acoustic Model

Language Model (Bi-gram)

Problems

An example of recognition error

Situation Dependent Language Model

Learned from training data.

[Figure: the word "Strikeout!" under the counts 1B1S and 1B2S, with the label P=High.]

Acoustic Model

2 models, such as normal emotion and excited emotion. Adaptation by MLLR+MAP.

Situation Prediction Model

[Figure: transitions among the counts 1B1S, 2B1S, 1B2S and 2B2S, labeled with the words "Pitch", "Strike", "Straight" and "Ball".]

Experimental Conditions

Experimental Results

Make the method work well under ambiguous situations. Describe a situation in more detail, including events.

We proposed Situation Based Speech Recognition. The count (balls and strikes) was used as the situation. The method worked well under obvious situations.

Keyword accuracy improved by 2.3 points. The structuring correct rate improved by 6.1 points. The correct rate of exciting scene detection was 75.0%.

An example of recognition result

[Figure: log likelihood over time for the competing hypotheses "Four ball" and "Foul ball", followed by "Pitch and Strikeout!" and "Next Batter"; counts shown: 3B 1S, 3B 2S, 0B 0S.]

Research Purpose

Problems of Conventional Method

Correct … foul ball, and strikeout in next pitch

Mistake … four ball (base on balls), and strikeout in next pitch

Abstract

Recognizing live baseball commentary is difficult because the speech is fast, noisy, emotional and disfluent: the spontaneous speaking style leads to rephrasing, repetition, mistakes and grammatical deviations. To address these problems, we have been studying a speech recognition method that incorporates task-dependent knowledge of the baseball game as well as the announcer's emotion in the commentary speech. In addition, in this paper we propose a situation prediction model based on word co-occurrence. These models effectively prevent speech recognition errors. The method is formalized in the framework of probability theory and implemented in the conventional speech decoding (Viterbi) algorithm. Experimental results showed that the proposed approach improved structuring and segmentation accuracy as well as keyword accuracy.

P=Low

Using word co-occurrence (not bag-of-words, BOW)

Learning Stochastic Models

Proposed Method

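\[
(\hat{W}, \hat{S}) = \arg\max_{(W,S)} P(W, S \mid O) = \arg\max_{(W,S)} \frac{P(O \mid W, S)\, P(W, S)}{P(O)}
\]

\[
(\hat{W}, \hat{S}) = \arg\max_{(W,S)} P(O \mid W, S) \prod_i P(s_i \mid s_1^{i-1}, w_1^{i-1})\, P(w_i \mid w_1^{i-1}, s_1^{i})
\]

\[
(\hat{W}, \hat{S}) = \arg\max_{(W,S)} P(O \mid W, S) \prod_i P(s_i \mid s_{i-1}, w_{i-1}, w_{i-M})\, P(w_i \mid w_{i-1}, s_i)
\]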

Situation Based Speech Recognition for Structuring Baseball Live Games

Atsushi SAKO, Tetsuya TAKIGUCHI and Yasuo ARIKI

Department of Computer and Systems Engineering, Kobe University

Experiments

Prospect

Conclusion

Future Work

\[
\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
\]

\[
\hat{W} = \arg\max_{W} P(O \mid W) \prod_i P(w_i \mid w_{i-1})
\]
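For contrast, the conventional criterion can be sketched in a few lines of Python. This is only an illustration under made-up numbers: the acoustic log-likelihoods, the bi-gram table and the name `bigram_log_score` are hypothetical, and the hypotheses are scored from a fixed list rather than inside a Viterbi search.

```python
import math


def bigram_log_score(words, acoustic_logp, bigram, start="<s>", floor=1e-6):
    """log P(O|W) + sum_i log P(w_i | w_{i-1}), with a floor for unseen pairs."""
    total = acoustic_logp
    prev = start
    for w in words:
        total += math.log(bigram.get((w, prev), floor))
        prev = w
    return total


# Toy bi-gram table (hypothetical). Keys: (w_i, w_{i-1}).
bigram = {
    ("four_ball", "<s>"): 0.5,
    ("foul_ball", "<s>"): 0.5,
    ("strikeout", "four_ball"): 0.3,
    ("strikeout", "foul_ball"): 0.3,
}
# Two acoustically similar hypotheses; the wrong one scores slightly higher.
hypotheses = [(("four_ball", "strikeout"), -10.0), (("foul_ball", "strikeout"), -10.5)]

best_words, best_acoustic = max(
    hypotheses, key=lambda h: bigram_log_score(h[0], h[1], bigram)
)
print(best_words)
```

With no situation term, nothing in this score penalizes a strikeout right after a base on balls, so the acoustically preferred "four_ball" hypothesis wins in the toy example.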

Metric                Conventional   Proposed
Keyword Acc.          66.8%          69.1%
Structuring Cor.      67.2%          73.3%
Exciting scene Cor.   -              75.0%
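The improvements quoted in the conclusion are absolute differences between the two columns. A short Python check, using only the figures reported above (the variable names are illustrative):

```python
# (conventional, proposed) rates in percent, as reported in the results
results = {
    "Keyword Acc.": (66.8, 69.1),
    "Structuring Cor.": (67.2, 73.3),
}

# Absolute gain in percentage points, rounded to one decimal place
gains = {
    name: round(proposed - conventional, 1)
    for name, (conventional, proposed) in results.items()
}
print(gains)
```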

Test set: A commentary speech on radio (7th Sep. 2003)
Learning corpus:
• HMM: 200 hours (baseline) + 3 hours (adaptation)
• Language model: 570K morphemes

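Situation Prediction Model: \( P(s_i \mid s_{i-1}, w_{i-1}, w_{i-M}) \)

Situation Dependent Acoustic Model: \( P(O \mid W, S) \)

Situation Dependent Bi-gram: \( P(w_i \mid w_{i-1}, s_i) \)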
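One simple way to obtain a situation prediction table of this form is relative-frequency counting over a corpus labeled with words and situations. The sketch below assumes that approach; the poster does not state how the model is estimated, and the function name `train_situation_model`, the padding symbol, the choice M=2 and the toy sequences are all hypothetical.

```python
from collections import Counter


def train_situation_model(sequences, M=2, pad="<s>"):
    """Estimate P(s_i | s_{i-1}, w_{i-1}, w_{i-M}) by relative frequency."""
    joint, context = Counter(), Counter()
    for seq in sequences:
        words = [pad] * M + [w for w, _ in seq]  # left-pad so w_{i-M} always exists
        sits = [s for _, s in seq]
        for i in range(1, len(seq)):
            # Context: (s_{i-1}, w_{i-1}, w_{i-M}); padded index = original index + M
            ctx = (sits[i - 1], words[M + i - 1], words[i])
            joint[(sits[i],) + ctx] += 1
            context[ctx] += 1
    return {key: n / context[key[1:]] for key, n in joint.items()}


# Toy training data: each item is (word, situation when the word is spoken).
toy = [
    [("pitch", "1B1S"), ("strike", "1B1S"), ("pitch", "1B2S")],
    [("pitch", "1B1S"), ("strike", "1B1S"), ("pitch", "1B2S")],
    [("pitch", "1B1S"), ("strike", "1B1S"), ("pitch", "1B1S")],
    [("pitch", "1B1S"), ("ball", "1B1S"), ("pitch", "2B1S")],
]
model = train_situation_model(toy)

# Keys: (s_i, s_{i-1}, w_{i-1}, w_{i-M})
print(model[("1B2S", "1B1S", "strike", "pitch")])
print(model[("2B1S", "1B1S", "ball", "pitch")])
```

Because the context keeps the two words in order, "pitch" followed by "strike" is treated differently from the same two words in a bag, which is the distinction the poster draws between co-occurrence and BOW.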

Correct … foul ball, and strikeout in next pitch

Conventional … four ball, and strikeout in next pitch

Proposed … foul ball, and strikeout in next pitch