Boosting Training Scheme for Acoustic Modeling
Rong Zhang and Alexander I. Rudnicky
Language Technologies Institute,
School of Computer Science
Carnegie Mellon University
Reference Papers
• [ICASSP 03] Improving The Performance of An LVCSR System Through Ensembles of Acoustic Models
• [EUROSPEECH 03] Comparative Study of Boosting and Non-Boosting Training for Constructing Ensembles of Acoustic Models
• [ICSLP 04] A Frame Level Boosting Training Scheme for Acoustic Modeling
• [ICSLP 04] Optimizing Boosting with Discriminative Criteria
• [ICSLP 04] Apply N-Best List Re-Ranking to Acoustic Model Combinations of Boosting Training
• [EUROSPEECH 05] Investigations on Ensemble Based Semi-Supervised Acoustic Model Training
• [ICSLP 06] Investigations of Issues for Using Multiple Acoustic Models to Improve Continuous Speech Recognition
Improving The Performance of An LVCSR System Through Ensembles of Acoustic Models
ICASSP 2003
Rong Zhang and Alexander I. Rudnicky
Language Technologies Institute,
School of Computer Science
Carnegie Mellon University
Introduction
• An ensemble of classifiers is a collection of single classifiers; a hypothesis is selected by voting among its components
– Common combination schemes are plurality voting, majority voting, and weighted voting
• Bagging and Boosting are the two most successful algorithms for constructing ensembles
• This paper describes work on applying ensembles of acoustic models to the problem of LVCSR
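The three voting schemes can be illustrated in a few lines. The hypotheses and weights below are hypothetical, chosen only to show how plurality and weighted voting can disagree:

```python
from collections import Counter

def plurality_vote(hypotheses):
    # Pick the hypothesis proposed by the largest number of classifiers.
    # (Majority voting additionally requires that count to exceed half.)
    return Counter(hypotheses).most_common(1)[0][0]

def weighted_vote(hypotheses, weights):
    # Each classifier's vote counts in proportion to its weight.
    scores = {}
    for h, w in zip(hypotheses, weights):
        scores[h] = scores.get(h, 0.0) + w
    return max(scores, key=scores.get)

# Three models decode the same utterance:
hyps = ["show flights", "show flights", "show lights"]
plurality_vote(hyps)                    # two of three agree
weighted_vote(hyps, [0.2, 0.3, 0.9])    # the single strong model can win
```

With the weights above, the two weak models together (0.5) are outvoted by the one strong model (0.9), so weighted voting picks a different hypothesis than plurality voting.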
Bagging vs. Boosting
• Bagging
– In each round, bagging randomly selects a number of examples from the original training set and trains a new single classifier on the selected subset
– The final classifier chooses the hypothesis most agreed upon by the single classifiers
• Boosting
– In boosting, the single classifiers are trained iteratively so that hard-to-classify examples receive increasing emphasis
– A parameter measuring each classifier's importance is determined with respect to its classification accuracy
– The final hypothesis is the weighted majority vote of the single classifiers
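The contrast can be sketched as follows. Bagging draws independent bootstrap samples; boosting reweights examples after each round. The update below is a minimal AdaBoost-style sketch, not the exact rule used in the papers:

```python
import math
import random

def bagging_subsets(data, rounds, frac=1.0):
    # Each round draws a bootstrap sample (with replacement);
    # the classifiers are then trained independently of each other.
    n = int(len(data) * frac)
    return [random.choices(data, k=n) for _ in range(rounds)]

def boosting_weight_update(weights, errors):
    # errors[i] is True when example i was misclassified this round.
    # Assumes 0 < weighted error rate < 1 so the log is defined.
    eps = sum(w for w, e in zip(weights, errors) if e) / sum(weights)
    alpha = 0.5 * math.log((1 - eps) / eps)   # classifier importance
    # Misclassified examples gain weight; correct ones lose weight.
    new = [w * math.exp(alpha if e else -alpha)
           for w, e in zip(weights, errors)]
    z = sum(new)
    return [w / z for w in new], alpha
```

A characteristic property of this update: after renormalization, the misclassified examples carry exactly half of the total weight, which is what forces the next round to focus on them.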
Algorithms
• The first algorithm is based on the intuition that an incorrectly recognized utterance should receive more attention in training
• If the weight of an utterance is 2.6, we first add two copies of the utterance to the new training set, and then add its third copy with probability 0.6
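The fractional duplication rule described above can be sketched directly. This is a minimal illustration, not the authors' code:

```python
import random

def resample_utterances(utterances, weights, rng=None):
    # Weight 2.6 -> two guaranteed copies, plus a third with probability 0.6.
    rng = rng or random.Random(0)
    new_set = []
    for utt, w in zip(utterances, weights):
        whole, frac = int(w), w - int(w)
        new_set.extend([utt] * whole)
        if rng.random() < frac:
            new_set.append(utt)
    return new_set
```

The integer part of the weight is honored deterministically, so the expected number of copies of each utterance equals its weight exactly.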
Algorithms
• The exponential growth in the size of the training set is a severe problem for Algorithm 1
• Algorithm 2 is proposed to address this problem
Algorithms
• In Algorithms 1 and 2 there is no attempt to measure how important a model is relative to the others
– A good model should play a more important role than a bad one
• Algorithm 3 therefore assigns each model t a weight w_t. Let e_{ti} = +1 if model t misrecognizes utterance i, and e_{ti} = -1 otherwise. The weights are chosen to minimize the exponential loss

L = \sum_{i=1}^{N} \exp\Big( \sum_{t=1}^{T} w_t e_{ti} \Big)

which factors over the newest model T as

L(w_T) = \sum_{i=1}^{N} \exp\Big( \sum_{t=1}^{T-1} w_t e_{ti} \Big) \exp( w_T e_{Ti} )

so that w_T can be optimized while holding the earlier models' weights fixed.
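For an exponential loss of the form L(w_T) = Σ_i v_i · exp(w_T e_{Ti}), where v_i accumulates the earlier rounds and e_{Ti} = +1 when model T misrecognizes utterance i and -1 otherwise, the minimizing weight has the closed form w_T = ½ ln(V_correct / V_wrong). The ±1 encoding is an assumption in this sketch:

```python
import math

def optimal_weight(prev_margins, errors):
    # prev_margins[i] = sum_{t<T} w_t * e_ti  (0.0 for the first model)
    # errors[i] is True when model T misrecognizes utterance i.
    v = [math.exp(m) for m in prev_margins]
    v_wrong = sum(vi for vi, e in zip(v, errors) if e)
    v_correct = sum(vi for vi, e in zip(v, errors) if not e)
    # Setting dL/dw_T = v_wrong*exp(w_T) - v_correct*exp(-w_T) = 0 gives:
    return 0.5 * math.log(v_correct / v_wrong)
```

With uniform prior margins this reduces to the familiar AdaBoost weight ½ ln((1-ε)/ε), where ε is the model's error rate: a model wrong on 1 of 4 utterances gets weight ½ ln 3.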
Experiments
• Corpus : CMU Communicator system
• Experimental results :
A Frame Level Boosting Training Scheme for Acoustic Modeling
ICSLP 2004
Rong Zhang and Alexander I. Rudnicky
Language Technologies Institute,
School of Computer Science
Carnegie Mellon University
Introduction
• In the current Boosting algorithm, the utterance is the basic unit used for acoustic model training
• Our analysis shows two notable weaknesses in this setting:
– First, the objective function of the current Boosting algorithm is designed to minimize utterance error instead of word error
– Second, in the current algorithm an utterance is treated as a single unit for resampling
• This paper proposes a frame level Boosting training scheme for acoustic modeling to address these two problems
Frame Level Boosting Training Scheme
• The metric used in Boosting training is the frame level conditional probability of a word, computed from the N-best list:

P(a_t = a | x) = \frac{\sum_{h \in NBest,\, h_t = a} P(h | x)}{\sum_{h \in NBest} P(h | x)}

• Objective function:

L = \sum_{i=1}^{N} \sum_{t=1}^{T_i} \exp\big( P(\tilde{a}_{i,t} | x_i) - P(a_{i,t} | x_i) \big)

where a_{i,t} is the correct (label) word for frame t of utterance i and \tilde{a}_{i,t} is its best competing word. Each exponential term is the pseudo loss for frame t, which describes the degree of confusion of this frame for recognition.
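Computing the frame level conditional probability from an N-best list can be sketched as follows. The data layout is hypothetical: each hypothesis is stored as a word sequence aligned so that index t gives the word covering frame t, together with a log probability:

```python
import math

def frame_posterior(nbest, t, word):
    # nbest: list of (aligned_words, log_prob) pairs from the recognizer,
    # where aligned_words[t] is the word that hypothesis assigns to frame t.
    # Returns the posterior mass of `word` at frame t within the N-best list.
    total = sum(math.exp(lp) for _, lp in nbest)
    mass = sum(math.exp(lp) for words, lp in nbest if words[t] == word)
    return mass / total
```

Because the denominator sums over the whole list, the posteriors over competing words at a given frame sum to one, which is what makes the difference P(ã|x) - P(a|x) in the objective a meaningful confusion measure.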
Frame Level Boosting Training Scheme
• Training scheme:
– How to resample the frame level training data?
• Duplicate frame x_{i,t} a number of times determined by its weight w_{i,t}, and concatenate the copies to create a new utterance for acoustic model training
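The frame duplication step can be sketched as below. Rounding the weight up with ceil is an assumption made for this illustration:

```python
import math

def resample_frames(frames, weights):
    # Duplicate frame x_{i,t} ceil(w_{i,t}) times; the concatenated copies
    # form a new "utterance" used for acoustic model training.
    new_utt = []
    for x, w in zip(frames, weights):
        new_utt.extend([x] * math.ceil(w))
    return new_utt
```

Unlike utterance level resampling, high-confusion frames are emphasized individually, so a single hard word no longer drags every frame of its utterance into the duplicated data.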
Experiments
• Corpus : CMU Communicator system
• Experimental results :
Discussions
• Some speculations on this outcome:
– 1. The mismatch between training criterion and target still exists in principle in the new scheme
– 2. The frame based resampling method depends exclusively on the weights calculated during training
– 3. Forced alignment is used to determine the correct word for each frame
– 4. A new training scheme considering both utterance and frame level recognition errors may be more suitable for accurate modeling