Pattern Recognition in Speech and Language Processing


Page 1: Pattern Recognition in Speech and Language Processing

THE ELECTRICAL ENGINEERING AND APPLIED SIGNAL PROCESSING SERIES

Edited by Alexander Poularikas

The Advanced Signal Processing Handbook: Theory and Implementation for Radar, Sonar, and Medical Imaging Real-Time Systems
Stergios Stergiopoulos

The Transform and Data Compression Handbook
K.R. Rao and P.C. Yip

Handbook of Multisensor Data Fusion
David Hall and James Llinas

Handbook of Neural Network Signal Processing
Yu Hen Hu and Jenq-Neng Hwang

Handbook of Antennas in Wireless Communications
Lal Chand Godara

Noise Reduction in Speech Applications
Gillian M. Davis

Signal Processing Noise
Vyacheslav P. Tuzlukov

Digital Signal Processing with Examples in MATLAB®
Samuel Stearns

Applications in Time-Frequency Signal Processing
Antonia Papandreou-Suppappola

The Digital Color Imaging Handbook
Gaurav Sharma

Pattern Recognition in Speech and Language Processing
Wu Chou and Biing Hwang Juang

Forthcoming Titles

Propagation Data Handbook for Wireless Communication System Design
Robert Crane

Smart Antennas
Lal Chand Godara

Nonlinear Signal and Image Processing: Theory, Methods, and Applications
Kenneth Barner and Gonzalo R. Arce

Forthcoming Titles (continued)

Soft Computing with MATLAB®
Ali Zilouchian

Signal and Image Processing Navigational Systems
Vyacheslav P. Tuzlukov

Wireless Internet: Technologies and Applications
Apostolis K. Salkintzis and Alexander Poularikas

PATTERN RECOGNITION in SPEECH and LANGUAGE PROCESSING

Edited by

WU CHOU
Avaya Labs Research

BIING HWANG JUANG
Georgia Institute of Technology

CRC PRESS
Boca Raton London New York Washington, D.C.

Preface

Basking Ridge, New Jersey
September, 2002

Contributors

A. Abella

James Allan

T. Alonso

Jerome R. Bellegarda

William Byrne

Wu Chou

Sadaoki Furui

Jean-Luc Gauvain

Vaibhava Goel

Allen L. Gorin

Qiang Huo

Biing-Hwang Juang

Shigeru Katagiri

Lori Lamel

Qi (Peter) Li


John Makhoul

Hermann Ney

F. J. Och

G. Riccardi

Richard M. Schwartz

J. H. Wright

Contents

1 Minimum Classification Error (MCE) Approach in Pattern Recognition
Wu Chou

2 Minimum Bayes-Risk Methods in Automatic Speech Recognition
Vaibhava Goel and William Byrne

3 A Decision Theoretic Formulation for Robust Automatic Speech Recognition
Qiang Huo

4 Speech Pattern Recognition using Neural Networks
Shigeru Katagiri

5 Large Vocabulary Speech Recognition Based on Statistical Methods
Jean-Luc Gauvain and Lori Lamel

6 Toward Spontaneous Speech Recognition and Understanding
Sadaoki Furui

7 Speaker Authentication
Qi Li and Biing-Hwang Juang

8 HMMs for Language Processing Problems
Richard M. Schwartz and John Makhoul

9 Statistical Language Models With Embedded Latent Semantic Knowledge
Jerome R. Bellegarda

10 Semantic Information Processing of Spoken Language – How May I Help You?℠
A. L. Gorin, A. Abella, T. Alonso, G. Riccardi, and J. H. Wright

11 Machine Translation Using Statistical Modeling
Hermann Ney and F. J. Och

12 Modeling Topics for Detection and Tracking
James Allan

1

Minimum Classification Error (MCE) Approach in Pattern Recognition

Wu Chou
Avaya Labs Research, Avaya Inc., USA

CONTENTS

1.1 Introduction

Proceedings of The IEEE

1.2 Optimal Classifier from Bayes Decision Theory


1.3 Discriminant Function Approach to Classifier Design


1.4 Speech Recognition and Hidden Markov Modeling


1.4.1 Hidden Markov Modeling of Speech


1.5 MCE Classifier Design Using Discriminant Functions

1.5.1 MCE Classifier Design Strategy


loss function


1.5.2 Optimization Methods

1.5.2.1 Expected Loss


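Property 1 Suppose the following conditions are satisfied:

C1: $\sum_{t=1}^{\infty} \epsilon_t = \infty$ and $\sum_{t=1}^{\infty} \epsilon_t^2 < \infty$, with $\epsilon_t \ge 0$;

C2: there exists $0 \le R < \infty$ such that for all t, the inner product $\nabla \ell(X_t, \Lambda)^{T} H \, \nabla \ell(X_t, \Lambda) \le R$, where $H$ is the Hessian matrix of second order partial derivatives;

C3: $\Lambda^{*} = \arg\min_{\Lambda} L(\Lambda)$ is the unique $\Lambda$ such that $\nabla L(\Lambda)\big|_{\Lambda = \Lambda^{*}} = 0$.

Then, $\Lambda_t$ given by $\Lambda_{t+1} = \Lambda_t - \epsilon_t \nabla \ell(X_t, \Lambda_t)$ will converge to $\Lambda^{*}$ almost surely (i.e. with probability one).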
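The stochastic descent recursion in Property 1 can be illustrated with a small sketch. Everything below is an assumption made for illustration: the names `gpd_minimize` and `quadratic_grad` are hypothetical, the per-sample loss is a toy quadratic rather than an MCE loss, and the step size 1/t is just one sequence whose sum diverges while the sum of its squares stays finite.

```python
import random


def gpd_minimize(samples, grad_loss, lam0=0.0, step=lambda t: 1.0 / t):
    """Run lam <- lam - eps_t * grad_loss(x_t, lam) over the sample stream."""
    lam = lam0
    for t, x in enumerate(samples, start=1):
        lam -= step(t) * grad_loss(x, lam)
    return lam


def quadratic_grad(x, lam):
    """Gradient in lam of the toy per-sample loss 0.5 * (lam - x) ** 2."""
    return lam - x


# Toy data: the expected loss is minimized at the mean of the distribution.
rng = random.Random(0)
samples = [rng.gauss(3.0, 1.0) for _ in range(5000)]
lam_hat = gpd_minimize(samples, quadratic_grad)
print(lam_hat)
```

With this particular loss and step size the recursion reduces to a running average of the samples, so the estimate tracks the sample mean. A real MCE system would replace `quadratic_grad` with the gradient of a smoothed classification loss.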


1.5.2.2 Empirical Loss


1.5.3 Other Optimization Methods


1.5.4 HMM as a Discriminant Function

segmental GPD

1.5.5 Relation between MCE and MMI


FIGURE 1.1 A plot of the value of the derivative of the sigmoid function.
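The curve described in the Figure 1.1 caption can be approximated numerically. This is a minimal sketch, assuming the common sigmoid form 1 / (1 + exp(-gamma * d)); the slope parameter `gamma` and the evaluation grid are assumptions, since the settings used for the original figure are not recoverable here.

```python
import math


def sigmoid(d, gamma=1.0):
    """Sigmoid with slope parameter gamma (an assumed parameterization)."""
    return 1.0 / (1.0 + math.exp(-gamma * d))


def sigmoid_derivative(d, gamma=1.0):
    """Derivative of the sigmoid with respect to d: gamma * s * (1 - s)."""
    s = sigmoid(d, gamma)
    return gamma * s * (1.0 - s)


# Evaluate on a coarse grid over the range shown on the figure's horizontal axis.
grid = list(range(-10, 11, 2))
values = [sigmoid_derivative(d) for d in grid]
for d, v in zip(grid, values):
    print(d, round(v, 4))
```

The derivative peaks at d = 0 with value gamma / 4 and decays quickly on both sides, so samples far from the decision boundary contribute little to a gradient-based update.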


1.5.6 Discussions and Comments


1.6 Embedded String Model Based MCE Training

FIGURE 1.2 A structure diagram of a context dependent head-body-tail digit model in speech recognition.

1.6.1 String Model Based MCE Approach

FIGURE 1.3 A diagram of the embedded string model based MCE training process.


1.6.2 Combined String Model Based MCE Approach


1.6.2.1 Discriminative Model Combination


1.6.2.2 Discriminative Language Model Estimation

“too” “two”


1.6.3 Discriminative Feature Extraction


1.7 Verification and Identification

FIGURE 1.4 Block diagram of a speaker verification system

1.7.1 Speaker Verification and Identification


1.7.2 Utterance Verification


1.8 Summary


Acknowledgement


2

Minimum Bayes-Risk Methods in Automatic Speech Recognition

Vaibhava Goel (IBM) and William Byrne (Johns Hopkins University)

CONTENTS


2.1 Minimum Bayes-Risk Classification Framework

hypothesis space

expected loss

evidence space

evidence distribution

2.1.1 Likelihood Ratio Based Hypothesis Testing


2.1.2 Maximum A-Posteriori Probability Classification


2.1.3 Previous Studies of Application Sensitive ASR


2.2 Practical MBR Procedures for ASR


2.2.1 Summation over Hidden State Sequences

language model

acoustic model

N-best list

lattice

2.2.2 MBR Recognition with N-best Lists

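For an N-best list, minimum Bayes-risk recognition is commonly described as choosing the list entry with the smallest expected loss, where the expectation is taken over the posterior distribution restricted to the same list. The sketch below illustrates that idea under stated assumptions: the function names, the toy three-entry list, and its posterior values are all hypothetical, and the loss is the word-level Levenshtein distance.

```python
def levenshtein(ref, hyp):
    """Word-level edit distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def mbr_rescore(nbest):
    """nbest: list of (word_list, posterior). Returns (hypothesis, expected loss)."""
    total = sum(p for _, p in nbest)  # renormalize posteriors over the list
    best, best_risk = None, float("inf")
    for cand, _ in nbest:
        risk = sum(p / total * levenshtein(ref, cand) for ref, p in nbest)
        if risk < best_risk:
            best, best_risk = cand, risk
    return best, best_risk


# Hypothetical N-best list with made-up posteriors.
nbest = [
    (["we", "can", "see"], 0.40),
    (["we", "count", "sheep"], 0.35),
    (["we", "count", "sea"], 0.25),
]
map_hyp = max(nbest, key=lambda item: item[1])[0]
mbr_hyp, mbr_risk = mbr_rescore(nbest)
print(map_hyp, mbr_hyp, mbr_risk)
```

In this toy list the most probable entry is not the minimum-risk one: the second entry is closer on average to the rest of the list, so it has the lower expected word error.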

2.2.3 MBR Recognition with Lattices


2.2.3.1 Lattice Definitions

path

complete path

path segment

partial path

partial path log-probability

lattice backward log-probability

lattice total probability

FIGURE 2.1 An example lattice. The time marks correspond to the node times and the word ending times. The numbers on the edges are logarithms of conditional joint probabilities as described in the text. The partial path log-probability of a partial hypothesis is the log of the probability of its path. The lattice backward log-probability of a partial hypothesis is the log of the sum of probabilities of all lattice paths from the end node of the partial hypothesis to the lattice end node; for the partial path (‘HELLO’, ‘0.6’) in this lattice these paths are indicated by dotted lines. The lattice total probability of a partial path is the exponentiated sum of its partial path log-probability and lattice backward log-probability.
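The three quantities defined in the Figure 2.1 caption can be computed on a small lattice as a check on the definitions. This is a sketch under assumptions: the four-edge lattice, its edge probabilities, and the function names are invented for illustration and do not reproduce the lattice in the figure. Nodes are assumed to be numbered in topological order, with every node able to reach the end node.

```python
import math
from collections import defaultdict


def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def backward_log_probs(edges, end_node):
    """edges: (src, dst, word, log_prob) tuples with src < dst.

    Returns, per node, the log of the summed probability of all paths
    from that node to end_node.
    """
    outgoing = defaultdict(list)
    for src, dst, _word, lp in edges:
        outgoing[src].append((dst, lp))
    beta = {end_node: 0.0}
    for node in sorted(outgoing, reverse=True):  # reverse topological order
        beta[node] = logsumexp([lp + beta[dst] for dst, lp in outgoing[node]])
    return beta


def lattice_total_probability(path_edge_log_probs, path_end_node, beta):
    """exp(partial path log-probability + lattice backward log-probability)."""
    return math.exp(sum(path_edge_log_probs) + beta[path_end_node])


# Hypothetical lattice: node 0 is the start, node 2 is the end.
edges = [
    (0, 1, "HELLO", math.log(0.5)),
    (0, 1, "HOLLOW", math.log(0.2)),
    (1, 2, "WORLD", math.log(0.6)),
    (1, 2, "WORD", math.log(0.2)),
]
beta = backward_log_probs(edges, end_node=2)
hello_total = lattice_total_probability([math.log(0.5)], 1, beta)
print(beta[1], hello_total)
```

Here the partial path consisting of the single HELLO edge ends at node 1, so its total probability is its own probability multiplied by the summed probability of every continuation from node 1 to the end node.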


2.2.3.2 A* Search Under General Loss Functions


2.2.3.3 Single Stack Search Under Levenshtein Loss Function


2.2.3.4 Prefix Tree Search Under Levenshtein Loss Function

prefix tree

partial hypothesis comparison cost

2.2.3.5 Pruning and Multistack Organization of the Prefix Tree Search


2.2.3.6 Loss Functions Other than Levenshtein Distance


2.3 Segmental MBR Procedures

high confidence regions

low confidence regions

segment sets

conjunction rule

Proposition.

induced

2.3.1 Segmental Voting


2.3.2 ROVER

simultaneous alignment

correspondence set

FIGURE 2.2 An example word transition network.

2.3.3 e-ROVER

joining

expanded

FIGURE 2.3 Joining two correspondence sets.

segmentation

2.4 Experimental Results

2.4.1 Parameter Tuning within the MBR Classification Rule

word insertion penalty

language model scale factor

likelihood scale factor

TABLE 2.1

2.4.1.1 Optimization of Likelihood Parameters

supervised optimization

unsupervised optimization

2.4.2 Utterance Level MBR Word and Keyword Recognition

abilities, bartenders, calculation, databases

a, and, the, besides, collaboration, distribution

2.4.2.1 Likelihood Scale Factor Tuning


2.4.2.2 N-best List Rescoring and A* Search

TABLE 2.2

2.4.3 ROVER and e-ROVER for Multilingual ASR

FIGURE 2.4 Top panel shows the ratio of the total number of e-ROVER correspondence sets to that of ROVER correspondence sets, as a function of the pinching threshold. Bottom panel shows the WER performance of e-ROVER for these thresholds.

2.4.3.1 Correspondence Set Pinching


2.5 Summary


2.6 Acknowledgements

References

Mathematical Statistics: Basic Ideas andSelected topics

IEEE Conference on Acoustics, Speech, and Signal Pro-cessing

��� Hub-5 Conversational Speech Recognition Workshop

In Proceedings of the NIST and NSASpeech Transcription Workshop

IEEE Workshop on Au-tomatic Speech Recognition and Understanding

ACL99

IEEE Conference on Acous-tics, Speech, and Signal Processing

Word List With Content Word Marks

Minimum Bayes-Risk Automatic Speech Recognition

�� Eurospeech-99

Page 92: Pattern Recognition in Speech and Language Processing

In Proceedings of the NIST and NSA Speech Transcription Work-shop

Computer Speech and Language

Research Notes No. 40, Center for Language andSpeech Processing

IEEE Conference on Acous-tics, Speech, and Signal Processing

��� International Conference on Spoken Language Pro-

cessing

IEEE Conferenceon Acoustics, Speech, and Signal Processing

IEEE Transactions on Systems Scienceand Cybernetics

SIGART Newsletter

IBM Journalof Research Development

Statistical Methods for Speech Recognition

Proceedings of the 1997 Large Vocabulary Continuous Speech RecognitionWorkshop

IEEE Transactions on Signal Processing

��� International

Conference on Spoken Language Processing

IEEE

Page 93: Pattern Recognition in Speech and Language Processing

Conference on Acoustics, Speech, and Signal Processing

1997 IEEE Workshop on Automatic Speech Recognition and Understanding

Soviet Phys. Dokl.

Eurospeech-99

9th Hub-5 Conversational Speech Recognition Workshop

9th Hub-5 Conversational Speech Recognition Workshop

Eurospeech-95

IEEE Transactions on Acoustics, Speech, and Signal Processing

IEEE Transactions on Acoustics, Speech, and Signal Processing

IEEE Conference on Acoustics, Speech, and Signal Processing

IEEE Trans. PAMI

IEEE Conference on Acoustics, Speech, and Signal Processing

Eurospeech-97

Estimation of Dependences Based on Empirical Data

Page 94: Pattern Recognition in Speech and Language Processing

IEEE Conference on Acoustics, Speech, and Signal Processing

IEEE Transactions on Acoustics, Speech, and Signal Processing

HTK 2.1

Page 95: Pattern Recognition in Speech and Language Processing

3

A Decision Theoretic Formulation for Robust Automatic Speech Recognition

Qiang Huo
The University of Hong Kong, Hong Kong, China

CONTENTS

3.1 Introduction

decision problem

class

statistical pattern recognition

Page 96: Pattern Recognition in Speech and Language Processing

FIGURE 3.1 Communication Theoretic View of ASR: Noisy Channel for Speech Generation and Signal Capturing (adapted from [68]).

parametric family

training data

plug-in MAP (maximum a posteriori) decision rule

Page 97: Pattern Recognition in Speech and Language Processing

statistical decision

3.2 Optimal Bayes’ Decision Rule for ASR

decision rule

decision space

nonrandomized decision rule

Page 98: Pattern Recognition in Speech and Language Processing

sampling paradigm

loss

loss function

true distribution

total risk

Page 99: Pattern Recognition in Speech and Language Processing

Bayes’ decision rule

Bayes’ risk

0-1 loss function

minimum classification error

MAP decision rule
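As a minimal sketch of how the terms in this section relate: under a 0-1 loss, the conditional risk of deciding a class equals one minus that class's posterior probability, so the risk-minimizing (Bayes) decision coincides with the MAP decision. The function names and the toy posterior values below are illustrative assumptions, not notation from the chapter.

```python
def bayes_decision(posteriors, loss):
    """Return the decision that minimizes the posterior-expected loss.

    posteriors: dict mapping class -> posterior probability given the observation
    loss: function loss(true_class, decided_class) -> non-negative number
    """
    def conditional_risk(decision):
        return sum(p * loss(c, decision) for c, p in posteriors.items())
    return min(posteriors, key=conditional_risk)


def zero_one_loss(true_class, decided_class):
    """0-1 loss: no penalty for a correct decision, unit penalty otherwise."""
    return 0.0 if true_class == decided_class else 1.0


def map_decision(posteriors):
    """MAP rule: pick the class with the largest posterior probability."""
    return max(posteriors, key=posteriors.get)


# Hypothetical posteriors for three classes (assumed values).
posteriors = {"yes": 0.2, "no": 0.5, "maybe": 0.3}
bayes_choice = bayes_decision(posteriors, zero_one_loss)
map_choice = map_decision(posteriors)
```

With any other loss function passed to `bayes_decision`, the two choices can differ, which is the point of stating the loss explicitly.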

3.3 Adaptive Decision Rules Constructed from Training Samples

true

prior uncertainty

Page 100: Pattern Recognition in Speech and Language Processing

independent

adaptive decision rule

3.3.1 Plug-in Bayes’ Decision Rules with Maximum-likelihood Density Estimate

3.3.1.1 What are Plug-in Bayes’ Decision Rules?

plug-in decision rules

plug-in MAP decision rule

plug-in risk

density plug-in estimator

Page 101: Pattern Recognition in Speech and Language Processing

3.3.1.2 Why Could Plug-in Bayes’ Decision Rules Work?

Property:

Bayes’ risk consistency

Theorem: (Bayes’ risk consistency):

3.3.1.3 Implications on Parametric Models and Parameter Estimation

assume

estimated

Bayes’ risk consistent

representative

Page 102: Pattern Recognition in Speech and Language Processing

Discrete HMM

Continuous Density HMM

finite state knowledge sources

network search

maximum likelihood

point estimator

minimum discrimination information

discrimination information

directed divergence

discriminative training

maximum mutual information

conditional maximum likelihood estimate

H-criteria

corrective training

minimum empirical classification error

Page 103: Pattern Recognition in Speech and Language Processing

3.3.2 Maximum-Discriminant Decision Rules Minimizing the Empirical Classification Error

3.3.2.1 What are Maximum-Discriminant Decision Rules?

discriminant function

maximum-discriminant decision rule

minimum misclassification

best-count

density estimator

3.3.2.2 Why Could Discriminant Approach Work?

Theorem: (Uniform Convergence)

m-convex

uniformly

Page 104: Pattern Recognition in Speech and Language Processing

best-count

3.3.2.3 Implications on the Choice of Discriminant Functions and the Practical Training Algorithms

3.3.3 Discussion

plug-in MAP

maximum discriminant

minimum empirical classification error

Page 105: Pattern Recognition in Speech and Language Processing

representative

3.4 Violations of Modeling Assumptions in ASR

3.4.1 Types of Distortions

independent

representative

Page 106: Pattern Recognition in Speech and Language Processing

modeling error

estimation error

3.4.2 Towards Adaptive and Robust ASR

Page 107: Pattern Recognition in Speech and Language Processing

3.5 Improving Adaptive Decision Rules via Decision Parameter Adaptation

3.5.1 Decision Parameter Adaptation for Stationary Operating Conditions

goals of adaptation

Page 108: Pattern Recognition in Speech and Language Processing

3.5.1.1 Adaptation for Plug-in Decision Rules

Remark 1:

regularization

imposing constraints

maximum penalized likelihood

Remark 2:

3.5.1.2 Adaptation for Maximum-Discriminant Decision Rules

w.r.t.

Page 109: Pattern Recognition in Speech and Language Processing

empirical

minimum expected classification error

stochastic approximation

3.5.2 Decision Parameter Adaptation for Slowly Changing Operating Conditions

Page 110: Pattern Recognition in Speech and Language Processing

forgetting mechanisms

3.5.3 Decision Parameter Adaptation for Switching Operating Conditions

adaptive model fusion

Page 111: Pattern Recognition in Speech and Language Processing

3.5.4 Discussion

robust decision rule

3.6 Robust Decision Rules

3.6.1 Decision Rule Robustness

guaranteed (upper) risk

Page 112: Pattern Recognition in Speech and Language Processing

overall risk

robust (with respect to distortions) decision rules

minimax decision rule

predictive decision rule

Bayesian predictive decision rule

uncertainty neighborhood

3.6.2 Minimax Classification Rule

uncertainty neighborhood

Page 113: Pattern Recognition in Speech and Language Processing

minimax decision rule

Page 114: Pattern Recognition in Speech and Language Processing

model-space stochastic matching

3.6.3 Bayesian Predictive Classification Rule

average out

Bayesian predictive classification

a priori

hyperparameters

Page 115: Pattern Recognition in Speech and Language Processing

point estimate

empirical Bayes

prior uncertainty

representative

overall risk

predictive densities

Page 116: Pattern Recognition in Speech and Language Processing

Bayesian predictive classification (BPC) rule

model parameter uncertainty

Page 117: Pattern Recognition in Speech and Language Processing

3.6.4 Discussion

training set

reproducing density

approximate Bayesian (AB) decision rule

Page 118: Pattern Recognition in Speech and Language Processing

Bayesian minimax rule

Bayesian predictive density

Bayesian predictive density based model compensation

3.7 Summary

Page 119: Pattern Recognition in Speech and Language Processing

class

Acknowledgement

References

Acoustical and Environmental Robustness in Automatic Speech Recognition

Proc. of ICASSP-2001

IEEE Trans. on Speech and Audio Processing

Page 120: Pattern Recognition in Speech and Language Processing

Statistical Prediction Analysis

IEEE Trans. on Electronic Computers

IEEE Trans. on Pattern Analysis and Machine Intelligence

Proc. of ICASSP-86

IEEE Trans. Speech and Audio Processing

Speech Recognition

IEEE Trans. on Acoustics, Speech, and Signal Processing

Inequalities

IEEE Signal Processing Letters

Proc. of Eurospeech-2001

IEEE Trans. on Speech and Audio Processing

Proceedings of the IEEE

Proc. of ICASSP-1998

Speech Communication

Page 121: Pattern Recognition in Speech and Language Processing

Pattern Classification and Scene Analysis

Pattern Classification

Spoken Dialogues with Computers

IEEE Trans. on Information Theory

IEEE Trans. on Information Theory

Proc. IEEE

Mathematical Statistics: a Decision Theoretic Approach

Proc. ETRW on Robust Speech Recognition for Unknown Communication Channels

Speech Communication

IEEE Trans. on Speech and Audio Processing

Proc. of Eurospeech-97

Proc. ICSLP-00

Handbook of Statistics

Predictive Inference: An Introduction

Journal of the American Statistical Association

IEEE Trans. on Information Theory

Page 122: Pattern Recognition in Speech and Language Processing

Computer Speech and Language

Speech Communication

IEEE Trans. on Speech and Audio Processing

Biometrika

Proc. ICASSP-88

IEEE Trans. on Speech and Audio Processing

Proc. Eurospeech-01

Robust Statistics: The Approach Based on Influence Functions

Speech Communication

Speech Communication

IEEE Trans. on Automatic Control

Proc. of Eurospeech-99

Spoken language processing: a guide to theory, algorithm, and system development

Robust Statistics

IEEE Trans. on Speech and Audio Processing

Page 123: Pattern Recognition in Speech and Language Processing

IEEE Trans. on Speech and Audio Processing

IEEE Trans. on Speech and Audio Processing

Speech Communication

Proc. ICSLP-2000

IEEE Trans. on Speech and Audio Processing

Proc. ICASSP-2000

IEEE Trans. on Pattern Analysis and Machine Intelligence

Proceedings of the IEEE

Statistical Method for Speech Recognition

Advances in Speech Signal Processing

IEEE Trans. on Speech and Audio Processing

IEEE Trans. on Speech and Audio Processing

Speech Communication

Page 124: Pattern Recognition in Speech and Language Processing

IEEE Trans. on Speech and Audio Processing

IEEE Trans. on Information Theory

IEEE Transactions on Acoustics, Speech, and Signal Processing

Technometrics

Computer Speech and Language

IEEE Trans. on Signal Processing

1996 IEEE Workshop on Neural Networks For Signal Processing

IEEE Trans. on Speech and Audio Processing

Robustness in Automatic Speech Recognition: Fundamentals and Applications

IEEE Trans. on Information Theory

Proc. of IEEE

IEEE Trans. Acoust., Speech, Signal Processing

Computer Speech and Language

IEEE Trans. on Speech and Audio Processing

Page 125: Pattern Recognition in Speech and Language Processing

Computer Speech and Language

Proc. of ICASSP-2001

Proc. of ASRU-1999

Robustness in Statistical Pattern Recognition

IEEE Signal Processing Letters

IEEE Signal Processing Letters

Proc. ICSLP-96

Proc. ICASSP-98

Automatic Speech and Speaker Recognition: Advanced Topics

Speech Communication

Proceedings of the IEEE

The Bell System Technical Journal

Proc. IEEE

IEICE Trans. Inf. & Syst.

Automatic Speech Recognition – The Development of the SPHINX-System

Page 126: Pattern Recognition in Speech and Language Processing

IEEE Trans. on Information Theory

Proc. ICASSP-90

Proc. Eurospeech-95

IEEE Trans. on Signal Processing

IEEE Trans. on Speech and Audio Processing

IEEE Trans. on Neural Networks

IEEE Trans. on Acoustics, Speech, and Signal Processing

IEEE Trans. on Acoustics, Speech, and Signal Processing

IEEE Trans. on Acoustics, Speech, and Signal Processing

Proceedings of the IEEE

Proceedings of the IEEE

IEEE Trans. on Speech and Audio Processing

AT&T Tech. Journal

Page 127: Pattern Recognition in Speech and Language Processing

Proceedings of the IEEE

Fundamentals of Speech Recognition

Pattern Recognition and Neural Networks

Annals of Mathematical Statistics

IEEE Signal Processing Letters

IEEE Trans. on Speech and Audio Processing

IEEE Trans. on Speech and Audio Processing

Proc. Workshop on Adaptation Methods for Speech Recognition

Proc. ETRW on Robust Speech Recognition For Unknown Communication Channels

IEEE Trans. on Audio and Speech Processing

Speech Communication

IEICE Trans. Inf. & Syst.

Adaptation and learning in automatic systems

Foundations of the theory of learning systems

Page 128: Pattern Recognition in Speech and Language Processing

Proc. ICASSP-01

Proc. of ICASSP-2001

Proc. ICASSP-00

Statistical Decision Functions

Proc. of ICASSP-99

Proc. ICASSP-2002

Proc. of Eurospeech-2001

The HTK Book Version 3.0

Page 129: Pattern Recognition in Speech and Language Processing

4

Speech Pattern Recognition using Neural Networks

Shigeru Katagiri
NTT Communication Science Laboratories

CONTENTS

4.1 Introduction

Page 130: Pattern Recognition in Speech and Language Processing
Page 131: Pattern Recognition in Speech and Language Processing

4.2 Bayes Decision Theory

4.2.1 Preparations

4.2.2 Decision Rule

Page 132: Pattern Recognition in Speech and Language Processing

4.2.3 Minimum Error-rate Classification

4.2.4 Probability Function Estimation

Page 133: Pattern Recognition in Speech and Language Processing

4.2.5 Discriminative Training

4.2.5.1 Functional Form Embodiment of the Entire Process

4.2.5.2 Discriminant Functions

Page 134: Pattern Recognition in Speech and Language Processing

4.2.5.3 Loss over an Individual Pattern

4.2.5.4 Loss over Multiple Patterns

Page 135: Pattern Recognition in Speech and Language Processing

4.2.5.5 Adjustment of Trainable System Parameters

[Probabilistic Descent Theorem]

Page 136: Pattern Recognition in Speech and Language Processing

4.2.5.6 Training Optimality

Page 137: Pattern Recognition in Speech and Language Processing

Page 138: Pattern Recognition in Speech and Language Processing

4.2.5.7 Global Design Scope

4.3 Speech Recognizers Based on Neural Networks

4.3.1 Preparations

Page 139: Pattern Recognition in Speech and Language Processing

4.3.2 Classification Error Minimization

4.3.2.1 Learning Vector Quantization

Page 140: Pattern Recognition in Speech and Language Processing

4.3.2.2 Shift-tolerant LVQ Classifier

Page 141: Pattern Recognition in Speech and Language Processing

FIGURE 4.1 Architecture of shift-tolerant LVQ classifier [20].

4.3.2.3 LVQ/HMM Hybrid Classifier

Page 142: Pattern Recognition in Speech and Language Processing

FIGURE 4.2 Block diagram of LVQ/HMM hybrid classifier.

4.3.2.4 HMM/LVQ Hybrid Classifier

Page 143: Pattern Recognition in Speech and Language Processing

FIGURE 4.3 Block diagram of HMM/LVQ hybrid classifier.

Page 144: Pattern Recognition in Speech and Language Processing

4.3.3 Squared Error Minimization

4.3.3.1 Training Using the Squared Error Loss

Page 145: Pattern Recognition in Speech and Language Processing

FIGURE 4.4 Architecture of time-delay neural network [27].

4.3.3.2 Time-delay Neural Network

Page 146: Pattern Recognition in Speech and Language Processing

FIGURE 4.5 Schematic description of distance classifier as a single intermediate layer network (2-dimensional input, 3 references/class, 3 classes).
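The configuration named in the Figure 4.5 caption can be sketched as follows: one intermediate unit per reference vector computes a distance to the input, and each class output takes the smallest distance among its own references. This is a minimal illustration only; the reference coordinates, the Euclidean distance, and all function names are assumptions rather than details from the chapter.

```python
import math

# Hypothetical reference vectors: 3 classes, 3 references per class, 2-dimensional input.
REFERENCES = {
    "A": [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6)],
    "B": [(3.0, 3.0), (3.4, 2.8), (2.7, 3.5)],
    "C": [(0.0, 4.0), (0.3, 3.6), (-0.4, 4.2)],
}


def intermediate_layer(x):
    """One unit per reference vector: Euclidean distance from x to that reference."""
    return {label: [math.dist(x, ref) for ref in refs]
            for label, refs in REFERENCES.items()}


def class_distances(x):
    """Output layer: each class keeps the smallest distance among its own references."""
    return {label: min(dists) for label, dists in intermediate_layer(x).items()}


def classify(x):
    """Decide for the class whose nearest reference is closest to x."""
    scores = class_distances(x)
    return min(scores, key=scores.get)
```

For example, `classify((0.2, 0.1))` selects class `"A"` with these assumed references, because the nearest reference to that point belongs to class A.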

4.3.3.3 Multi-state Time-delay Neural Network

Page 147: Pattern Recognition in Speech and Language Processing

4.3.4 Cross Entropy Minimization

4.3.4.1 Training Using the Cross Entropy Loss

Page 148: Pattern Recognition in Speech and Language Processing

4.3.4.2 Unidirectional Network Classifier

4.3.4.3 Bidirectional Network Classifier

Page 149: Pattern Recognition in Speech and Language Processing

FIGURE 4.6 Architecture of unidirectional network [23]. Figure labels: W, V, time delay, s(t+1); ut: input vector; st: state vector; yt: output vector.

4.4 Fusion of Multiple Classification Decisions

4.4.1 Principles

Page 150: Pattern Recognition in Speech and Language Processing

FIGURE 4.7 Architecture of bi-directional network [25].

Page 151: Pattern Recognition in Speech and Language Processing

FIGURE 4.8 Typical classifier design schemes of averaging-based decision fusion.

Page 152: Pattern Recognition in Speech and Language Processing

4.4.2 Examples of Embodiment

4.4.2.1 Multi-codebook Classifier Designed with GPD

Page 153: Pattern Recognition in Speech and Language Processing

FIGURE 4.9 Relation between recognition accuracy and the number of prototypes per class and codebook [3].

4.4.2.2 Multi-class Classification Based on Support Vector Machine

Page 154: Pattern Recognition in Speech and Language Processing

4.4.2.3 Decision Fusion Using Different Classifiers

Page 155: Pattern Recognition in Speech and Language Processing

FIGURE 4.10 Typical block diagrams of the MSTDNN-based audio-visual speech recognition [7].

4.4.2.4 Decision Fusion Using Multi-modal Classifiers

Page 156: Pattern Recognition in Speech and Language Processing

FIGURE 4.11 Block diagram of the twofold-HMM-based audio-visual speech recognition [21].

Page 157: Pattern Recognition in Speech and Language Processing

4.5 Concluding Remarks

Page 158: Pattern Recognition in Speech and Language Processing

References

Page 159: Pattern Recognition in Speech and Language Processing
Page 160: Pattern Recognition in Speech and Language Processing

4.6 Appendix: Maximizing Mutual Information

Page 161: Pattern Recognition in Speech and Language Processing

Page 162: Pattern Recognition in Speech and Language Processing

5

Large Vocabulary Speech Recognition Based on Statistical Methods

Jean-Luc Gauvain and Lori Lamel
LIMSI, France

CONTENTS

5.1 Introduction

Page 163: Pattern Recognition in Speech and Language Processing

5.2 Overview

Page 164: Pattern Recognition in Speech and Language Processing

FIGURE 5.1 LVCSR speech generation model: The word sequence produced by the language model is successively transformed by the pronunciation model and the acoustic model, resulting in the speech signal.
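The generation chain in the Figure 5.1 caption corresponds to the usual statistical decoding criterion, written out below as a worked equation. The symbols are assumptions introduced here for illustration, since the original notation did not survive extraction: W denotes a word sequence, H a pronunciation (phone) sequence, and X the observed speech signal.

```latex
\hat{W} \;=\; \arg\max_{W} \Pr(W \mid X)
        \;=\; \arg\max_{W} \; \Pr(W) \sum_{H} \Pr(H \mid W)\, f(X \mid H, W)
```

Here \Pr(W) plays the role of the language model, \Pr(H \mid W) the pronunciation model, and f(X \mid H, W) the acoustic model; the denominator f(X) is dropped because it does not depend on W.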

5.3 Language Modeling


Page 165: Pattern Recognition in Speech and Language Processing

FIGURE 5.2 System diagram of a generic speech recognizer based on statistical models, including training and decoding processes and the main knowledge sources.

Page 166: Pattern Recognition in Speech and Language Processing


5.3.1 Text Preparation

FIGURE 5.3 Some example transformation rules applied during text normalization with associated probabilities. Surviving rule fragments: "one hundred fifty dollars"; "nineteen ninety one" and "one thousand nine hundred and ninety one"; "hundred" and "hundred and"; "million dollars" and "million".

Page 167: Pattern Recognition in Speech and Language Processing

million officials

neunzehnhunderteinundneunzig

neunzehn hundert einund neunzig

5.3.2 Vocabulary Selection

5.3.3 N-gram Estimation

Page 168: Pattern Recognition in Speech and Language Processing

Page 169: Pattern Recognition in Speech and Language Processing

5.3.4 LM Adaptation

cache model

trigger model

topic coherence modeling

5.4 Pronunciation Modeling

Page 170: Pattern Recognition in Speech and Language Processing

FIGURE 5.4 Set of 45 phone symbols for English with illustrative words, with the portion corresponding to the phone sound underlined. Table columns: Phone, Example (entries not recoverable).

excuse, record, moderate

anti-, bi-, multi-, -ization

Page 171: Pattern Recognition in Speech and Language Processing

FIGURE 5.5 Some example lexical entries and their pronunciations along with estimated probabilities. For the compound words, the original concatenated pronunciation is given in the 1st line and the reduced forms are given in the 2nd line.

interest, conference, company

don’t know, did you, going to

gonna, dunno

Page 172: Pattern Recognition in Speech and Language Processing

5.5 Acoustic Modeling

5.5.1 Acoustic Front-end

Page 173: Pattern Recognition in Speech and Language Processing


Page 174: Pattern Recognition in Speech and Language Processing

FIGURE 5.6 A simple 3-state left-to-right HMM topology commonly used for allophone modeling in LVCSR. The model generates at least 3 speech frames per allophone, resulting in a minimal phone segment duration of 30 ms for a frame rate of 100 Hz.
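The arithmetic behind the Figure 5.6 caption can be checked with a short sketch. It assumes a strict left-to-right topology without skip transitions, so every emitting state consumes at least one frame; the function name is hypothetical.

```python
def min_segment_duration_ms(num_emitting_states, frame_rate_hz):
    """Shortest segment a left-to-right HMM without skip transitions can produce.

    Each emitting state must be visited at least once and each visit
    consumes one frame, so the minimum is one frame period per state.
    """
    frame_period_ms = 1000.0 / frame_rate_hz
    return num_emitting_states * frame_period_ms


# Values from the caption: 3 states, 100 frames per second.
minimum_ms = min_segment_duration_ms(3, 100)
```

With a 100 Hz frame rate the frame period is 10 ms, so three states give the 30 ms minimum quoted in the caption.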

5.5.2 Modeling Allophones

Page 175: Pattern Recognition in Speech and Language Processing

/s�st�/

s(*,�) �(s,s) s(�,t) t(s,�) �(t,*)

s(*,�s) �(s,st) s(s�,t�) t(�s,�) �(st,*)

FIGURE 5.7 Examples of allophonic transcriptions in terms of intra-word triphones and quinphones. Each contextual unit is defined by the central phone followed by its phone context shown in parentheses (left-context, right-context). * is a wildcard signifying any context.
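The notation described in the Figure 5.7 caption can be generated mechanically. The sketch below is a minimal illustration, assuming single-character phone symbols and context that never crosses the word boundary; the function name and the example phone list are hypothetical placeholders, not the figure's actual transcription.

```python
def contextual_units(phones, width):
    """Expand a word's phone list into intra-word context-dependent units.

    width=1 gives triphones and width=2 gives quinphones. Context is taken
    only from inside the word; an empty context is written as the wildcard "*".
    """
    units = []
    for i, phone in enumerate(phones):
        left = "".join(phones[max(0, i - width):i]) or "*"
        right = "".join(phones[i + 1:i + 1 + width]) or "*"
        units.append(f"{phone}({left},{right})")
    return units


# Hypothetical five-phone word; "I" and "R" stand in for unrecovered symbols.
word = ["s", "I", "s", "t", "R"]
triphones = contextual_units(word, 1)
quinphones = contextual_units(word, 2)
```

At word edges the available context is shorter than the requested width, which is why a quinphone near the boundary may carry only one phone of context on one side.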

a priori

Page 176: Pattern Recognition in Speech and Language Processing

Position:

General classes:

Vowel classes:

Consonant classes:

Individual phones:

FIGURE 5.8 Example questions used for decision tree clustering.

senones, genones, PELs, tied-states

5.5.3 HMM Parameter Estimation


Page 177: Pattern Recognition in Speech and Language Processing

FIGURE 5.9 The most frequently used decision tree questions for an American English broadcast news transcription system [40]. The [+1] and [-1] indicate that the question has been applied to the right or left context respectively, and [0] to the phone itself. Table columns: Question, Log likelihood gain (entries not recoverable).

Page 178: Pattern Recognition in Speech and Language Processing

A Posteriori

5.5.4 HMM Adaptation

Page 179: Pattern Recognition in Speech and Language Processing

A b

Page 180: Pattern Recognition in Speech and Language Processing

5.6 Decoding

Page 181: Pattern Recognition in Speech and Language Processing

5.6.1 Speech/Non-speech Detection

Page 182: Pattern Recognition in Speech and Language Processing

5.6.2 Decoding Strategies

Page 183: Pattern Recognition in Speech and Language Processing

FIGURE 5.10 Example word lattice generated by a speech recognizer using a bigram language model for a 2.1 s utterance. Each graph edge corresponds to a word hypothesis and a time interval (as specified by the time information on the nodes). In this example the word transcription with the highest likelihood is “sil IT WAS A GOOD PROGRAM sil” which happens to be what was said. (The acoustic and language model likelihoods are not given on the figure.)
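A lattice of the kind described in the Figure 5.10 caption can be searched for its best-scoring word sequence with a simple dynamic program. The sketch below is illustrative only: the node times, the competing word hypotheses, and the log scores are invented for the example (the figure does not give likelihoods), and the function name is an assumption.

```python
from collections import defaultdict


def best_path(edges, start, end):
    """Return (score, words) of the highest-scoring path from start to end.

    edges: iterable of (from_node, to_node, word, log_score). Nodes are times
    in seconds and every edge goes forward in time, so visiting source nodes
    in increasing time order is a valid topological order.
    """
    outgoing = defaultdict(list)
    for src, dst, word, score in edges:
        outgoing[src].append((dst, word, score))
    best = {start: (0.0, [])}
    for node in sorted(outgoing):
        if node not in best:
            continue  # node is not reachable from the start node
        base_score, base_words = best[node]
        for dst, word, score in outgoing[node]:
            candidate = (base_score + score, base_words + [word])
            if dst not in best or candidate[0] > best[dst][0]:
                best[dst] = candidate
    return best[end]


# Hypothetical lattice for a 2.1 s utterance (all scores are assumed values).
lattice = [
    (0.0, 0.3, "sil", -1.0),
    (0.3, 0.5, "IT", -2.0),
    (0.3, 0.5, "IN", -3.5),
    (0.5, 0.8, "WAS", -1.5),
    (0.8, 0.9, "A", -1.0),
    (0.5, 0.9, "WATER", -4.0),
    (0.9, 1.2, "GOOD", -2.0),
    (0.9, 1.2, "GOT", -2.5),
    (1.2, 1.9, "PROGRAM", -3.0),
    (1.9, 2.1, "sil", -0.5),
]
score, words = best_path(lattice, 0.0, 2.1)
```

In a real decoder each edge score would combine acoustic and language model log-likelihoods; here a single number per edge stands in for that combination.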

5.6.3 Efficiency

Page 184: Pattern Recognition in Speech and Language Processing


Page 185: Pattern Recognition in Speech and Language Processing

5.6.4 Confidence Measures

5.7 Indicative Performance Levels

Page 186: Pattern Recognition in Speech and Language Processing

substitutions

insertions

deletions
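The three error types named above are conventionally counted from a minimum edit distance alignment between the reference transcription and the recognizer output, and the word error rate is their sum divided by the number of reference words. The following is a minimal sketch under that standard definition; the function names and example sentences are assumptions, and ties between equally cheap alignments are broken arbitrarily.

```python
def word_error_counts(reference, hypothesis):
    """Align two word lists by minimum edit distance.

    Returns (substitutions, insertions, deletions) for one minimum-cost alignment.
    """
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # cost[i][j] = (total_errors, subs, ins, dels) for reference[:i] vs hypothesis[:j]
    cost = [[None] * cols for _ in range(rows)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        cost[i][0] = (i, 0, 0, i)  # all reference words deleted
    for j in range(1, cols):
        cost[0][j] = (j, 0, j, 0)  # all hypothesis words inserted
    for i in range(1, rows):
        for j in range(1, cols):
            t, s, n, d = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diagonal = (t, s, n, d)
            else:
                diagonal = (t + 1, s + 1, n, d)
            t, s, n, d = cost[i - 1][j]
            deletion = (t + 1, s, n, d + 1)
            t, s, n, d = cost[i][j - 1]
            insertion = (t + 1, s, n + 1, d)
            cost[i][j] = min(diagonal, deletion, insertion)
    return cost[-1][-1][1:]


def word_error_rate(reference, hypothesis):
    """(substitutions + insertions + deletions) / number of reference words."""
    subs, ins, dels = word_error_counts(reference, hypothesis)
    return (subs + ins + dels) / len(reference)


reference = "it was a good program".split()
hypothesis = "it was good program".split()
counts = word_error_counts(reference, hypothesis)
wer = word_error_rate(reference, hypothesis)
```

Because insertions are counted, the word error rate defined this way can exceed 100% when the hypothesis is much longer than the reference.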

5.7.1 Dictation

Page 187: Pattern Recognition in Speech and Language Processing

5.7.2 Speech Recognition for Dialog Systems

Page 188: Pattern Recognition in Speech and Language Processing


exact

5.7.3 Transcription for Audio Indexation

Page 189: Pattern Recognition in Speech and Language Processing
Page 190: Pattern Recognition in Speech and Language Processing

5.8 Portability and Language Dependencies

Page 191: Pattern Recognition in Speech and Language Processing

References

The THISL Broadcast News Retrieval System,

Experiments in Vocal Tract Normalization,

A Compact Model for Speaker Adaptation Training,

One Pass Cross Word Decoding for Large Vocabularies Based on a Lexical Tree Search Organization, 4

The Forward-Backward Search Strategy for Real-Time Speech Recognition,

Preliminary results on the performance of a system for the automatic recognition of continuous speech,

Acoustic Markov Models used in the Tangora Speech Recognition System,

1

A Maximum Likelihood Approach to Continuous Speech Recognition,

PAMI-5

Page 192: Pattern Recognition in Speech and Language Processing

A Fast Match for Continuous Speech Recognition Using Allophonic Models, 1

Large Vocabulary Recognition of Wall Street Journal Sentences at Dragon Systems,

A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains

41

Vector quantization for efficient computation of continuous density likelihoods, 2

A Baseline for the Transcription of Italian Broadcast News,

Word and acoustic confidence annotation for large vocabulary speech recognition

Improvements in Language, Lexical and Phonetic Modeling in Sphinx-II,

An empirical study of smoothing techniques for language modeling, 13

Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion

The Role of Word-Dependent Coarticulatory Effects in a Phoneme-Based Speech Recognition System

3

Statistical Language Modelling using CMU-Cambridge Toolkit,

Comparison of Parametric Representations of Monosyllabic Word Recognition in Continuously Spoken Sentences,

28

Page 193: Pattern Recognition in Speech and Language Processing

Maximum Likelihood from In-complete Data via the EM Algorithm

39

Human SpeechRecognition Performance on the 1995 CSR Hub-3 Corpus

Genones: Optimization the Degree of Tying ina Large Vocabulary HMM-based Speech Recognizer,1

Speaker adaptation using con-strained estimation of Gaussian mixtures3

Sonograph and Sound Mechanics,22

Automatic Recognition of Phonetic Patterns inSpeech, 30

Human Speech Recognition Performance on the 1994CSR Spoke 10 Corpus

Comparison of speaker recognition methods using statistical featuresand dynamic features,ASSP-29

An improved approach to hidden Markov modeldecomposition of speech and noise,

Robust Continuous Speech Recognition usingParallel Model Combination, 9

Cluster Adaptive Training for Speech Recognition,

Semi-Tied Covariance Matrices for Hidden Markov Models,7

Transcribing Broad-cast News: The LIMSI Nov96 Hub4 System,

Spoken Lan-guage component of the MASK Kiosk

Page 194: Pattern Recognition in Speech and Language Processing

Speech Recognition for an Informa-tion Kiosk,

Partitioning and Transcription of Broad-cast News Data, 5

Developments in ContinuousSpeech Dictation using the ARPA WSJ Task,

Maximum a Posteriori Estimation for Multivari-ate Gaussian Mixture Observations of Markov Chains,

2

The LIMSI Broadcast News TranscriptionSystem 37

A Rapid Match Algorithm for Continuous SpeechRecognition,

A Probabilistic Approach to Confidence Mea-sure Estimation and Evaluation

Real-time Telephone-basedSpeech Recognition in the Jupiter Domain, 1

SWITCHBOARD: Telephone SpeechCorpus for Research and Development,

The Population Frequencies of Species and the Estimation of Popu-lation Parameters 40

A tree search strategyfor large-vocabulary continuous speech recognition,1

Linear Discriminant Analysis for ImprovedLarge Vocabulary Continuous Speech Recognition, 1

SegmentGeneration and Clustering in the HTK Broadcast News Transcription System,

Page 195: Pattern Recognition in Speech and Language Processing

News-on-Demand-’An Ap-plication of Informedia Technology’,

The ATIS Spoken LanguageSystems Pilot Corpus,

Perceptual linear predictive (PLP) analysis of speech,87

Large vocabu-lary continuous speech recognition using a hybrid connectionist-HMM system,

Signal Representation

Subphonetic Modeling with Markov States - Senone,1

Predicting Unseen Triphones withSenones, II

Continuous Speech Recognition by Statistical Methods,64

Statistical Methods for Speech Recognition,

A Dynamic LanguageModel for Speech Recognition,

: Speech BasedVideo Retrieval,

Maximum-Likelihood Estimation for Mixture MultivariateStochastic Observations of Markov Chains 64

Estimation of Probabilities from Sparse Data for the LanguageModel Component of a Speech Recognizer,

ASSP-35

Unsupervised Training of a Speech Recognizer: Re-cent Experiments, 6

The 1995 Abbot hybridconnectionist-HMM large-vocabulary recognition system,

Page 196: Pattern Recognition in Speech and Language Processing

Improved Clustering Techniques for Class-Based Statis-tical Language Modelling,

Improved backing-off for n-gram language modeling,1

Design of the 1994 CSR Benchmark Tests,

Toward Automatic Recognition of Broadcast News,

Heteroscedastic discriminant analysis and re-duced rank HMMs for improved speech recognition,26

Eigenvoices for Speaker Adaptation,

On Designing Pronunciation Lexicons for Large Vo-cabulary, Continuous Speech Recognition, 1

Speech Recognition of European Languages,

Continuous Speech Recognition at LIMSI,

A Phone-based Approach to Non-LinguisticSpeech Feature Identification, 9

Lightly Supervised and UnsupervisedAcoustic Model Training 16

Development of Spoken Language Corpora for Travel Infor-mation 3

Large-vocabulary speaker-independent continuous speech recogni-tion: The SPHINX system,

Speaker Normalization Using Efficient Frequency Warp-ing Procedures 1

Page 197: Pattern Recognition in Speech and Language Processing

Maximum Likelihood Linear Regression forSpeaker Adaptation of Continuous Density Hidden Markov Models,

9

Maximum Likelihood Estimation for Multivariate Observa-tions of Markov Sources IT-28

Speech recognition by machines and humans,22

Fast Speaker Change Detection for Broadcast NewsTranscription and Indexing 3

Multi-site Data Collection for a Spoken Language Corpus,

Finding Consensus in Speech Recognition:Word Error Minimization and Other Applications of Confusion Networks,

Subspace distribution clustering for continuousobservation density hidden Markov models,

Spoken Language Processing and Human-Machine Communica-tion in the European Union Programs,

An overview of EU programs related to conver-sational/interactive systems,

Algorithms for Bigram and Trigram Clus-tering,

News on Demand,43

Named Entity Extrac-tion from Broadcast News,

Full Expansion ofContext-Dependent Networks in Large Vocabulary Speech Recognition,

Large-VocabularyDictation using SRI’s Decipher Speech Recognition System: Progressive

Page 198: Pattern Recognition in Speech and Language Processing

Search Techniques, II

The Use of a One-Stage Dynamic Programming Algorithm for Con-nected Word Recognition,

ASSP-32

Improvements in BeamSearch for 10000-Word Continuous Speech Recognition,

I

Single-Tree Method for Grammar-DirectedSearch, 2

The Use of Decision Trees with Context Sensitive Phoneme Mod-elling,

A One Pass DecoderDesign for Large Vocabulary Recognition,

Recent Advancesin Japanese Broadcast News Transcription,2

Modeling Inverse Covariance Matrices by Ba-sis Expansion,

Language-model look-ahead for largevocabulary speech recognition,

A Word Graph Algorithm for Large Vo-cabulary Continuous Speech Recognition,11

The Role ofPhonological Rules in Speech Understanding Research,

ASSP-23

Continuous WordRecognition Based on the Stochastic Segment Model,

1993 Benchmark Tests for the ARPA Spoken Language Program,

1994 Benchmark Tests for the ARPA Spoken Language

Page 199: Pattern Recognition in Speech and Language Processing

Program,

1995 Hub-3 Multiple Microphone Corpus Benchmark Tests,

1998Broadcast News Benchmark Test Results: English and Non-English Word Er-ror Rate Performance Measures,

An efficient A� stack decoder algorithm for continuous speechrecognition with a stochastic language model,

Improved Discriminative Training Techniques ForLarge Vocabulary Continuous Speech Recognition

Evaluation of Spoken Language Systems: The ATIS Domain,

An Introduction to Hidden Markov ModelsASSP-3

Efficient Algorithms for Speech Recognition,

Stochastic pronuncia-tion modelling from hand-labelled phonetic corpora,29

Improvements in Stochastic Language Modeling,

Adaptive Statistical Language Modeling,

Two Decades of Statistical Language Modeling: Where Do WeGo From Here?,

88

Language-independent and langauge-adaptiveacoustic modeling for speech recognition 35

Page 200: Pattern Recognition in Speech and Language Processing

Memory-efficient LVCSR search using a one-pass stack decoder,14

New uses for N-Best Sen-tence Hypothesis, within the BYBLOS Speech Recognition System,

I

Improved Hid-den Markov Modeling of Phonemes for Continuous Speech Recognition,

3

NYU Language Modeling Experiments for the1995 CSR Evaluation,

A Markov Random Field Approach to Bayesian SpeakerAdaptation,

Modeling Those F-Conditions – Or Not,

Scalable backoff language models1

Automatic Segmentation, Classifica-tion and Clustering of Broadcast News Audio,

Evaluation of word confidence for speech recognitionsystems 13

Entropy-based Pruning of Backoff Language Models

Four-level Tied Structure for Efficient Repre-sentation of Acoustic Modeling,

An Investigation into Vocal Tract LengthNormalization,

Human Bench-marks for Speaker Independent Large Vocabulary Recognition Performance,

Speech discrimination by dynamic programming,4

Page 201: Pattern Recognition in Speech and Language Processing

Elements-wise recognition of continuous speech composed ofwords from a specified dictionary, 7

Verbmobil: Translation of Face-to-Face Dialogs,Plenary

Multilinguality in Speech and Spoken Language Systems88

Probabilistic Models for Topic De-tection and Tracking, 1

DragonSystems’ 1997 Broadcast News Transcription System,

Progress in Broadcast News Transcrip-tion at Dragon Systems,

Neural-Network based Measures of Confidence for Word Recognition,

Using word probabilities as confi-dence measures,

Unsupervised training of acoustic models for large vo-cabulary continuous speech recognition

The Zero Frequency problem: Estimating the prob-lems of Novel Events in Adaptive tex Compression

37

Large scale discriminative training of hiddenMarkov models for speech recognition,16

The de-velopment of the 1994 HTK large vocabulary speech recognition system,

The HTK large vocab-ulary recognition system for the 1995 ARPA H3 task,

Page 202: Pattern Recognition in Speech and Language Processing

A Hid-den Markov Approach to Text Segmentation and Event Tracking

1

A Review of Large-Vocabulary Continuous Speech Recognition,13

Multilingual large vocabulary speech recognition: the Euro-pean SQALE project, 11

Speech recognition evaluation: a review of the U.S.CSR and LVCSR programmes, 12

Tree-Based State Tying for High Ac-curacy Acoustic Modeling,

The Use of State Tying in Continuous SpeechRecognition, 3

Utilizing Untranscribed Training Data to Im-prove Performance

Maximum a Posteriori Adap-tation for Large Scale HMM Recognizers,

The MIT Speech Recog-nition System: A Progress Report

Page 203: Pattern Recognition in Speech and Language Processing

6

Toward Spontaneous Speech Recognition and Understanding

Sadaoki Furui
Tokyo Institute of Technology

CONTENTS

6.1 Introduction

FIGURE 6.1 Progress of spoken language technology along the dimensions of vocabulary size and speaking styles.


6.2 Four Categories of Speech Recognition Tasks


TABLE 6.1


6.3 Spontaneous Speech Recognition and Understanding - Review

6.3.1 Category I (human-to-human dialogue)


6.3.2 Category II (human-to-human monologue)

FIGURE 6.2 The SCANMail architecture [12].


6.3.3 Category III (human-to-machine dialogue)

FIGURE 6.3 AT&T Communicator architecture [15].

6.4 Japanese National Project on Spontaneous Speech Corpus and Processing Technology

6.4.1 Project Overview

FIGURE 6.4 Overview of the Japanese national project on spontaneous speech corpus and processing technology.

6.4.2 Corpus

FIGURE 6.5 Overall design of the Corpus of Spontaneous Japanese.

6.5 Automatic Transcription of Spontaneous Presentation

6.5.1 Recognition Task

6.5.2 Language and Acoustic Modeling

CSJ:

Web:

TABLE 6.2


SpnL:

WebL:

SpnA:

RdA:

6.5.3 Recognition Results

FIGURE 6.6 Test-set perplexity and OOV rate for the two language models.

FIGURE 6.7 Word accuracy for each combination of models.


6.5.4 Analysis on Individual Differences

FIGURE 6.8 Results of unsupervised adaptation.

6.5.4.1 Speaker Attributes


TABLE 6.3

0.28 0.32

-0.42 -0.47 -0.54 -0.62 -0.40 -0.33 -0.54 -0.51 0.33 0.52 0.38 0.38 -0.50 -0.41

-0.30 -0.31

6.5.4.2 Correlation Analysis


6.5.4.3 Regression Analysis


FIGURE 6.9 Speaking rate vs. word accuracy.

6.5.4.4 Selection of Major Attributes

FIGURE 6.10 Summary of correlation between various attributes.

TABLE 6.4

6.5.5 Discussion


6.6 Automatic Speech Summarization and Evaluation

6.6.1 Summarization of Each Sentence Utterance


6.6.1.1 Word Significance Score


FIGURE 6.11 An example of dependency structure.

6.6.1.2 Linguistic Score


6.6.1.3 Word Confidence Score


6.6.1.4 Word Concatenation Score


FIGURE 6.12 A phrase structure tree based on a dependency structure.


6.6.2 Summarization of Multiple Utterances


6.6.3 Evaluation

6.6.3.1 Word Network of Manual Summarization Results for Evaluation


6.6.3.2 Evaluation Data

6.6.3.3 Training Data for Summarization Models

FIGURE 6.13 Summarization of each utterance at a 70% summarization ratio.

6.6.3.4 Evaluation Results


FIGURE 6.14 Article summarizations at 30% summarization ratio.

6.6.4 Discussion

6.7 Spontaneous Speech Recognition and Understanding Research Issues

6.7.1 Language Models and Corpora


6.7.2 Message-driven Speech Recognition and Understanding


FIGURE 6.15 A communication-theoretic view of speech generation and recognition.


6.7.3 Statistical Approaches and Speech Science


6.7.4 Research on the Human Brain


6.7.5 Dynamic Spectral Features

FIGURE 6.16 Speech-generation and speech-perception processes.

6.8 Conclusion


References


7

Speaker Authentication

Qi Li* and Biing-Hwang Juang†

*Bell Labs; †Avaya Labs Research

CONTENTS

7.1 Introduction

FIGURE 7.1 Speaker authentication approaches.

Speaker authentication

7.1.1 Speaker Recognition and Verification

Speaker recognition

Speaker verification

hypothesis testing

Speaker identification

classification

FIGURE 7.2 A speaker verification system.

direct method

directly

fixed pass-phrase system

text-prompted system

text-independent SV system

closed test

open test

7.1.2 Verbal Information Verification

FIGURE 7.3 An example of verbal information verification by asking sequential questions. (Similar sequential tests can also be applied in speaker verification and other biometric or multi-modality verification.)

indirect method

7.2 Pattern Recognition in Speaker Authentication

7.2.1 Bayesian Decision Theory


a posteriori

Bayes decision rule

7.2.2 Stochastic Models for Stationary Process


speaker-dependent

7.2.3 Stochastic Models for Non-Stationary Process

FIGURE 7.4 Left-to-right hidden Markov model.

segmental K-means

7.2.4 Speech Segmentation


7.2.5 Statistical Verification

Neyman-Pearson

7.3 Speaker Verification System

FIGURE 7.5 A fixed-phrase speaker verification system.

whole-word or whole-phrase model


TABLE 7.1 Experimental Results in Average Equal-Error Rates

7.4 Verbal Information Verification

FIGURE 7.6 Utterance verification in verbal information verification (VIV).

7.4.1 Utterance Segmentation


7.4.2 Subword Hypothesis Testing


7.4.3 Confidence Measure Calculation

normalized confidence measure

7.4.4 Sequential Utterance Verification

step-down procedure


false rejection

false acceptance

equal-error rate

Definition 1: False rejection error on multiple utterances

Definition 2: False acceptance error on multiple utterances

Definition 3: Equal-error rate on multiple utterances
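The three error measures named above can be illustrated with a short sketch. It assumes a simplified single-test setting in which each trial yields one confidence score and a claim is accepted when the score reaches a threshold; the sequential, multi-utterance procedure of this section is not modelled. The function names and the score lists are hypothetical and chosen only for illustration.

```python
def error_rates(true_scores, impostor_scores, threshold):
    """Return (false rejection rate, false acceptance rate) at a threshold.

    A trial is accepted when its score is >= threshold.
    """
    fr = sum(s < threshold for s in true_scores) / len(true_scores)
    fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return fr, fa


def equal_error_rate(true_scores, impostor_scores):
    """Scan candidate thresholds and return (eer, threshold).

    The operating point is the threshold where the false rejection and
    false acceptance rates are closest; the reported rate is their mean.
    """
    best = None
    for t in sorted(set(true_scores) | set(impostor_scores)):
        fr, fa = error_rates(true_scores, impostor_scores, t)
        gap = abs(fr - fa)
        if best is None or gap < best[0]:
            best = (gap, (fr + fa) / 2, t)
    return best[1], best[2]


# Invented scores, for illustration only.
true_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor_scores = [0.7, 0.5, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(true_scores, impostor_scores)
print(eer, threshold)
```

Raising the threshold trades false acceptances for false rejections, which is why a single equal-error figure is a convenient summary when comparing systems.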


Example 1:


Example 2:

7.4.5 VIV Experimental Results


TABLE 7.2 Summary of the Experimental Results on Verbal Information Verification

FIGURE 7.7 An integrated voice authentication system combining verbal information verification and speaker verification.

7.5 Speaker Authentication by Combining SV and VIV


TABLE 7.3 Experimental Results without Adaptation in Average Equal-Error Rates

TABLE 7.4 Experimental Results with Adaptation in Average Equal-Error Rates


7.6 Summary


References


8

HMMs for Language Processing Problems

Richard M. Schwartz and John Makhoul
BBN Technologies, Verizon

CONTENTS

8.1 Introduction


8.2 Use of Probabilities


8.2.1 Hidden Markov Models


8.3 Name Spotting


8.4 Topic Classification


FIGURE 8.1 A hidden Markov model for topics. Each state can emit words for one topic. State T0 emits words corresponding to general language.
[Figure labels: story start, story end; topic states T1, T2, TM; state T0 (General Language) with a loop; probabilities P(Set), P(Tj|Set), P(Wn|Tj).]
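The structure in the caption, where a word is emitted either by a topic state or by the general-language state T0, can be illustrated with a simplified sketch. It assumes that each word is generated independently by a two-way mixture of T0 and one candidate topic; the mixture weight `p_general`, the probability floor for unseen words, the function name, and the toy distributions are all hypothetical and are not taken from the chapter.

```python
import math


def story_log_likelihood(words, topic_dist, general_dist,
                         p_general=0.8, floor=1e-6):
    """Log-likelihood of a story under one topic plus general language.

    Each word is emitted by the general-language state with probability
    p_general, otherwise by the topic state. Words missing from a
    distribution receive the small floor probability.
    """
    total = 0.0
    for w in words:
        p = (p_general * general_dist.get(w, floor)
             + (1.0 - p_general) * topic_dist.get(w, floor))
        total += math.log(p)
    return total


# Toy distributions, invented for illustration only.
general = {"the": 0.5, "a": 0.3, "of": 0.2}
topics = {
    "finance": {"stocks": 0.6, "market": 0.4},
    "sports": {"game": 0.5, "team": 0.5},
}
story = ["the", "stocks", "of", "the", "market"]

scores = {name: story_log_likelihood(story, dist, general)
          for name, dist in topics.items()}
best_topic = max(scores, key=scores.get)
print(best_topic)
```

Routing function words through the general-language state keeps them from dominating the comparison, so the topic-specific words decide which topic scores highest.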

8.4.1 The Model


8.4.2 Estimating HMM Parameters


8.4.3 Classification


8.4.4 Experiments


8.5 Information Retrieval

8.5.1 A Bayesian Model for IR


8.5.2 Training the IR HMM


8.5.3 Performance


8.6 Event Tracking


8.7 Unsupervised Topic Detection


8.8 Summary

References


9

Statistical Language Models With Embedded Latent Semantic Knowledge

Jerome R. Bellegarda
Apple Computer, Inc.

CONTENTS

9.1 Introduction

9.1.1 Scope Locality


stocks fell sharply as a result of the announcement
stocks, as a result of the announcement, sharply fell

information aggregation

span extension

9.1.2 Syntactically–Driven Span Extension

headwords

stocks fell

9.1.3 Semantically–Driven Span Extension

document

stock market trends

stocks fell stocks fell

stocks fell

latent semantic analysis

9.1.4 Organization

Page 290: Pattern Recognition in Speech and Language Processing

9.2 Latent Semantic Analysis

9.2.1 Feature Extraction

Page 291: Pattern Recognition in Speech and Language Processing

9.2.2 Singular Value Decomposition

Page 292: Pattern Recognition in Speech and Language Processing

word vector

document vector
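To make the word vector and document vector concrete, the following is a minimal sketch of a rank-R singular value decomposition applied to a toy word-document count matrix. Everything in it is an assumption made for illustration: the function name `lsa_vectors`, the toy counts, the use of raw counts without any weighting, and the choice to scale the left and right singular vectors by the singular values before comparing them.

```python
import numpy as np

def lsa_vectors(W, R):
    """Rank-R SVD of a word-document matrix W (M words x N documents).

    Returns word vectors (M x R), document vectors (N x R) and the R
    retained singular values. Scaling by the singular values is an
    illustrative convention, not something fixed by the text.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_R, s_R, V_R = U[:, :R], s[:R], Vt[:R, :].T
    word_vecs = U_R * s_R   # row i: left singular vector of word i, scaled
    doc_vecs = V_R * s_R    # row j: right singular vector of document j, scaled
    return word_vecs, doc_vecs, s_R

def cosine(a, b):
    """Cosine of the angle between two vectors in the reduced space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy corpus: 5 words (rows) x 4 documents (columns).
W = np.array([
    [2, 1, 0, 0],   # stocks
    [1, 2, 0, 0],   # market
    [1, 1, 0, 0],   # fell
    [0, 0, 2, 1],   # meeting
    [0, 0, 1, 2],   # cancel
], dtype=float)

word_vecs, doc_vecs, s = lsa_vectors(W, R=2)
```

In this toy space, words that occur in the same documents (for example the rows for "stocks" and "market") end up nearly parallel, while words from unrelated documents are close to orthogonal, which is the behavior that clustering in the reduced space relies on.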

9.2.3 General Behavior

Page 293: Pattern Recognition in Speech and Language Processing

FIGURE 9.1 Improved Topic Separability in LSA Space.

9.3 LSA Feature Space

Page 294: Pattern Recognition in Speech and Language Processing

9.3.1 Word Clustering

9.3.2 Word Cluster Example

drawing rule polysemy drawing rule

Page 295: Pattern Recognition in Speech and Language Processing

Cluster 1

Cluster 2

FIGURE 9.2 Word Cluster Example (After [2]).

drawing a conclusion breaking a rule

hysteria here

9.3.3 Document Clustering

Page 296: Pattern Recognition in Speech and Language Processing

FIGURE 9.3 Document Cluster Example.

9.3.4 Document Cluster Example

Page 297: Pattern Recognition in Speech and Language Processing

9.4 Semantic Classification

9.4.1 Framework Extension

pseudo document vector

Page 298: Pattern Recognition in Speech and Language Processing

9.4.2 Semantic Inference

semantic anchor

semantic inference

what is the time what is the day

what time is the meeting cancel the meeting what is

the

day cancel the

day cancel

what–is time time meeting time

when is the meeting what time is the meeting

Page 299: Pattern Recognition in Speech and Language Processing

FIGURE 9.4 An Example of Semantic Inference for Command and Control.

9.4.3 Caveats

not

Page 300: Pattern Recognition in Speech and Language Processing

change popup to window
change window to popup

exact same point

9.5 N-gram+LSA Language Modeling

9.5.1 LSA Component

Page 301: Pattern Recognition in Speech and Language Processing

9.5.1.1

9.5.1.2

Page 302: Pattern Recognition in Speech and Language Processing

the

9.5.2 Integration with N-grams

Page 303: Pattern Recognition in Speech and Language Processing

Page 304: Pattern Recognition in Speech and Language Processing

9.5.3 Context Scope Selection

Page 305: Pattern Recognition in Speech and Language Processing

9.6 Smoothing

9.6.1 Word Smoothing

Page 306: Pattern Recognition in Speech and Language Processing

9.6.2 Document Smoothing

Page 307: Pattern Recognition in Speech and Language Processing

9.6.3 Joint Smoothing

9.7 Experiments

Page 308: Pattern Recognition in Speech and Language Processing

9.7.1 Experimental Conditions

9.7.2 Experimental Results

Page 309: Pattern Recognition in Speech and Language Processing

TABLE 9.1

9.7.3 Context Scope Selection

Page 310: Pattern Recognition in Speech and Language Processing

TABLE 9.2

9.8 Inherent Trade-Offs

9.8.1 Cross-Domain Training

Page 311: Pattern Recognition in Speech and Language Processing

TABLE 9.3

9.8.2 Discussion

Page 312: Pattern Recognition in Speech and Language Processing

9.9 Conclusion

Page 313: Pattern Recognition in Speech and Language Processing

References

Context-Dependent Vector Clustering for Speech Recognition

A Multi-Span Language Modeling Framework for Large Vocabulary Speech Recognition

Large Vocabulary Speech Recognition With Multi-Span Statistical Language Models

Exploiting Latent Semantic Information in Statistical Language Modeling

Robustness in Statistical Language Modeling: Review and Perspectives

Fast Update of Latent Semantic Spaces Using a Linear Transform Framework

A Novel Word Clustering Algorithm Based on Latent Semantic Analysis

Toward Unconstrained Command and Control: Data-Driven Semantic Inference

Natural Language Spoken Interface Control Using Data-Driven Semantic Inference

Page 314: Pattern Recognition in Speech and Language Processing

Large–Scale Sparse Singular Value Computations

Using Linear Algebra for Intelligent Information Retrieval

An Overview of Parallel Algorithms for the Singular Value and Dense Symmetric Eigenvalue Problems

Natural Language Call Routing: A Robust, Self–Organized Approach

Structure and Performance of a Dependency Language Model

Recognition Performance of a Structured Language Model

Building Probabilistic Models for Natural Language

Dialog Management in Vector–Based Call Routing

Language Model Adaptation Using Mixtures and an Exponentially Decaying Cache

Towards Better Integration of Semantic Predictors in Statistical Language Modeling

Lanczos Algorithms for Large Symmetric Eigenvalue Computations – Vol. 1 Theory

Recognizing and Using Knowledge Structures in Dialog Systems

Indexing by Latent Semantic Analysis

Page 315: Pattern Recognition in Speech and Language Processing

Adaptive Language Model Estimation Using Minimum Discrimination Estimation

Improving the Retrieval of Information from External Sources

Latent Semantic Indexing (LSI) and TREC–2

Language Modeling

Personalized Information Delivery: An Analysis of Information Filtering Methods

On Topic Identification and Dialogue Move Recognition

Topic–Based Language Modeling Using EM

Matrix Computations

Document Space Models Using Latent Semantic Analysis

Probabilistic Latent Semantic Analysis

Probabilistic Topic Maps: Navigating Through Large Text Collections

Modeling Long Distance Dependencies in Language: Topic Mixtures Versus Dynamic Cache Models

Self–Organized Language Modeling for Speech Recognition

Page 316: Pattern Recognition in Speech and Language Processing

Putting Language into Language Modeling

Using a Stochastic Context–Free Grammar as a Language Model for Speech Recognition

Putting Language Back into Language Modeling

Statistical Language Modeling Using a Variable Context

The Hub and Spoke Paradigm for CSR Evaluation

A Cache-based Natural Language Method for Speech Recognition

Cluster Expansion and Iterative Scaling for Maximum Entropy Language Models

Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge

How Well Can Passage Meaning Be Derived Without Using Word Order: A Comparison of Latent Semantic Analysis and Humans

Trigger–Based Language Models: A Maximum Entropy Approach

On Structuring Probabilistic Dependences in Stochastic Language Modeling

Page 317: Pattern Recognition in Speech and Language Processing

A Variable–Length Category–Based N–Gram Language Model

Latent Semantic Indexing: A Probabilistic Analysis

Beyond Word N-Grams

An Overview of Automatic Speech Recognition

The CMU Statistical Language Modeling Toolkit and its Use in the 1994 ARPA CSR Evaluation

A Maximum Entropy Approach to Adaptive Statistical Language Modeling

Two Decades of Statistical Language Modeling: Where Do We Go From Here

Interactive Feature Induction and Logistic Regression for Whole Sentence Exponential Language Models

Language Representation

A Maximum Likelihood Model for Topic Classification of Broadcast News

An Explanation of the Effectiveness of Latent Semantic Indexing by Means of a Bayesian Regression Model

Combining Nonlocal, Syntactic and N-Gram Dependencies in Language Modeling

Page 318: Pattern Recognition in Speech and Language Processing

Recognition and Parsing of Context–Free Languages in Time n³

Using Detailed Linguistic Structure in Language Modeling

Linguistic Features for Whole Sentence Maximum Entropy Language Models

Integration of Speech Recognition and Natural Language Processing in the MIT Voyager System

Page 319: Pattern Recognition in Speech and Language Processing

10

Semantic Information Processing of Spoken Language – How May I Help You?℠

A. L. Gorin, A. Abella, T. Alonso, G. Riccardi, and J. H. Wright, AT&T Laboratories

CONTENTS

10.1 Introduction

AT&T’s ‘How May I Help You?’

“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another

Page 320: Pattern Recognition in Speech and Language Processing

point. Frequently the messages have meaning, … These semantic aspects of communication are irrelevant to the engineering problem.”

confirm clarify

“Do you want to make a collect call?”
“Charge this call please”

“How do you want to charge this call, to a credit card or to a third number?”

“What is your home phone number?”

Page 321: Pattern Recognition in Speech and Language Processing

Construct Algebra

dialog motivators

inheritance hierarchy

‘is a’ ‘has a’

10.2 Call-Classification

‘press one if you want x, press two if you want y’

Page 322: Pattern Recognition in Speech and Language Processing

‘please say collect, calling card’ ‘press or say one if you want x’

‘How may I help you?’

“I want to reverse the charges on this call.”
“Can you tell me what time it is in Tokyo?”
“I was trying to call my sister and dialed a wrong number.”
“I’ve been trying to dial this number all day and can’t get through.”

“How much money do I owe you?”
“I don’t recognize this phone call to Tallahassee on October 4.”
“What’s this charge for one dollar and fifty cents?”
“I have a question about my bill.”

Page 323: Pattern Recognition in Speech and Language Processing

FIGURE 10.1 Call classification and routing in HMIHY.

‘How may I help you?’

‘How may I help you?’

Page 324: Pattern Recognition in Speech and Language Processing

FIGURE 10.2 Inheritance hierarchy of task knowledge in operator services.

perplexity

Evaluating Call Classification.

Page 325: Pattern Recognition in Speech and Language Processing

FIGURE 10.3 Histogram of utterance lengths.

false rejection

correct classification true rejection rate

Remark:

Page 326: Pattern Recognition in Speech and Language Processing

“I want to know how to pay my bill”

10.3 Language Modeling for Recognition and Understanding

‘I want to make a’ ‘collect call’ ‘card call’

‘wrong’ ‘wrong number’

‘dialed a wrong number’

‘dialed a wrong number’ ‘dialed the wrong number’

Page 327: Pattern Recognition in Speech and Language Processing

FIGURE 10.4 A salient grammar fragment.

salient grammar fragments

• User

• yeah I’m not AT&T WIRELESS PHONE and when I got and she told me that I would be switched to 7 CENTS A MINUTES FOR ALL my AT&T long distance on that I was on 10 10 cents ONE RATE PLAN

• SLU

Page 328: Pattern Recognition in Speech and Language Processing

FIGURE 10.5 Natural spoken dialog in HMIHY.

10.4 Dialog

Page 329: Pattern Recognition in Speech and Language Processing

10.5 Conclusions

www.research.att.com/~algor/hmihy

References

Page 330: Pattern Recognition in Speech and Language Processing
Page 331: Pattern Recognition in Speech and Language Processing

11

Machine Translation Using Statistical Modeling

Herman Ney and F. J. Och, Aachen University of Technology, Germany

CONTENTS

Abstract.

11.1 Introduction

machine translation written language text

spoken speech

Page 332: Pattern Recognition in Speech and Language Processing

spontaneous speech

• Statistical Decision Theory and Linguistics.

• Alignment and Lexicon Models.

• Alignment Templates: From Single Word to Word Groups.

• Experimental Results. written spoken

• Speech Translation: The Integrated Approach. serial

integrated

Page 333: Pattern Recognition in Speech and Language Processing

11.2 Statistical Decision Theory and Linguistics

11.2.1 The Statistical Approach

Page 334: Pattern Recognition in Speech and Language Processing

11.2.2 Bayes Decision Rule for Written Language Translation

word full-form

after
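For orientation, the Bayes decision rule for written language translation is commonly stated in the source-channel form below. The notation is an assumption made here for illustration: $f_1^J$ denotes the given source sentence of $J$ words, $e_1^I$ a candidate target sentence of $I$ words, $\Pr(e_1^I)$ the target language model, and $\Pr(f_1^J \mid e_1^I)$ the translation model, which is typically decomposed further into alignment and lexicon models.

```latex
\hat{e}_1^{I}
  = \arg\max_{e_1^I} \Pr(e_1^I \mid f_1^J)
  = \arg\max_{e_1^I} \left\{ \Pr(e_1^I) \cdot \Pr(f_1^J \mid e_1^I) \right\}
```

The second equality follows from Bayes' theorem, since the denominator $\Pr(f_1^J)$ does not depend on the target sentence and can be dropped from the maximization.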

11.2.3 Related Approaches

Page 335: Pattern Recognition in Speech and Language Processing

FIGURE 11.1 Architecture of the translation approach based on Bayes decision rule. (Blocks: Source Language Text; Transformation; Global Search (maximize … over …); Lexicon Model; Alignment Model; Language Model; Transformation; Target Language Text.)

11.3 Alignment and Lexicon Models

11.3.1 Concept of Alignment Modelling

Page 336: Pattern Recognition in Speech and Language Processing

FIGURE 11.2 Example of an alignment for a German-English sentence pair. (English: well I think if we can make it at eight on both days. German: ja ich denke wenn wir das hinkriegen an beiden Tagen acht Uhr.)

exactly one alignment models

11.3.2 Hidden Markov Models

Page 337: Pattern Recognition in Speech and Language Processing

‘hidden’ not

sequence

• baseline HMM

Page 338: Pattern Recognition in Speech and Language Processing

• homogeneous HMM

• context dependent HMM

Page 339: Pattern Recognition in Speech and Language Processing

11.3.3 Models IBM 1–5

before

• models IBM-1 and IBM-2: zero-order dependence.

first-order zero-order

absolute

Page 340: Pattern Recognition in Speech and Language Processing

• model IBM-3: fertility concept.

• models IBM-4 and IBM-5: inverted alignments with first-order dependence.

Page 341: Pattern Recognition in Speech and Language Processing

absolute relative

word context
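The simplest member of the IBM model family can be trained with a few lines of EM, which may help make the zero-order lexicon idea concrete. The following is a minimal sketch of Model 1 only, under stated assumptions: the function name `train_ibm1`, the `NULL` token convention, the uniform initialization, and the toy German-English pairs are all illustrative and not taken from the text. Alignment probabilities are treated as uniform, so only the lexicon probabilities t(f | e) are estimated.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """EM estimation of IBM Model 1 lexicon probabilities t(f | e).

    pairs: list of (source_words, target_words). A NULL token is prepended
    to every target sentence so a source word may align to no real word.
    Returns a dict keyed by (source_word, target_word).
    """
    pairs = [(f_sent, ["NULL"] + e_sent) for f_sent, e_sent in pairs]
    f_vocab = {f for f_sent, _ in pairs for f in f_sent}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform starting point

    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        # E-step: distribute each source word over the target words
        for f_sent, e_sent in pairs:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize the expected counts per target word
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Hypothetical toy corpus (source = German, target = English).
pairs = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm1(pairs)
```

On this toy corpus the probability mass for "the" concentrates on "das" and the mass for "book" on "buch", because those are the pairings that co-occur most consistently. The higher models add position-dependent alignment, fertility, and first-order dependence on top of a lexicon of this kind.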

11.3.4 Training

Page 342: Pattern Recognition in Speech and Language Processing

exact all maximum approximation

11.3.5 Search

inverted target source

several

Page 343: Pattern Recognition in Speech and Language Processing

FIGURE 11.3 Illustration of search in statistical translation. (Labels: Sentence in Source Language; Transformation; Search: Interaction of Knowledge Sources; Word + Position Alignment; Lexical Choice; Syntactic and Semantic Analysis; Word Re-ordering; Sentence Hypotheses; Knowledge Sources: Alignment Model, Bilingual Lexicon, Language Model; Transformation; Sentence Generated in Target Language.)

sets

Page 344: Pattern Recognition in Speech and Language Processing

FIGURE 11.4 Illustration of bottom-to-top search.

bottom-to-top

all

once

11.3.6 Algorithmic Differences between Speech Recognition and Language Translation

Page 345: Pattern Recognition in Speech and Language Processing

11.4 Alignment Templates: From Single Words to Word Groups

11.4.1 Concept

alignment template

Page 346: Pattern Recognition in Speech and Language Processing

FIGURE 11.5 Example of alignment templates for a German-English sentence pair. (English: okay, how about the nineteenth at maybe, two o’clock in the afternoon? German: okay, wie sieht es am neunzehnten aus, vielleicht um zwei Uhr nachmittags?)

within between

between

Page 347: Pattern Recognition in Speech and Language Processing

within

11.4.2 Training

each

11.4.3 Search

between the word groups within

11.5 Experimental Results

11.5.1 The Task and the Corpus

Page 349: Pattern Recognition in Speech and Language Processing

before

don’t → do not

11.5.2 Offline Results

Page 350: Pattern Recognition in Speech and Language Processing

several

TABLE 11.1

Page 351: Pattern Recognition in Speech and Language Processing

TABLE 11.2

11.5.3 Integration into the Prototype System

stattrans

stattrans repair

stattrans

prosody prosody

11.5.4 Final Evaluation

Page 352: Pattern Recognition in Speech and Language Processing

slot filling

and

relative

TABLE 11.3

Page 353: Pattern Recognition in Speech and Language Processing

11.6 Speech Translation: The Integrated Approach

11.6.1 Principle

Page 354: Pattern Recognition in Speech and Language Processing

11.6.2 Practical Implementation

Page 355: Pattern Recognition in Speech and Language Processing

FIGURE 11.6 Integrated architecture of speech translation approach based on Bayes decision rule. (Blocks: Speech Input in Source Language; Acoustic Analysis; Global Search (maximize … over …); Acoustic Model; Lexicon Model; Alignment Model; Language Model; Translated Text in Target Language.)

joint

Page 356: Pattern Recognition in Speech and Language Processing

meaning

semantically

source target

11.7 Summary

Acknowledgment

Page 357: Pattern Recognition in Speech and Language Processing

Page 362: Pattern Recognition in Speech and Language Processing

12

Modeling Topics for Detection and Tracking

James Allan, University of Massachusetts Amherst

CONTENTS

12.1 Topic Detection and Tracking

Page 363: Pattern Recognition in Speech and Language Processing

12.1.1 Topic and Events

event

topic

not

12.1.2 TDT Tasks

Page 364: Pattern Recognition in Speech and Language Processing

12.1.2.1 Segmentation

12.1.2.2 Cluster Detection

12.1.2.3 Tracking

12.1.2.4 New Event Detection

12.1.2.5 Link Detection

Page 365: Pattern Recognition in Speech and Language Processing

12.1.3 Corpora

each

Page 366: Pattern Recognition in Speech and Language Processing

12.1.4 Evaluation

Page 367: Pattern Recognition in Speech and Language Processing

FIGURE 12.1 A sample detection error tradeoff (DET) curve for the TDT tracking task with one training story. (Axes: False Alarm Rate, Miss Rate.)

minimum
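Detection tasks of this kind are usually scored by combining the miss and false alarm rates into a single cost, and the minimum of that cost along the DET curve summarizes the best achievable operating point. The sketch below shows one common form of that computation. It is an assumption-laden illustration: the function names are hypothetical, and the default constants (`c_miss=1.0`, `c_fa=0.1`, `p_target=0.02`) are values commonly quoted for TDT evaluations rather than values confirmed by the text here.

```python
def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Expected cost of a detection system at one operating point."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)

def normalized_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Cost divided by the cost of the better trivial system
    (always answer 'no' or always answer 'yes')."""
    floor = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return detection_cost(p_miss, p_fa, c_miss, c_fa, p_target) / floor

def min_normalized_cost(det_points, **kwargs):
    """Minimum normalized cost over (p_miss, p_fa) points of a DET curve."""
    return min(normalized_cost(pm, pf, **kwargs) for pm, pf in det_points)

# Hypothetical DET curve samples as (miss rate, false alarm rate) pairs.
curve = [(1.0, 0.0), (0.1, 0.01), (0.0, 1.0)]
best = min_normalized_cost(curve)
```

A normalized cost of 1.0 corresponds to a system that does no better than a trivial one, so values well below 1.0 indicate useful detection performance.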

Page 368: Pattern Recognition in Speech and Language Processing

12.2 Basic Topic Models

12.2.1 Vector Space

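A vector space topic model typically represents each story as a weighted term vector and compares stories, or a story and a topic centroid, by the cosine of the angle between them. The following is a minimal sketch of that idea. The helper names, the smoothed idf formula, and the toy stories are assumptions chosen for illustration, not the weighting scheme of any particular TDT system.

```python
import math
from collections import Counter

def document_frequencies(docs):
    """For each term, count the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df

def tfidf_vector(tokens, df, n_docs):
    """Sparse tf-idf vector as a dict; the smoothed idf is an assumed variant."""
    tf = Counter(tokens)
    return {w: c * math.log((n_docs + 1) / (df[w] + 1)) for w, c in tf.items()}

def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical tokenized stories.
stories = [
    "oklahoma bombing trial mcveigh".split(),
    "mcveigh trial verdict oklahoma".split(),
    "simpson trial verdict".split(),
]
df = document_frequencies(stories)
v1, v2, v3 = (tfidf_vector(s, df, len(stories)) for s in stories)
```

Here the first two stories share topical terms and score a positive similarity, while the first and third share only "trial", which occurs in every story and therefore receives zero weight. A tracking decision would compare such a score against a threshold.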

Page 369: Pattern Recognition in Speech and Language Processing

12.2.2 Language Models

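A language model approach instead treats a topic as a probability distribution over words and asks how much better the topic model explains a story than a general background model does. The sketch below uses unigram models with linear interpolation smoothing and a length-normalized log-likelihood ratio. The function names, the interpolation weight `lam`, and the toy data are assumptions made for illustration only.

```python
import math
from collections import Counter

def unigram(tokens):
    """Maximum likelihood unigram distribution over the given tokens."""
    counts = Counter(tokens)
    n = sum(counts.values())
    return {w: c / n for w, c in counts.items()}

def log_likelihood_ratio(story, topic_model, background, lam=0.5):
    """Average per-word log ratio of the smoothed topic model to the background.

    The topic model is smoothed by interpolating with the background model,
    so words unseen in the topic never get zero probability. Words unknown
    to the background model are skipped.
    """
    score = 0.0
    for w in story:
        p_bg = background.get(w, 0.0)
        if p_bg == 0.0:
            continue
        p_topic = lam * topic_model.get(w, 0.0) + (1.0 - lam) * p_bg
        score += math.log(p_topic / p_bg)
    return score / max(len(story), 1)

# Hypothetical training data.
topic_story = "oklahoma bombing trial mcveigh".split()
other_story = "simpson trial verdict".split()
background = unigram(topic_story + other_story)
topic_model = unigram(topic_story)

on_topic = log_likelihood_ratio("mcveigh trial oklahoma".split(), topic_model, background)
off_topic = log_likelihood_ratio("simpson verdict".split(), topic_model, background)
```

A positive score means the topic model fits the story better than the background does, so a threshold near zero is a natural starting point for an on-topic decision.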

12.3 Implementing the Models

Page 370: Pattern Recognition in Speech and Language Processing

12.3.1 Named Entities

President Bush George Bush

12.3.2 Document Expansion

Page 371: Pattern Recognition in Speech and Language Processing

12.3.3 Clustering

Page 372: Pattern Recognition in Speech and Language Processing

12.3.4 Time Decay

12.4 Comparing Models

12.4.1 Nearest Neighbors

Page 373: Pattern Recognition in Speech and Language Processing

12.4.2 Decision Trees

Page 374: Pattern Recognition in Speech and Language Processing

12.4.3 Model-to-Model

12.5 Miscellaneous Issues

Page 375: Pattern Recognition in Speech and Language Processing

12.5.1 Deferral

12.5.2 Multi-modal Issues

Page 376: Pattern Recognition in Speech and Language Processing

third

12.5.3 Multi-lingual Issues

Page 377: Pattern Recognition in Speech and Language Processing

FIGURE 12.2 Screen snapshot of the Lighthouse system that was created to portray TDT topic clusters and their relationships.

12.6 Using TDT Interactively

12.6.1 Demonstrations

Page 378: Pattern Recognition in Speech and Language Processing

12.6.2 Timelines

Oklahoma

Oklahoma McVeigh Simpson

Page 379: Pattern Recognition in Speech and Language Processing

FIGURE 12.3 Overview of January-June 1998. The topic labeled monica lewinsky allegation is the highest ranked topic by the χ² measure. The pop-up on oregon school shooting shows significant named entities for that event. The other pop-up displays a sub-menu for obtaining more information on the name kip kinkel.

12.7 Modeling Events

Page 380: Pattern Recognition in Speech and Language Processing

12.8 Conclusion

research

Page 381: Pattern Recognition in Speech and Language Processing
