TRANSCRIPT
8/8/2019 Algorithms for Speech Recognition and Language Processing
cmp-lg/9608018v2 17 Sep 1996
Algorithms for Speech Recognition and Language Processing
Mehryar Mohri Michael Riley Richard Sproat
AT&T Laboratories AT&T Laboratories Bell Laboratories
[email protected] [email protected] [email protected]
Joint work with Emerald Chung, Donald Hindle, Andrej Ljolje, Fernando Pereira
Tutorial presented at COLING96, August 3rd, 1996.
Introduction (1)
Text and speech processing: hard problems
Theory of automata
Appropriate level of abstraction
Well-defined algorithmic problems
M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing Introduction 2
Introduction (2)
Three Sections:
Algorithms for text and speech processing (2h)
Speech recognition (2h)
Finite-state methods for language processing (2h)
PART I: Algorithms for Text and Speech Processing
Mehryar Mohri, AT&T Laboratories
August 3rd, 1996
Definitions: finite automata (1)   A = (Σ, Q, δ, I, F)
Alphabet Σ,
Finite set of states Q,
Transition function δ : Q × Σ → 2^Q,
I ⊆ Q set of initial states, F ⊆ Q set of final states.
A recognizes L(A) = { w ∈ Σ* : δ(I, w) ∩ F ≠ ∅ }
(Hopcroft and Ullman, 1979; Perrin, 1990)
Theorem 1 (Kleene, 1965). A set is regular (or rational) iff it can be recognized by a finite automaton.
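As a concrete illustration of the acceptance condition δ(I, w) ∩ F ≠ ∅, here is a small sketch of mine (not part of the slides; the dict-based automaton encoding is a hypothetical choice):

```python
def accepts(delta, initial, final, word):
    """Return True iff the finite automaton accepts `word`.

    delta: dict mapping (state, symbol) -> set of successor states
    initial, final: the sets I and F of initial and final states
    Implements the condition delta(I, w) & F != {} directly.
    """
    states = set(initial)
    for symbol in word:
        # Image of the current state set under one input symbol.
        states = set().union(*(delta.get((q, symbol), set()) for q in states))
        if not states:
            return False
    return bool(states & set(final))

# A linear automaton recognizing the single string "aba".
delta = {(0, 'a'): {1}, (1, 'b'): {2}, (2, 'a'): {3}}
```

Since delta maps into sets of states, the same function handles nondeterministic automata unchanged.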
Definitions: finite automata (2)
Figure 1: L(A) = ab*a. (automaton diagrams omitted)
Definitions: weighted automata (1)
A = (Σ, Q, δ, λ, σ, ρ, I, F)
(Σ, Q, δ, I, F) is an automaton,
Initial output function λ,
Output function σ : Q × Σ × Q → K,
Final output function ρ,
Function f : Σ* → (K, +, ·) associated with A:
∀u ∈ Dom(f), f(u) = Σ_{(i, q) ∈ I × (δ(i, u) ∩ F)} λ(i) · σ(i, u, q) · ρ(q).
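A minimal sketch (mine, not from the tutorial) of evaluating f(u) by forward accumulation; `plus`, `times`, and `zero` stand for the semiring operations, and the arc encoding is a hypothetical choice:

```python
def evaluate(arcs, init, final, word, plus, times, zero):
    """Sum over accepting paths of lambda(i) (x) arc weights (x) rho(q).

    arcs: dict (state, symbol) -> list of (next_state, weight)
    init: dict state -> initial weight lambda(i)
    final: dict state -> final weight rho(q)
    """
    current = dict(init)  # state -> accumulated path weight
    for symbol in word:
        nxt = {}
        for q, w in current.items():
            for q2, w2 in arcs.get((q, symbol), []):
                nxt[q2] = plus(nxt.get(q2, zero), times(w, w2))
        current = nxt
    total = zero
    for q, w in current.items():
        if q in final:
            total = plus(total, times(w, final[q]))
    return total
```

In the tropical semiring (min, +) this computes the weight of the best accepting path; with (+, ·) it sums over all accepting paths.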
Definitions: weighted automata (2)
Figure 2: Index of t = aba. (weighted automaton diagram omitted)
Definitions: rational power series
Power series: functions mapping Σ* to a semiring (K, +, ·)
Notation: S = Σ_{w ∈ Σ*} (S, w) w,   (S, w): coefficients
Support: supp(S) = { w ∈ Σ* : (S, w) ≠ 0 }
Sum: (S + T, w) = (S, w) + (T, w)
Star: S* = Σ_{n ≥ 0} S^n
Product: (S · T, w) = Σ_{uv = w} (S, u) · (T, v)
Rational power series: closure under the rational operations of polynomials (polynomial power series) (Salomaa and Soittola, 1978; Berstel and Reutenauer, 1988)
Theorem 2 (Schützenberger, 1961). A power series is rational iff it can be represented by a weighted finite automaton.
Definitions: transducers (1)   T = (Σ, Δ, Q, δ, σ, I, F)
Finite alphabets Σ and Δ,
Finite set of states Q,
Transition function δ : Q × Σ → 2^Q,
Output function σ : Q × Σ × Q → Δ*,
I ⊆ Q set of initial states,
F ⊆ Q set of final states.
T defines a relation: R(T) = { (u, v) ∈ Σ* × Δ* : v ∈ ∪_{q ∈ δ(I, u) ∩ F} σ(I, u, q) }
Definitions: transducers (2)
Figure 3: Fibonacci normalizer (abb -> baa). (transducer diagram omitted)
Definitions: weighted transducers
Figure 4: Example, aaba -> (bbcb, (0 0 1 0) (0 1 1 0)). (weighted transducer diagram omitted)
(min, +): aaba -> min{1, 2} = 1
(+, ·): aaba -> 0 + 0 = 0
Composition: Motivation (1)
Construction of complex sets or functions from more elementary ones
Modular (modules, distinct linguistic descriptions)
On-the-fly expansion
Composition: Motivation (2)
source program -> lexical analyzer -> syntax analyzer -> semantic analyzer -> intermediate code generator -> code optimizer -> code generator -> target program
Figure 5: Phases of a compiler (Aho et al., 1986).
Composition: Motivation (3)
Components: spellchecker, inflected forms, index, source text, set of positions.
Figure 6: Complex indexation. (diagram omitted)
Composition: Example (1)
First transducer: 0 -a:a-> 1 -b:ε-> 2 -c:ε-> 3 -d:d-> 4
Second transducer: 0 -a:d-> 1 -ε:e-> 2 -d:a-> 3
Composition: (0,0) -a:d-> (1,1) -b:e-> (2,2) -c:ε-> (3,2) -d:a-> (4,3)
Figure 7: Composition of transducers.
Composition: Example (2)
First transducer: 0 -a:a/3-> 1 -b:ε/1-> 2 -c:ε/4-> 3 -d:d/2-> 4
Second transducer: 0 -a:d/5-> 1 -ε:e/7-> 2 -d:a/6-> 3
Composition: (0,0) -a:d/15-> (1,1) -b:e/7-> (2,2) -c:ε/4-> (3,2) -d:a/12-> (4,3)
Figure 8: Composition of weighted transducers (+, ·).
Composition: Algorithm (1)
Construction of pairs of states
Match: q1 -a:b/w1-> q1' and q2 -b:c/w2-> q2'
Result: (q1, q2) -a:c/(w1 · w2)-> (q1', q2')
Elimination of ε-path redundancy: filter
Complexity: quadratic
On-the-fly implementation
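The state-pairing step can be sketched as follows for ε-free transducers (my sketch, not the slides' implementation; weights are combined with + as in the tropical semiring, the arc encoding is a hypothetical choice, and the general case additionally needs the ε-filter of Figure 11):

```python
def compose_arcs(t1, t2):
    """Pair arcs q1 -a:b/w1-> q1' and q2 -b:c/w2-> q2' into
    (q1, q2) -a:c/(w1 + w2)-> (q1', q2').

    t1, t2: lists of arcs (src, inlabel, outlabel, weight, dst).
    """
    arcs = []
    for (p1, a, b, w1, q1) in t1:
        for (p2, b2, c, w2, q2) in t2:
            if b == b2:  # output of t1 must match input of t2
                arcs.append(((p1, p2), a, c, w1 + w2, (q1, q2)))
    return arcs
```

A real implementation builds only the pairs reachable from the initial pair, which is what makes the on-the-fly expansion above possible.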
Composition: Algorithm (2)
Figure 9: Composition of weighted transducers with ε-transitions. (diagrams of A, B and the ε-marked A', B' omitted)
Composition: Algorithm (3)
Figure 10: Redundancy of ε-paths. (diagram omitted)
Composition: Algorithm (4)
Figure 11: Filter for efficient composition. (filter automaton diagram omitted)
Composition: Theory
Transductions (Elgot and Mezei, 1965; Eilenberg, 1974-1976; Berstel, 1979).
Theorem 3. Let τ1 and τ2 be two (weighted) automata or transducers; then τ1 ∘ τ2 is a (weighted) automaton or transducer.
Efficient composition of weighted transducers (Mohri, Pereira, and Riley, 1996).
Works with any semiring
Intersection: composition of (weighted) automata.
Intersection: Example
Figure 12: Intersection of automata. (diagrams omitted)
Union: Example
Figure 13: Union of weighted automata (min, +). (diagrams omitted)
Determinization: Motivation (1)
Efficiency of use (time)
Elimination of redundancy
No loss of information (≠ pruning)
Determinization: Motivation (3)
Figure 15: Determinized language model (9 states, 11 transitions, 4 paths). (diagram omitted)
Determinization: Example (1)
Figure 16: Determinization of automata. (diagrams omitted)
Determinization: Example (2)
Figure 17: Determinization of weighted automata (min, +). (diagrams omitted)
Determinization: Example (3)
Figure 18: Determinization of transducers. (diagrams omitted)
Determinization: Example (4)
Figure 19: Determinization of weighted transducers (min, +). (diagrams omitted)
Determinization: Algorithm (1)
Generalization of the classical algorithm for automata (powerset construction)
Subsets made of (state, weight) or (state, string, weight) pairs
Applies to subsequentiable weighted automata and transducers
Time and space complexity: exponential (polynomial w.r.t. the size of the result)
On-the-fly implementation
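For unweighted automata the classical case is the powerset construction; a small sketch of mine (the dict-based encoding is an assumption, not the slides' code):

```python
def determinize(delta, initial, final, alphabet):
    """Subset construction for an epsilon-free NFA.

    delta: dict (state, symbol) -> set of successor states
    Returns (transitions, start_subset, final_subsets), with each
    deterministic state represented as a frozenset of NFA states.
    """
    start = frozenset(initial)
    transitions, stack, seen = {}, [start], {start}
    while stack:
        subset = stack.pop()
        for a in alphabet:
            # Image of the whole subset under symbol a.
            image = frozenset(q2 for q in subset for q2 in delta.get((q, a), ()))
            if not image:
                continue
            transitions[(subset, a)] = image
            if image not in seen:
                seen.add(image)
                stack.append(image)
    finals = {s for s in seen if s & set(final)}
    return transitions, start, finals
```

The weighted and transducer cases replace the subsets by sets of (state, weight) or (state, string, weight) pairs, keeping the residual weight or string alongside each state.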
Determinization: Algorithm (2): Conditions of application
Twin states: q and q' are twin states iff:
If: they can be reached from the initial states by the same input string u,
Then: cycles at q and q' with the same input string v have the same output value.
Theorem 4 (Choffrut, 1978; Mohri, 1996a). Let τ be an unambiguous weighted automaton (transducer, weighted transducer); then τ can be determinized iff it has the twin property.
Theorem 5 (Mohri, 1996a). The twin property can be tested in polynomial time.
Determinization: Theory
Determinization of automata:
General case (Aho, Sethi, and Ullman, 1986)
Specific case: failure functions (Mohri, 1995)
Determinization of transducers, weighted automata, and weighted transducers:
General description, theory and analysis (Mohri, 1996a; Mohri, 1996b)
Conditions of application and test algorithm
Acyclic transducers and weighted transducers admit determinization
Can be used with other semirings (ex: (R, +, ·))
Local determinization: Motivation
Time efficiency
Reduction of redundancy
Control of the resulting size (flexibility)
Equivalent function (or equal set)
No loss of information
Local determinization: Example
Figure 20: Local determinization of weighted transducers (min, +). (diagrams omitted)
Local determinization: Algorithm
Predicate, ex: P(q) ≡ (outdegree(q) > k), k: threshold parameter
Local: Dom(det) = { q : P(q) }
Determinization only for q ∈ Dom(det)
On-the-fly implementation
Complexity: O(|Dom(det)| · max_{q ∈ Q} outdegree(q))
Local determinization: theory
Various choices of predicate (constraint: local)
Definition of parameters
Applies to all automata, weighted automata, transducers, and weighted transducers
Can be used with other semirings (ex: (R, +, ·))
Minimization: Motivation (1)
Space efficiency
Equivalent function (or equal set)
No loss of information (≠ pruning)
Minimization: Motivation (2)
Figure 21: Determinized language model (same automaton as Figure 15; diagram omitted).
Minimization: Motivation (3)
Figure 22: Minimized language model. (diagram omitted)
Minimization: Example (1)
Figure 23: Minimization of automata. (diagrams omitted)
Minimization: Example (2)
Figure 24: Minimization of weighted automata (min, +). (diagrams omitted)
Minimization: Example (3)
Figure 25: Minimization of transducers. (diagrams omitted)
Minimization: Example (4)
Figure 26: Minimization of weighted transducers (min, +). (diagrams omitted)
Minimization: Algorithm (1)
Two steps:
Pushing: extraction of strings or weights towards the initial state
Classical minimization of automata, with each (input, output) pair considered as a single label
Algorithm for the first step:
Transducers: specific algorithm
Weighted automata: shortest-paths algorithms
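For weighted automata, the pushing step can be sketched in the (min, +) semiring using shortest distances to the final states (a sketch under my own arc encoding, not the slides' implementation; Bellman-Ford relaxation stands in for the shortest-paths algorithms mentioned above):

```python
def push_weights(arcs, final, num_states):
    """Push weights towards the initial state in the (min, +) semiring.

    arcs: list of (src, label, weight, dst); final: set of final states.
    d[q] is the shortest distance from q to a final state; each arc
    weight w becomes -d[src] + w + d[dst], which preserves every path
    weight up to the constant d[start] (moved into the initial weight).
    """
    INF = float('inf')
    d = [INF] * num_states
    for q in final:
        d[q] = 0.0
    for _ in range(num_states):  # Bellman-Ford relaxation
        for (src, _label, w, dst) in arcs:
            if d[dst] + w < d[src]:
                d[src] = d[dst] + w
    pushed = [(src, lab, -d[src] + w + d[dst], dst) for (src, lab, w, dst) in arcs]
    return pushed, d
```

After pushing, minimization can treat each (label, weight) pair as a single symbol and run the classical automata algorithm.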
Minimization: Algorithm (2): Complexity
E: set of transitions
S: sum of the lengths of the output strings
P_max: the longest of the longest common prefixes of the output paths leaving each state

Type              | General                          | Acyclic
Automata          | O(|E| log |Q|)                   | O(|Q| + |E|)
Weighted automata | O(|E| log |Q|)                   | O(|Q| + |E|)
Transducers       | O(|Q| + |E| (log |Q| + |P_max|)) | O(S + |E| + |Q| + |E| (|Q| - |F|) |P_max|)
Minimization: Theory
Minimization of automata (Aho, Hopcroft, and Ullman, 1974; Revuz, 1991)
Minimization of transducers (Mohri, 1994)
Minimization of weighted automata (Mohri, 1996a):
Minimal number of transitions
Test of equivalence
Standardization of power series (Schützenberger, 1961):
Works only with fields
Creates too many transitions
Conclusion (1)
Theory
Rational power series
Weighted automata and transducers
Algorithms
General (various semirings)
Efficiency (used in practice, large sizes)
Conclusion (2)
Applications
Text processing (spelling checkers, pattern-matching, indexation, OCR)
Language processing (morphology, phonology, syntax, language modeling)
Speech processing (speech recognition, text-to-speech synthesis)
Computational biology (matching with errors)
Many other applications
PART II: Speech Recognition
Michael Riley, AT&T Laboratories
August 3rd, 1996
Overview
The speech recognition problem
Acoustic, lexical and grammatical models
Finite-state automata in speech recognition
Search in finite-state automata
Speech Recognition
Given an utterance, find its most likely written transcription.
Fundamental ideas:
Utterances are built from sequences of units
Acoustic correlates of a unit are affected by surrounding units
Units combine into units at a higher level: phones -> syllables -> words
Relationships between levels can be modeled by weighted graphs; we use weighted finite-state transducers
Recognition: find the best path in a suitable product graph
Levels of Speech Representation
Maximum A Posteriori Decoding
Overall analysis [4, 57]:
Acoustic observations: parameter vectors derived by local spectral analysis of the speech waveform at regular (e.g., 10 msec) intervals
Observation sequence o
Transcriptions w
Probability P(o | w) of observing o when w is uttered
Maximum a posteriori decoding:
ŵ = argmax_w P(w | o) = argmax_w P(o | w) P(w) / P(o)
  = argmax_w P(o | w) [generative model] · P(w) [language model]
Generative Models of Speech
Typical decomposition of P(o | w) into conditionally-independent mappings between levels:
Acoustic model P(o | p): phone sequences -> observation sequences. Detailed model:
P(o | d): distributions -> observation vectors (symbolic -> quantitative)
P(d | m): context-dependent phone models -> distribution sequences
P(m | p): phone sequences -> model sequences
Pronunciation model P(p | w): word sequences -> phone sequences
Language model P(w): word sequences
Recognition Cascades: General Form
Multistage cascade: o = s_k -> stage k -> ... -> s_1 -> stage 1 -> w = s_0
Find s_0 maximizing
P(s_0, s_k) = P(s_k | s_0) P(s_0) = P(s_0) Σ_{s_1, ..., s_{k-1}} Π_{1 ≤ j ≤ k} P(s_j | s_{j-1})
Viterbi approximation:
Cost(s_0, s_k) = Cost(s_k | s_0) + Cost(s_0)
Cost(s_k | s_0) ≈ min_{s_1, ..., s_{k-1}} Σ_{1 ≤ j ≤ k} Cost(s_j | s_{j-1})
where Cost(...) = -log P(...).
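The Viterbi approximation replaces the sum over intermediate sequences by the single best path; as a toy illustration (my sketch, not the tutorial's decoder), the minimum of Σ -log P over paths can be computed by simple relaxation on a small graph of hypotheses:

```python
import math

def viterbi_cost(probs, start, goal):
    """probs: dict (u, v) -> P(v | u) for the allowed transitions.
    Returns min over paths from start to goal of sum of -log P."""
    nodes = {u for u, _ in probs} | {v for _, v in probs}
    costs = {start: 0.0}
    for _ in range(len(nodes)):  # enough relaxation rounds for acyclic chains
        for (u, v), p in probs.items():
            if u in costs:
                c = costs[u] - math.log(p)
                if c < costs.get(v, float('inf')):
                    costs[v] = c
    return costs.get(goal, float('inf'))
```

In the (min, +) view this is exactly a shortest-path computation over the cascade, which is why weighted finite-state machinery applies directly.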
Speech Recognition Problems
Modeling: how to describe accurately the relations between levels => modeling errors
Search: how to find the best interpretation of the observations according to the given models => search errors
Acoustic Modeling Feature Selection I
Short-time spectral analysis:
log | ∫ g(τ) x(t + τ) e^{-i 2π f τ} dτ |
Short-time (25 msec Hamming window) spectrum of /ae/, Hz vs. dB (plot omitted)
Scale selection: cepstral smoothing
Parameter sampling (13 parameters)
Acoustic Modeling Feature Selection II [40, 38]
Refinements:
Time derivatives: 1st and 2nd order
Non-Fourier analysis (e.g., Mel scale)
Speaker/channel adaptation: mean cepstral subtraction, vocal tract normalization, linear transformations
Result: 39-dimensional feature vector (13 cepstra, 13 delta cepstra, 13 delta-delta cepstra) every 10 milliseconds
Acoustic Modeling Stochastic Distributions [4, 61, 39, 5]
Vector quantization: find codebook of prototypes
Full covariance multivariate Gaussians:
P[y] = (2π)^{-N/2} |S|^{-1/2} exp(-(1/2) (y - μ)^T S^{-1} (y - μ))
Diagonal covariance Gaussian mixtures
Semi-continuous, tied mixtures
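For the diagonal-covariance case just mentioned, |S| reduces to the product of the per-dimension variances; a log-domain sketch of mine (the parameter names are hypothetical):

```python
import math

def log_gaussian_diag(y, mean, var):
    """Log-density of a diagonal-covariance multivariate Gaussian:
    -1/2 * (N log(2 pi) + sum(log var_i) + sum((y_i - mu_i)^2 / var_i))."""
    n = len(y)
    logdet = sum(math.log(v) for v in var)
    quad = sum((yi - mi) ** 2 / vi for yi, mi, vi in zip(y, mean, var))
    return -0.5 * (n * math.log(2 * math.pi) + logdet + quad)
```

Working in the log domain avoids underflow when many frame likelihoods are multiplied, and mixture components can then be combined with log-sum-exp.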
Acoustic Modeling Units and Training [61, 36]
Units
Phonetic (sub-word) units, e.g., cat -> /k ae t/
Context-dependent units, e.g., ae_{k,t}
Multiple distributions (states) per phone: left, middle, right
Training
Given a segmentation, training is straightforward
Obtain segmentation by transcription
Iterate until convergence
Generating Lexicons Two Steps
Orthography -> Phonemes: had -> /hh ae d/, your -> /y uw r/
complex, context-independent mapping
usually small number of alternatives
determined by spelling constraints; lexical facts
large online dictionaries available
Phonemes -> Phones: /hh ae d y uw r/ -> [hh ae dcl jh axr] (60% prob), /hh ae d y uw r/ -> [hh ae dcl d y axr] (40% prob)
complex, context-dependent mapping
many possible alternatives
determined by phonological and phonetic constraints
1. Decision Tree Splitting Rules
Which split to take at a node?
Candidate splits considered:
Binary cuts: for continuous -∞ < x < ∞, consider splits of the form x ≤ k vs. x > k, ∀k.
Binary partitions: for categorical x ∈ {1, 2, ..., n} = X, consider splits of the form x ∈ A vs. x ∈ X - A, ∀A ⊂ X.
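The candidate binary partitions of a categorical feature can be enumerated directly; each unordered split A vs. X - A is generated once (a sketch of mine, not the tutorial's tree code):

```python
from itertools import combinations

def binary_partitions(values):
    """All splits A vs. X - A of a categorical domain X,
    each unordered partition emitted exactly once."""
    X = sorted(values)
    out = []
    for r in range(1, len(X)):
        for A in combinations(X, r):
            comp = tuple(v for v in X if v not in A)
            if A < comp:  # keep one orientation of each partition
                out.append((set(A), set(comp)))
    return out
```

A domain of size n yields 2^(n-1) - 1 candidate partitions, which is why exact splitting of categorical features with many values is expensive.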
2. Decision Tree Stopping Rules
When to declare a node terminal? Strategy (cost-complexity pruning):
1. Grow over-large tree.
2. Form sequence of subtrees T_0, ..., T_n, ranging from the full tree to just the root node.
3. Estimate honest error rate for each subtree.
4. Choose tree size with minimum honest error rate.
To form the sequence of subtrees, vary α from 0 (for the full tree) to ∞ (for just the root node) in:
min_T R(T) + α |T|.
To estimate the honest error rate, test on data different from the training data, e.g., grow the tree on 9/10 of the available data and test on 1/10 of the data, repeating 10 times and averaging (cross-validation).
End of Declarative Sentence Prediction: Pruning Sequence
Figure: error rate vs. number of terminal nodes (0 to 100); + = raw, o = cross-validated. (plot omitted)
3. Decision Tree Node Assignment
Which class/value to assign to a terminal node?
Plurality vote: choose the most frequent class at that node for classification; choose the mean value for regression.
End-of-Declarative-Sentence Prediction: Features [65]
Prob[word with . occurs at end of sentence]
Prob[word after . occurs at beginning of sentence]
Length of word with .
Length of word after .
Case of word with .: Upper, Lower, Cap, Numbers
Case of word after .: Upper, Lower, Cap, Numbers
Punctuation after . (if any)
Abbreviation class of word with .: e.g., month name, unit-of-measure, title, address name, etc.
End of Declarative Sentence?
Figure: decision tree for end-of-declarative-sentence prediction, with splits on bprob, eprob, case of the following word, and abbreviation type. (tree diagram omitted)
Phoneme-to-Phone Alignment
PHONEME  PHONE    WORD
p        p        purpose
er       er
p        pcl -
ax       ix
s        s
ae       ax       and
n        n
d        -
r        r        respect
ih       ix
s        s
p        pcl p
eh       eh
k        kcl
t        t
Phoneme-to-Phone Realization: Features [66, 10, 62]
Phonemic Context:
Phoneme to predict
Three phonemes to left
Three phonemes to right
Stress (0, 1, 2)
Lexical Position:
Phoneme count from start of word
Phoneme count from end of word
Phoneme-to-Phone Realization: Prediction Example
Tree splits for /t/ in your pretty red:
PHONE   COUNT    SPLIT
ix      182499
n       87283    cm0: vstp,ustp,vfri,ufri,vaff,uaff,nas
kcl+k   38942    cm0: vstp,ustp,vaff,uaff
tcl+t   21852    cp0: alv,pal
tcl+t   11928    cm0: ustp
tcl+t   5918     vm1: mono,rvow,wdi,ydi
dx      3639     cm-1: ustp,rho,n/a
dx      2454     rstr: n/a,no
Phoneme-to-Phone Realization: Network Example
Phonetic network for Don had your pretty...:
PHONEME  PHONE1       PHONE2      CONTEXT
d        0.91 d
aa       0.92 aa
n        0.98 n
hh       0.74 hh      0.15 hv
ae       0.73 ae      0.19 eh
d        0.51 dcl jh  0.37 dcl d
y        0.90 y                   (if d -> dcl d)
y        0.84 -       0.16 y      (if d -> dcl jh)
uw       0.48 axr     0.29 er
r        0.99 -
p        0.99 pcl p
r        0.99 r
ih       0.86 ih
t        0.73 dx      0.11 tcl t
iy       0.90 iy
Acoustic Model Context Selection [92, 39]
Statistical regression trees used to predict contexts based on distribution variance
One tree per context-independent phone and state (left, middle, right)
The trees were grown until the data criterion of 500 frames per distribution was met
Trees pruned using cost-complexity pruning and cross-validation to select the best contexts
About 44,000 context-dependent phone models
About 16,000 distributions
N-Grams: Basics
Chain Rule and Joint/Conditional Probabilities:
P[x_1 x_2 ... x_N] = P[x_N | x_1 ... x_{N-1}] P[x_{N-1} | x_1 ... x_{N-2}] ... P[x_2 | x_1] P[x_1]
where, e.g.,
P[x_N | x_1 ... x_{N-1}] = P[x_1 ... x_N] / P[x_1 ... x_{N-1}]
(First-order) Markov assumption:
P[x_k | x_1 ... x_{k-1}] = P[x_k | x_{k-1}] = P[x_{k-1} x_k] / P[x_{k-1}]
nth-order Markov assumption:
P[x_k | x_1 ... x_{k-1}] = P[x_k | x_{k-n} ... x_{k-1}] = P[x_{k-n} ... x_k] / P[x_{k-n} ... x_{k-1}]
N-Grams: Maximum Likelihood Estimation
Let N be the total number of n-grams observed in a corpus and c(x_1 … x_n) be the number of times the n-gram x_1 … x_n occurred. Then

P[x_1 … x_n] = c(x_1 … x_n) / N

is the maximum likelihood estimate of that n-gram probability.

For conditional probabilities,

P[x_n | x_1 … x_{n-1}] = c(x_1 … x_n) / c(x_1 … x_{n-1})

is the maximum likelihood estimate. With this method, an n-gram that does not occur in the corpus is assigned zero probability.
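The maximum likelihood estimates above can be sketched in a few lines of Python (the toy corpus and function name are ours, not from the tutorial):

```python
from collections import Counter

def mle_ngram_probs(tokens, n):
    """MLE conditional n-gram probabilities:
    P[x_n | x_1 ... x_{n-1}] = c(x_1 ... x_n) / c(x_1 ... x_{n-1})."""
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    hists = Counter(tuple(tokens[i:i + n - 1]) for i in range(len(tokens) - n + 2))
    return {g: c / hists[g[:-1]] for g, c in ngrams.items()}

corpus = "the cat sat on the mat the cat ran".split()
probs = mle_ngram_probs(corpus, 2)
print(probs[("the", "cat")])  # c(the cat) / c(the) = 2/3
```

Any bigram absent from the corpus, e.g. ("cat", "on"), gets no entry at all, i.e. zero probability — the weakness the next estimator addresses.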
N-Grams: Good-Turing-Katz Estimation [29, 16]
Let n_r be the number of n-grams that occurred r times. Then

P[x_1 … x_n] = c*(x_1 … x_n) / N

is the Good-Turing estimate of that n-gram probability, where

c*(x) = (c(x) + 1) n_{c(x)+1} / n_{c(x)}

For conditional probabilities,

P[x_n | x_1 … x_{n-1}] = c*(x_1 … x_n) / c(x_1 … x_{n-1}),   c(x_1 … x_n) > 0

is Katz's extension of the Good-Turing estimate.

With this method, an n-gram that does not occur in the corpus is assigned the backoff probability

P[x_n | x_1 … x_{n-1}] = α P[x_n | x_2 … x_{n-1}],

where α is a normalizing constant.
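The adjusted-count formula can be illustrated with a short Python sketch (the count table and function name are ours):

```python
from collections import Counter

def good_turing_counts(counts):
    """Good-Turing adjusted counts c*(x) = (c(x) + 1) * n_{c(x)+1} / n_{c(x)},
    where n_r is the number of n-gram types observed exactly r times."""
    n_r = Counter(counts.values())
    adjusted = {}
    for x, c in counts.items():
        if n_r.get(c + 1, 0) > 0:
            adjusted[x] = (c + 1) * n_r[c + 1] / n_r[c]
        else:
            adjusted[x] = float(c)  # no types seen c+1 times: keep the raw count
    return adjusted

counts = {"a b": 1, "b c": 1, "c d": 1, "d e": 2, "e f": 3}
adj = good_turing_counts(counts)
print(adj["a b"])  # n_1 = 3, n_2 = 1, so c* = (1+1) * 1/3 = 2/3
```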
Finite-State Modeling [57]
Our view of recognition cascades: represent mappings between levels, observation sequences and language uniformly with weighted finite-state machines:

Probabilistic mapping P(x | y): weighted finite-state transducer. Example word pronunciation transducer:

[Transducer diagram for "data": d:ε/1, then ey:ε/.4 or ae:ε/.6, then dx:ε/.8 or t:ε/.2, then ax:"data"/1]

Language model P(w): weighted finite-state acceptor
Example of Recognition Cascade
[Cascade diagram: observations O → acoustic/phone transducer A → (phones) → dictionary D → (words) → language model M]
Recognition from observations o by composition:
Observations: O(s, s) = 1 if s = o, 0 otherwise

Acoustic-phone transducer: A(a, p) = P(a | p)

Pronunciation dictionary: D(p, w) = P(p | w)

Language model: M(w, w) = P(w)

Recognition: ŵ = argmax_w (O ∘ A ∘ D ∘ M)(o, w)
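As a toy illustration of the argmax over the composed cascade, the paths can be enumerated explicitly; the probability tables below are invented for the example, not taken from the tutorial:

```python
# Hypothetical conditional probabilities for one word and two pronunciations.
D = {"data": {("d", "ey", "dx", "ax"): 0.32,   # P(phones | word)
              ("d", "ae", "t", "ax"): 0.12}}
M = {"data": 1.0}                              # P(word)
A = {("d", "ey", "dx", "ax"): 0.001,           # P(obs | phones), obs fixed
     ("d", "ae", "t", "ax"): 0.004}

def recognize():
    """argmax over paths of the composed cascade, in the spirit of O . A . D . M."""
    best, best_p = None, 0.0
    for w, prons in D.items():
        for phones, p_pron in prons.items():
            p = A[phones] * p_pron * M[w]      # one path's score
            if p > best_p:
                best, best_p = w, p
    return best, best_p

print(recognize())
```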
Speech Models as Weighted Automata
Quantized observations:

[Timeline diagram: observations o1, o2, …, on at times t0, t1, t2, …, tn]

Phone model A_φ: observations → phones

[Three-state HMM diagram: states s0, s1, s2 with self-loops o_i:ε/p00(i), o_i:ε/p11(i), o_i:ε/p22(i), transitions o_i:ε/p01(i) and o_i:ε/p12(i), and exit arc ε:φ/p2f]

Acoustic transducer: A = Σ_φ A_φ

Word pronunciations D_w, e.g. D_data: phones → words

[Transducer diagram for "data": d:ε/1, then ey:ε/.4 or ae:ε/.6, then dx:ε/.8 or t:ε/.2, then ax:"data"/1]

Dictionary: D = Σ_w D_w
Sample Pronunciation Dictionary D
Dictionary with hostile, battle and bottle as a weighted transducer:
[Weighted transducer diagram, states 0–18; sample arcs: b:bottle/0.000, b:battle/0.000, hh:hostile/0.134, hv:hostile/2.635, ε:hostile/2.943, aa:ε/0.055, ae:ε/0.057, s:ε/0.035, t:ε/0.067 or 2.113, dx:ε/0.240, l:ε/0.112, ax:ε/2.607, ay:ε/1.616, el:ε/0.164 or 0.431, ε:ε/0.014 and 2.466]
Sample Language Model M
Simplified language model as a weighted acceptor:
[Weighted acceptor diagram, states 0–5; sample arcs: battle/6.603, battle/9.268, battle/10.896, hostile/9.394, hostile/11.119, bottle/11.510, bottle/13.970, plus ε arcs with weights 1.102–3.961]
Recognition by Composition
From phones to words: compose dictionary with phone lattice to yield word lattice with combined acoustic and pronunciation costs:
[Word lattice: 0 --hostile/-32.900--> 1 --battle/-26.825--> 2]
Applying language model: compose word lattice with language model to obtain word lattice with combined acoustic, pronunciation and language model costs:
[Word lattice: 0 --hostile/-21.781--> 2, 0 --hostile/-19.407--> 1, with arcs battle/-17.916 and battle/-15.250 into final state 3]
Context-Dependency Examples
Context-dependent phone models: maps from CI units to CD units. Example: ae / b _ d → ae_{b,d}

Context-dependent allophonic rules: maps from baseforms to detailed phones. Example: t / V′ _ V → dx

Difficulty: cross-word contexts. Where several words enter and leave a state in the grammar, substitution does not apply.
Context-Dependency Transducers
Example triphonic context transducer for two symbols x and y:
[Triphone transducer diagram: states x.x, x.y, y.x, y.y; arcs x/x_x:x, x/x_y:x, y/x_x:y, y/x_y:y, x/y_x:x, x/y_y:x, y/y_x:y, y/y_y:y]
On-Demand Composition [69, 53]
Create generalized state machine C for the composition A ∘ B.
C.start := (A.start, B.start)

C.final((s1, s2)) := A.final(s1) ∧ B.final(s2)

C.arcs((s1, s2)) := Merge(A.arcs(s1), B.arcs(s2))
Merged arcs defined as:
(l1, l3, x + y, (ns1, ns2)) ∈ Merge(A.arcs(s1), B.arcs(s2))

iff

(l1, l2, x, ns1) ∈ A.arcs(s1) and (l2, l3, y, ns2) ∈ B.arcs(s2)
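The Merge operation pairs arcs whose output label in A matches the input label in B, adding their costs; a direct transcription into Python (the arc layout is ours):

```python
def merge(arcs_a, arcs_b):
    """Merge step of on-demand composition: (l1, l2, x, ns1) in A.arcs(s1) and
    (l2, l3, y, ns2) in B.arcs(s2) yield (l1, l3, x + y, (ns1, ns2))."""
    out = []
    for l1, l2, x, ns1 in arcs_a:
        for m2, l3, y, ns2 in arcs_b:
            if l2 == m2:  # output label of A matches input label of B
                out.append((l1, l3, x + y, (ns1, ns2)))
    return out

# Arcs as (input label, output label, cost, next state)
arcs_a = [("a", "p", 1.0, 1), ("a", "q", 2.0, 2)]
arcs_b = [("p", "w", 0.5, 7)]
print(merge(arcs_a, arcs_b))  # [('a', 'w', 1.5, (1, 7))]
```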
State Caching
Create generalized state machine B for the input machine A.
B.start := A.start

B.final(state) := A.final(state)

B.arcs(state) := A.arcs(state)
Cache Disciplines:
Expand each state of A exactly once, i.e. always save in cache (memoize).
Cache, but forget old states using a least-recently used criterion.
Use instructions (ref counts) from user (decoder) to save and forget.
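The first cache discipline (memoize) amounts to a wrapper that expands each state of the underlying machine at most once; a minimal sketch with invented class names:

```python
class CachedMachine:
    """Generalized state machine that caches arc expansions (memoization)."""
    def __init__(self, machine):
        self.machine = machine
        self._cache = {}
        self.expansions = 0  # how many states were actually expanded

    def arcs(self, state):
        if state not in self._cache:
            self.expansions += 1
            self._cache[state] = self.machine.arcs(state)
        return self._cache[state]

class Chain:
    """Toy machine: state i has one arc to state i + 1, up to state 3."""
    def arcs(self, state):
        return [("x", "x", 0.0, state + 1)] if state < 3 else []

m = CachedMachine(Chain())
for _ in range(5):
    m.arcs(0)        # repeated queries hit the cache
print(m.expansions)  # 1
```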
On-Demand Composition Results

ATIS task: class-based trigram grammar, full cross-word triphonic context-dependency.
                 states     arcs
context             762    40386
lexicon            3150     4816
grammar           48758   359532
full expansion  1.6×10^6  5.1×10^6
For the same recognition accuracy as with a static, fully expanded network, on-demand composition expands just 1.6% of the total number of arcs.
Determinization in Large Vocabulary Recognition
For large vocabularies, string lexicons are very non-deterministic
Determinizing the lexicon solves this problem, but can introduce non-coaccessible states during its composition with the grammar
Alternate Solutions:
Off-line compose, determinize, and minimize: Lexicon ∘ Grammar

Pre-tabulate non-coaccessible states in the composition of: Det(Lexicon) ∘ Grammar
Search in Recognition Cascades
Reminder: cost = −log probability

Example recognition problem: ŵ = argmax_w (O ∘ A ∘ D ∘ M)(o, w)
Viterbi search: approximate ŵ by the output word sequence for the lowest-cost path from the start state to a final state in O ∘ A ∘ D ∘ M; ignores summing over multiple paths with the same output:
[Diagram: parallel paths through O ∘ A ∘ D ∘ M with outputs w1, …, wi, …, wn; Viterbi keeps only the single lowest-cost path]
Composition preserves acyclicity; O is acyclic ⇒ acyclic search graph
Single-source Shortest Path Algorithms [83]
Meta-algorithm:

  Q ← {s0}; ∀s, Cost(s) ← ∞
  while Q not empty: s ← Dequeue(Q)
    for each s′ ∈ Adj[s] such that Cost(s′) > Cost(s) + cost(s, s′):
      Cost(s′) ← Cost(s) + cost(s, s′)
      Enqueue(Q, s′)
Specific algorithms:

Name          Queue type   Cycles  Neg. weights  Complexity
acyclic       topological  no      yes           O(|V| + |E|)
Dijkstra      best-first   yes     no            O(|E| log |V|)
Bellman-Ford  FIFO         yes     yes           O(|V| |E|)
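The best-first (Dijkstra) instance of the meta-algorithm, for non-negative costs, can be written directly with a priority queue (the graph and names are ours):

```python
import heapq

def dijkstra(arcs, start, final):
    """Single-source shortest path with a best-first queue, non-negative
    weights; arcs: state -> list of (next_state, cost)."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, s = heapq.heappop(heap)
        if d > dist.get(s, float("inf")):
            continue  # stale queue entry
        for ns, c in arcs.get(s, []):
            nd = d + c
            if nd < dist.get(ns, float("inf")):
                dist[ns] = nd
                heapq.heappush(heap, (nd, ns))
    return dist.get(final, float("inf"))

arcs = {0: [(1, 1.0), (2, 4.0)], 1: [(2, 1.0), (3, 5.0)], 2: [(3, 1.0)]}
print(dijkstra(arcs, 0, 3))  # 3.0 via 0 -> 1 -> 2 -> 3
```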
The Search Problem
Obvious first approach: use an appropriate single-source shortest-path algorithm
Problem: impractical to visit all states; can we do better?
Admissible methods: guarantee finding the best path, but reorder the search to avoid exploring provably bad regions

Non-admissible methods: may fail to find the best path, but may need to explore much less of the graph
Current practical approaches:
Heuristic cost functions
Beam search
Multipass search
Rescoring
Heuristic Cost Function: A* Search [4, 56, 17]

States in search ordered by
cost-so-far(s) + lower-bound-to-complete(s)
With a tight bound, states not on good paths are not explored
With a loose lower bound, no better than Dijkstra's algorithm
Where to find a tight bound? Full search of a composition of smaller automata (homomorphic automata with lower-bounding costs?)
Non-admissible A* variants: use an averaged estimate of cost-to-complete, not a lower bound
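An A* ordering differs from Dijkstra only in the queue key, cost-so-far plus the lower bound; a sketch with an invented graph and an admissible bound:

```python
import heapq

def a_star(arcs, start, final, h):
    """A* search: queue ordered by cost-so-far(s) + lower-bound h(s).
    With an admissible (never overestimating) h, the result is exact."""
    heap = [(h(start), 0.0, start)]
    best = {start: 0.0}
    while heap:
        f, g, s = heapq.heappop(heap)
        if s == final:
            return g
        for ns, c in arcs.get(s, []):
            ng = g + c
            if ng < best.get(ns, float("inf")):
                best[ns] = ng
                heapq.heappush(heap, (ng + h(ns), ng, ns))
    return float("inf")

arcs = {0: [(1, 2.0), (2, 1.0)], 1: [(3, 1.0)], 2: [(3, 5.0)]}
h = {0: 3.0, 1: 1.0, 2: 4.0, 3: 0.0}.get  # admissible lower bounds
print(a_star(arcs, 0, 3, h))  # 3.0 via 0 -> 1 -> 3
```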
Beam Search [35]
Only explore states with costs within a beam (threshold) of the cost
of the best comparable state
Non-admissible
Comparable states: states corresponding to (approximately) the same observations
Synchronous (Viterbi) search: explore composition states in chronological observation order
Problem with synchronous beam search: too local; some observation subsequences are unreliable and may locally put the best overall path outside the beam
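Synchronous beam pruning can be sketched as follows, keeping at each observation step only hypotheses within the beam of the best cost (data and layout are ours):

```python
def beam_search(observation_steps, beam):
    """Viterbi-style synchronous beam search over per-step (symbol, cost) choices.
    Non-admissible: a globally best path can fall outside the beam locally."""
    hyps = {(): 0.0}                    # partial output sequence -> cost so far
    for step in observation_steps:
        new = {}
        for out, cost in hyps.items():
            for sym, c in step:
                key = out + (sym,)
                if cost + c < new.get(key, float("inf")):
                    new[key] = cost + c
        best = min(new.values())
        hyps = {o: c for o, c in new.items() if c <= best + beam}  # prune
    return min(hyps.items(), key=lambda kv: kv[1])

steps = [[("a", 1.0), ("b", 3.0)], [("c", 2.0), ("d", 2.5)]]
print(beam_search(steps, beam=1.5))  # (('a', 'c'), 3.0); 'b' pruned at step 1
```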
Beam-Search Tradeoffs [68]
Word lattice: result of composing observation sequence, level transducers and language model.
Beam  Word lattice error rate  Median number of edges
4     7.3%                     86.5
6     5.4%                     244.5
8     4.4%                     827
10    4.1%                     3520
12    4.0%                     13813.5
Multipass Search [52, 3, 68]
Use a succession of binary compositions instead of a single n-way composition; combinable with other methods
Prune: use a two-pass variant of composition to remove states not in any path close enough to the best
Pruned intermediate lattices are smaller, lowering the number of state pairings considered
Approximate: use simpler models (context-independent phone models, low-order language models)
Rescore …
PART III
Finite-State Methods in Language Processing
Richard Sproat
Speech Synthesis Research Department
Bell Laboratories, Lucent Technologies
Overview
Text-analysis for Text-to-Speech (TTS) Synthesis
A rich domain with lots of linguistic problems
Probably the least familiar application of NLP technologies
Syntactic analysis
Some thoughts on text indexation
The Nature of the TTS Problem
This is some text: "It was a dark and stormy night. Four score and seven years ago. Now is the time for all good men. Let them eat cake. Quoth the raven nevermore."

[Pipeline diagram: text → Linguistic Analysis → phonemes, durations and pitch contours → Speech Synthesis → speech waveforms]
From Text to Linguistic Representation
[Example: analysis of the Chinese sentence "The rat is eating the oil" — lao3shu3 chi1 you2 — showing characters (lost in transcription), POS tags N V N, pinyin, and tones L H H L HL]
Russian Percentages: The Problem

How do you say % in Russian?
Adjectival forms when modifying nouns
20% skidka → dvadcati-procentnaja skidka                 (20% discount)
s 20% rastvorom → s dvadcati-procentnym rastvorom        (with 20% solution)

Nominal forms otherwise:

21% → dvadcat' odin procent
23% → dvadcat' tri procenta
20% → dvadcat' procentov
s 20% → s dvadcat'ju procentami                          (with 20%)
Text Analysis Problems
Segment text into words.
Segment text into sentences, checking for and expanding abbreviations:
St. Louis is in Missouri.
Expand numbers
Lexical and morphological analysis
Word pronunciation
Homograph disambiguation
Phrasing
Accentuation
Desiderata for a Model of Text Analysis for TTS
Delay decisions until there is enough information to make them
Possibly weight various alternatives
Weighted Finite-State Transducers offer an attractive computational model
Overall Architectural Matters
Example: word pronunciation in Russian
Text form: kostra <kostra> (bonfire + genitive singular)

Morphological analysis: kost{'E}r{noun}{masc}{inan}+{'a}{sg}{gen}

Pronunciation: /kastr'a/

Minimal Morphologically-Motivated Annotation (MMA): kostr'a
(Sproat, 1996)
Overall Architectural Matters
[Architecture diagram: surface orthographic form (KOSTRA, #KOSTP"A#, pronunciation #kastr"a#) related through composed WFSTs — lexical analysis WFST L, language model M, pronunciation P — to the morphological analysis (#KOST"{E}P{noun}{masc}{inan}+"A{sg}{gen}#), the MMA, and the phonological analysis WFST P ∘ L, with L = D ∘ M ∘ O]
Orthography → Lexical Representation
A Closer Look
Words : Lex. Annot.     Lex. Annot. : Lex. Anal.     Punc. : Interp.
Special Symbols : Expansions     SPACE : Interp.
Numerals : Expansions

SPACE: white space in German, Spanish, Russian, …; ε in Japanese, Chinese, …
Chinese Word Segmentation
[Sample weighted dictionary entries; the Chinese characters were lost in transcription. Each entry maps a hanzi string to a tagged, weighted analysis:]

asp/4.68    le0             perf
vb/8.11     liao3jie3       understand
vb/5.56     da4             big
nc/11.45    da4jie1         avenue
adv/4.58    bu4             not
vb/4.45     zai4            at
vb/11.77    wang4           forget
npot/12.23  wang4+bu4liao3  unable to forget
np/4.88     wo3             I
vb/8.05     fang4           place
vb/10.70    fang4da4        enlarge
nc/11.02    na3li3          where
nc/10.35    jie1            avenue
nc/10.92    jie3fang4       liberation
urnp/42.23  xie4 fang4da4   (name)
Chinese Word Segmentation
Space = ε : #

L = Space (Dictionary (Space ∪ Punc))+

BestPath(input ∘ L) = pro4.88 # vb+npot12.23 # nc10.92 nc11.45 …

"I couldn't forget where Liberation Avenue is."
Numeral Expansion
234 ⇒ (Factorization)  2 × 10² + 3 × 10¹ + 4
    ⇒ (DecadeFlop)     2 × 10² + 4 + 3 × 10¹
    ⇒ (NumberLexicon)  zwei+hundert+vier+und+dreißig
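The factorize/flop/lookup pipeline can be sketched for three-digit numbers; the lexicon below is a hypothetical fragment, not the tutorial's transducer:

```python
UNITS = {1: "ein", 2: "zwei", 3: "drei", 4: "vier", 5: "fünf",
         6: "sechs", 7: "sieben", 8: "acht", 9: "neun"}
TENS = {2: "zwanzig", 3: "dreißig", 4: "vierzig", 5: "fünfzig",
        6: "sechzig", 7: "siebzig", 8: "achtzig", 9: "neunzig"}

def expand(n):
    """Expand 100..999 (decades >= 20 only; a partial sketch): factor into
    hundreds, tens, units, then flop the unit before the decade."""
    h, r = divmod(n, 100)
    t, u = divmod(r, 10)
    parts = []
    if h:
        parts += [UNITS[h], "hundert"]
    if u and t:
        parts += [UNITS[u], "und", TENS[t]]  # DecadeFlop: 4 precedes 30
    elif t:
        parts.append(TENS[t])
    elif u:
        parts.append(UNITS[u])
    return "+".join(parts)

print(expand(234))  # zwei+hundert+vier+und+dreißig
```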
Numeral Expansion
[Transducer diagram for numeral factorization: digit arcs 0:0 … 9:9 between positions, with ε:10¹ and ε:10² insertion arcs marking the decade and hundred places]
German Numeral Lexicon

/{1} : (eins{num}({masc}|{neut}){sg}{##})/
/{2} : (zwei{num}{##})/
/{3} : (drei{num}{##})/
...
/({0}{+++}{1}{10^1}) : (zehn{num}{##})/
/({1}{+++}{1}{10^1}) : (elf{num}{##})/
/({2}{+++}{1}{10^1}) : (zwölf{num}{##})/
/({3}{+++}{1}{10^1}) : (drei{++}zehn{num}{##})/
...
/({2}{10^1}) : (zwan{++}zig{num}{##})/
/({3}{10^1}) : (drei{++}ßig{num}{##})/
...
/({10^2}) : (hundert{num}{##})/
/({10^3}) : (tausend{num}{neut}{##})/
Morphology: Paradigmatic Specifications
Paradigm {A1}
# strong inflection (e.g. after the indefinite article)
Suffix {++}er {sg}{masc}{nom}
Suffix {++}en {sg}{masc}({gen}|{dat}|{acc})
Suffix {++}e  {sg}{femi}({nom}|{acc})
Suffix {++}en {sg}({femi}|{neut})({gen}|{dat})
Suffix {++}es {sg}{neut}({nom}|{acc})
Suffix {++}e  {pl}({nom}|{acc})
Suffix {++}er {pl}{gen}
Suffix {++}en {pl}{dat}
Morphology: Paradigmatic Specifications
/{A1} : (aal{++}glatt{adj})/
/{A1} : (ab{++}änder{++}lich{adj}{umlt})/
/{A1} : (ab{++}artig{adj})/
/{A1} : (ab{++}bau{++}würdig{adj}{umlt})/
...
/{A6} : (dein{adj})/
/{A6} : (euer{adj})/
/{A6} : (ihr{adj})/
/{A6} : (Ihr{adj})/
/{A6} : (mein{adj})/
/{A6} : (sein{adj})/
/{A6} : (unser{adj})/
Morphology: Paradigmatic Specifications
Project(({A6} ∘ Endings) ∘ (({A6} : Stems) ∘ Id(Σ*))) ⇒
[Acceptor diagram: paths spelling m-e-i-n {adj}{++} followed by ending arcs (e, en, em, er, es, n, r, s) and feature arcs for sg/pl, masc/femi/neut, nom/gen/dat/acc combinations]
Morphology: Finite-State Grammar
FUGE   SECOND {++} <1.5>
FUGE   SECOND {++}s{++} …
SECOND PREFIX {Eps} <1.0>
SECOND STEM {Eps} <2.0>
SECOND WORD {Eps} <2.0>
…
WORD   …
Morphology: Finite-State Grammar
Unanständigkeitsunterstellung ("allegation of indecency")

⇒ "un{++}"an{++}stand{++}ig{++}keit{++}s{++}unter{++}stell{++}ung
Rewrite Rule Compilation
Context-dependent rewrite rules
General form: φ → ψ / λ _ ρ

φ, ψ, λ, ρ regular expressions.

Constraint: φ cannot be rewritten but can be used as a context.

Example: a → b / c _ b
(Johnson, 1972; Kaplan & Kay, 1994; Karttunen, 1995; Mohri & Sproat, 1996)
Example
a → b / c _ b

w = cab
Example

After replace:

[Automaton diagram; only fragments (states 0, 1, 2 and an arc labeled c) survived transcription]
Based on the use of marking transducers
Brackets inserted only where needed
Efficiency
3 determinizations + additional linear time work
Smaller number of compositions
Rule Compilation Method
r ∘ f ∘ replace ∘ l1 ∘ l2
r : Σ* ρ → Σ* > ρ

f : (Σ ∪ {>})* φ > → (Σ ∪ {>})* {<1, <2} φ >

replace : <1 φ > → <1 ψ

l1 : <1 → ε

l2 : <2 → ε
Marking Transducers
Proposition: Let A be a deterministic automaton representing Σ*β; then the transducer derived from A post-marks occurrences of β by #.
[Diagram: final state q with entering and leaving transitions of Id(·), i.e. loops a:a, b:b, c:c, d:d]

[Diagram: states and transitions after modification — q emits # (arc q:#), giving the marking transducer]
The Transducers as Expressions using Marker
r = [reverse(Marker(reverse(Σ* ρ), 1, {>}, ∅))]

f = [reverse(Marker((Σ ∪ {>})* reverse(φ >), 1, {<1, <2}, ∅))]

l1 = [Marker(Σ* λ, 2, ∅, {<1})] with <2 : <2

l2 = [Marker(Σ* λ, 3, ∅, {<2})]
Example: r for rule a → b / c _ b
reverse(ρ) = [automaton: state 0 (loops a:a, c:c) with arc b:b to state 1]

Marker(reverse(ρ), 1, {>}, ∅) = [state 0 (loops a:a, c:c) with arc b:b to state 1, which has arc ε:>]

reverse(Marker(reverse(ρ), 1, {>}, ∅)) = [state 0 (loops a:a, c:c) with arc ε:> to state 1, which has arc b:b]
The Replace Transducer
[Replace transducer diagram: state 0 loops on σ:σ, <2:<2 and >:ε; arc <1:<1 leads to state 1, which maps φ to ψ and returns to state 0 on >:ε]
Extension to Weighted Rules
Weighted context-dependent rules:
φ → ψ / λ _ ρ

φ, λ, ρ regular expressions,
ψ a formal power series on the tropical semiring

Example: c → (.9 c) + (.1 t) / a _ t
Rational power series

Functions S : Σ* → R+ ∪ {∞}, rational power series
Tropical semiring: (R+ ∪ {∞}, min, +)

Notation: S = Σ_{w ∈ Σ*} (S, w) w

Example: S = (2a)(3b)(4b)(5b) + (5a)(3b)*

(S, abbb) = min{2 + 3 + 4 + 5, 5 + 3 + 3 + 3} = min{14, 14} = 14
Theorem 6 (Schützenberger, 1961): S is rational iff it is recognizable (representable by a weighted transducer).
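The coefficient (S, w) in the tropical semiring is a minimum over path sums, trivially checkable in Python (path weights transcribed from the example above):

```python
def tropical_coeff(path_weights):
    """(S, w) in the tropical semiring (R+ ∪ {∞}, min, +): the minimum over
    all accepting paths for w of the sum of arc weights along each path."""
    return min(sum(ws) for ws in path_weights)

# The two ways the example series generates abbb, as per-arc weight lists
paths_for_abbb = [[2, 3, 4, 5], [5, 3, 3, 3]]
print(tropical_coeff(paths_for_abbb))  # min(14, 14) = 14
```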
Compilation of weighted rules
Extension of the composition algorithm to the weighted case
Efficient filter for ε-transitions
Addition of weights of matching labels
Same compilation algorithm
Single-source shortest-paths algorithms to find the best path
Rewrite Rules: An Example
s → z / _ ($ | #) VStop
[Compiled voicing transducer diagram, five states; loops V:V, $:$, #:#, z:z, VStop:VStop, with s:s and s:z arcs selected by the following context]
/mis$mo$/ ∘ Voicing = /miz$mo$/
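As a rough analogy (not the compiled transducer itself), the voicing rule can be imitated with a regex lookahead; the VSTOP class here is a hypothetical stand-in:

```python
import re

VSTOP = "bdgmn"  # hypothetical class of voiced segments triggering the rule

def voice_s(word):
    """Apply s -> z / _ ($ | #) VStop via a lookahead, leaving contexts intact."""
    return re.sub(rf"s(?=[$#][{VSTOP}])", "z", word)

print(voice_s("mis$mo$"))  # miz$mo$
```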
Russian Percentage Expansion: An example
s 5% skidkoi
Lexical Analysis FST ⇒

s{prep} pjat{num}+nom -procentn{adj}+aja{fem+sg+nom} skidk{fem}+oj{sg+instr}
s{prep} pjat{num}+i{gen} -procentn{adj}+oj{fem+sg+instr} skidk{fem}+oj{sg+instr} <2.0>
s{prep} pjat{num}+ju{instr} -procent{noun}+ami{pl+instr} skidk{fem}+oj{sg+instr} <4.0>
...
Percentage Expansion: Continued
s 5% skidkoi

⇒
s pjati{gen}-procentn{adj}+oj{sg+instr} skidkoj

∘ L ∘ P ⇒

s # PiT"!p r@c"Entn&y # sK"!tk&y
Phrasing Prediction
Problem: predict intonational phrase boundaries in long unpunctuated utterances:
For his part, Clinton told reporters in Little Rock, Ark., on Wednesday ‖ that the pact can be a good thing for America ‖ if we change our economic policy ‖ to rebuild American industry here at home ‖ and if we get the kind of guarantees we need on environmental and labor standards in Mexico ‖ and a real plan ‖ to help the people who will be dislocated by it.
The Bell Labs synthesizer uses a CART-based predictor trained on labeled corpora (Wang & Hirschberg 1992).
Phrasing Prediction: Sample Tree
[Decision-tree diagram for phrasing prediction; internal nodes test features such as punc, j3f, j3n, j1v, j2n, j4n, j3w, j1f, j4f, raj4, nploc, syls, ssylsp and npdist, with yes/no boundary decisions and counts at the leaves]
Phrasing Prediction: Results
Results for multi-speaker read speech:
major boundaries only: 91.2%
collapsed major/minor phrases: 88.4%
3-way distinction between major, minor and null boundary: 81.9%
Results for spontaneous speech:
major boundaries only: 88.2%
collapsed major/minor phrases: 84.4%
3-way distinction between major, minor and null boundary: 78.9%
Results for 85K words of hand-annotated text, cross-validated on training data: 95.4%.
Tree-Based Modeling: Prosodic Phrase Prediction
[Decision-tree diagram for prosodic phrase prediction; nodes test features dpunc, lpos and rpos, with yes/no boundary decisions and counts at the leaves]
M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 147
The Tree Compilation Algorithm
(Sproat & Riley, 1996)
Each leaf node corresponds to a single rule defining a constrained weighted mapping for the input symbol associated with the tree

Decisions at each node are stateable as regular expressions restricting the left or right context of the rule(s) dominated by the branch

The full left/right context of the rule at a leaf node is derived by intersecting the expressions traversed between the root and the leaf node
The transducer for the entire tree represents the conjunction of all the constraints expressed at the leaf nodes; it is derived by intersecting together the set of WFSTs corresponding to each of the leaves

Note that intersection is defined for transducers that express same-length relations

The alphabet is defined to be an alphabet of all correspondence pairs that were determined empirically to be possible
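A minimal sketch of this compilation idea, with a hypothetical tree encoding (not the authors' data structures): each internal node tests a regular expression against the left or right context, and each leaf's rule carries the conjunction of the decisions on its root-to-leaf path.

```python
import re

# Hypothetical encoding of a tiny decision tree, in the spirit of the
# slides. Internal nodes: (side, regex, yes-branch, no-branch); leaves
# hold the weighted outputs for the input boundary symbol.
TREE = ("left", r"[NVA]$",
        ("right", r"^#", {"I": 0.3, "#": 0.7}, {"I": 0.9, "#": 0.1}),
        {"I": 0.5, "#": 0.5})

def leaf_rules(node, path=()):
    """One rule per leaf: the conjunction of (side, regex, polarity)
    decisions traversed from the root, plus the leaf's distribution."""
    if isinstance(node, dict):                 # reached a leaf
        return [(path, node)]
    side, rx, yes, no = node
    return (leaf_rules(yes, path + ((side, rx, True),)) +
            leaf_rules(no,  path + ((side, rx, False),)))

def apply_tree(left, right):
    """Pick the unique leaf whose conjoined context constraints hold."""
    ctx = {"left": left, "right": right}
    for path, dist in leaf_rules(TREE):
        if all(bool(re.search(rx, ctx[side])) == pol
               for side, rx, pol in path):
            return dist
    raise AssertionError("the leaves partition the context space")
```

The intersection of the path expressions is what makes each leaf's rule apply in exactly one region of the context space, so the leaf rules never conflict.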
Interpretation of Tree as a Ruleset
Node 16

Decisions 1, 2 and 4 on the path to node 16 each contribute a regular expression over the symbols I, #, N, V, A, Adv, D and NA; intersecting them yields the rule for the leaf, approximately:

# => ( I<1.09> + #<0.41> ) / I ( Σ* # )? __ ( (N + V + A + Adv + D) + NA )
Summary of Compilation Algorithm
Each rule represents a weighted two-level surface coercion rule
Rule_L = Compile( T → ψ_L / ⋂_{p ∈ P} λ_p __ ⋂_{p ∈ P} ρ_p )

(for each leaf L, the input symbol T is rewritten as the weighted output ψ_L of that leaf, in the intersection of the left contexts λ_p and of the right contexts ρ_p accumulated along the root-to-leaf path P)
Each tree/forest represents a set of simultaneous weighted two-level surface coercion rules

Rule_T = ⋂_{L ∈ T} Rule_L

Rule_F = ⋂_{T ∈ F} Rule_T

BestPath( ,D#N#V#Adv#D#A#N ∘ Tree ) => ,D#N#V#Adv , D#A#N <2.76>
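Because the rules express same-length relations, the conjunction Rule_T = ⋂_L Rule_L can be previewed with ordinary set intersection over strings of pair symbols. This is a toy stand-in for WFST intersection; the relation contents are invented for illustration.

```python
from functools import reduce

# A same-length relation modelled as a set of pair-symbol strings,
# e.g. "#:I N:N" means the boundary maps to I before an N.
def rule_conjunction(rules):
    """Rule_T = intersection over all leaves L of Rule_L: a pair
    string survives only if every leaf rule licenses it."""
    return reduce(lambda a, b: a & b, map(set, rules))

rule_a = {"#:# N:N", "#:I N:N"}   # leaf A allows either boundary output
rule_b = {"#:# N:N"}              # leaf B forbids inserting I here
```

Real WFST intersection also combines the weights of the operand paths; the set picture only shows which pairings survive.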
Lexical Ambiguity Resolution
Word sense disambiguation:

She handed down a harsh sentence. (French: peine)
This sentence is ungrammatical. (French: phrase)
Homograph disambiguation:

He plays bass. /beɪs/
This lake contains a lot of bass. /bæs/

Diacritic restoration:

appeler l'autre côté de l'atlantique: côté 'side'
Côte d'Azur: côte 'coast'

(Yarowsky, 1992; Yarowsky, 1996; Sproat, Hirschberg & Yarowsky, 1992; Hearst, 1991)
Homograph Disambiguation 2
Sort by Abs( Log( Pr(Pron1 | Collocation_i) / Pr(Pron2 | Collocation_i) ) )
Decision List for lead
Logprob Evidence Pronunciation
11.40   follow/V + lead   => /lid/
11.20   zinc <-> lead     => /lɛd/
11.10   lead level/N      => /lɛd/
10.66   of lead in        => /lɛd/
10.59   the lead in       => /lid/
10.51   lead role         => /lid/
10.35   copper <-> lead   => /lɛd/
10.28   lead time         => /lid/
10.16   lead poisoning    => /lɛd/
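The sorting step can be sketched as follows. The collocation counts are invented and the smoothing constant is an assumption; some smoothing is required in practice, since many counts in such tables are zero and the raw log ratio would be infinite.

```python
import math

# Toy collocation counts for the two pronunciations of "lead"
# (invented numbers, for illustration only).
counts = {                      # evidence: (count with /lid/, count with /lEd/)
    "follow/V + lead": (890, 1),
    "zinc <-> lead":   (1, 720),
    "lead level/N":    (1, 219),
    "lead role":       (305, 2),
}

def decision_list(counts, alpha=0.1):
    """Rank evidence by |log(Pr(pron1|e) / Pr(pron2|e))|, strongest
    first; alpha smooths the zero counts that dominate such tables."""
    ranked = []
    for ev, (c1, c2) in counts.items():
        llr = math.log((c1 + alpha) / (c2 + alpha))
        ranked.append((abs(llr), ev, "/lid/" if llr > 0 else "/lEd/"))
    return sorted(ranked, reverse=True)
```

The sign of the log ratio picks the pronunciation; its magnitude is the strength of the evidence, which fixes the entry's rank in the list.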
Homograph Disambiguation 3: Pruning
Redundancy by subsumption
Evidence        /lid/   /lɛd/   Logprob
lead level/N      219       0     11.10
lead levels       167       0     10.66
lead level         52       0      8.93
Redundancy by association
Evidence             /tɛr/   /tɪr/
tear gas                 0    1671
tear <-> police          0     286
tear <-> riot            0      78
tear <-> protesters      0      71
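A sketch of the subsumption check, under a simplifying assumption: evidence is a plain string, and a lower-ranked entry is redundant when a stronger entry with the same prediction occurs inside it, since the stronger entry would always match first.

```python
def prune_subsumed(dlist):
    """dlist is sorted strongest-first as (logprob, evidence, pron).
    Drop entries that can never be the first match."""
    kept = []
    for entry in dlist:
        _, ev, pron = entry
        if not any(k_ev in ev and k_pron == pron
                   for _, k_ev, k_pron in kept):
            kept.append(entry)
    return kept

# Toy list (invented logprobs), strongest first.
toy = [(11.10, "lead level", "/lEd/"),
       (10.66, "lead levels", "/lEd/"),
       (10.00, "lead role", "/lid/"),
       (8.93,  "the lead level rose", "/lEd/")]
```

Here "lead levels" and "the lead level rose" both contain the stronger "lead level" with the same prediction, so they are pruned; "lead role" predicts the other pronunciation and survives.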
Homograph Disambiguation 4: Use
Choose single best piece of matching evidence.
Decision List for lead
Logprob Evidence Pronunciation
11.40   follow/V + lead   => /lid/
11.20   zinc <-> lead     => /lɛd/
11.10   lead level/N      => /lɛd/
10.66   of lead in        => /lɛd/
10.59   the lead in       => /lid/
10.51   lead role         => /lid/
10.35   copper <-> lead   => /lɛd/
10.28   lead time         => /lid/
10.16   lead poisoning    => /lɛd/
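"Choose the single best piece of matching evidence" amounts to a first-match scan over the sorted list. The token-containment predicate below is a simplification of the real collocation tests, and the rule fragment is a shortened, hypothetical rendering of the list above.

```python
def classify(context_tokens, dlist, default="/lid/"):
    """Return the pronunciation predicted by the strongest matching
    piece of evidence; fall back to the majority class otherwise."""
    for _logprob, evidence, pron in dlist:      # sorted strongest first
        if all(tok in context_tokens for tok in evidence.split()):
            return pron
    return default

# A fragment of the decision list, strongest first (token form simplified).
rules = [(11.40, "follow lead", "/lid/"),
         (11.20, "zinc lead", "/lEd/"),
         (10.16, "lead poisoning", "/lEd/")]
```

Because only the first match fires, weaker conflicting evidence lower in the list is ignored rather than averaged in.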
Homograph Disambiguation: Evaluation

Word       Pron1       Pron2       Sample Size   Prior   Performance
lives      /laɪvz/     /lɪvz/            33186     .69       .98
wound      /waʊnd/     /wund/             4483     .55       .98
Nice       /naɪs/      /nis/               573     .56       .94
Begin      /bɪˈgɪn/    /ˈbeɪgɪn/          1143     .75       .97
Chi        /tʃi/       /kaɪ/              1288     .53       .98
Colon      /koʊˈloʊn/  /ˈkoʊlən/          1984     .69       .98
lead (N)   /lid/       /lɛd/             12165     .66       .98
tear (N)   /tɛr/       /tɪr/              2271     .88       .97
axes (N)   /ˈæksiz/    /ˈæksɪz/           1344     .72       .96
IV         /aɪ vi/     /fɔr/              1442     .76       .98
Jan        /dʒæn/      /jɑn/              1327     .90       .98
routed     /ˈrutɪd/    /ˈraʊtɪd/           589     .60       .94
bass       /beɪs/      /bæs/              1865     .57       .99
TOTAL                                    63660     .67       .97
Decision Lists: Summary
Efficient and flexible use of data.
Easy to interpret and modify.
Decision Lists as WFSTs
The lead example
Construct homograph taggers H0, H1, ... that find and tag instances of a homograph set in a lexical analysis. For example, H1 is:
[Figure: the transducer H1: a chain of states 0-8 with arcs ##:##, l:l, e:e, a:a, d:d, 1:1, nn:nn and a final ε:H1 arc, i.e. it copies the lexical analysis ## lead1 nn and emits the tag H1 after it.]
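The tagger's effect can be imitated with a plain rewrite. This is a stand-in for the transducer, not the authors' implementation; the ##-delimited token format follows the slide, while the sense index and POS field conventions are assumptions.

```python
import re

def tag_homograph(lexical, word="lead", tag="H1"):
    """Insert the homograph tag after '<word><index> <pos>' in a
    ##-delimited lexical analysis, as H1 does via its final eps:H1 arc."""
    return re.sub(rf"\b({word}\d* \S+)", rf"\1 {tag}", lexical)
```

Everything else in the analysis passes through unchanged, which is exactly the identity-arc behaviour of the transducer.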
Decision Lists as WFSTs
Construct an environmental classifier consisting of a pair of transducers C1 and C2, where

C1 optionally rewrites any symbol except the word boundary or the homograph tags H0, H1, ..., as a single dummy symbol
C2 classifies contextual evidence from the decision list according to its type, and assigns a cost equal to the position of the evidence in the list; it otherwise passes the dummy symbol, word boundary and H0, H1, ... through:
## follow vb ## → ## V0 ## <1>
## zinc nn ## → ## C1 ## <2>
## level(s?) nn ## → ## R1 ## <3>
## of pp ## → ## [1 ## <2>
## in pp ## → ## 1] ##
...
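A sketch of C2's rewriting, mirroring the rules above. The patterns and costs are taken from the slide, but the flat-string representation and the sequential application are simplifications of the actual transducer; the rule without a stated cost is omitted.

```python
import re

# Evidence-classification rules from the slide: pattern -> (label, cost);
# the cost reflects the evidence's rank in the decision list.
C2_RULES = [(r"## follow vb ##", "## V0 ##", 1),
            (r"## zinc nn ##",   "## C1 ##", 2),
            (r"## levels? nn ##", "## R1 ##", 3)]

def classify_evidence(s):
    """Rewrite recognized context evidence into typed labels and
    accumulate the cost of each piece of evidence used."""
    cost = 0
    for pat, label, c in C2_RULES:
        s, n = re.subn(pat, label, s)
        cost += n * c
    return s, cost
```

In the transducer version the cost is a weight on the rewriting path, so composing with the disambiguator and taking the best path automatically prefers the cheapest (strongest) evidence.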
Decision Lists as WFSTs
Construct a disambiguator D from a set of optional rules of the form:

H0 → ε / V0 __
H1 → ε / C1 __
H0 → ε / ## __ R0
H1 → ε / ## __ R1
H0 → ε / [0 #