Algorithms for Speech Recognition and Language Processing



    cmp-lg/9608018 v2, 17 Sep 1996

    Algorithms for Speech Recognition and Language Processing

    Mehryar Mohri Michael Riley Richard Sproat

    AT&T Laboratories AT&T Laboratories Bell Laboratories

    [email protected] [email protected] [email protected]

    Joint work with Emerald Chung, Donald Hindle, Andrej Ljolje, Fernando Pereira

    Tutorial presented at COLING96, August 3rd, 1996.


    Introduction (1)

    Text and speech processing: hard problems

    Theory of automata

    Appropriate level of abstraction

    Well-defined algorithmic problems


    Introduction (2)

    Three Sections:

    Algorithms for text and speech processing (2h)

    Speech recognition (2h)

    Finite-state methods for language processing (2h)


    PART I
    Algorithms for Text and Speech Processing

    Mehryar Mohri, AT&T Laboratories

    [email protected]

    August 3rd, 1996


    Definitions: finite automata (1)

    A = (Σ, Q, δ, I, F)

    Alphabet Σ,

    Finite set of states Q,

    Transition function δ: Q × Σ → 2^Q,

    I ⊆ Q set of initial states, F ⊆ Q set of final states.

    A recognizes L(A) = { w ∈ Σ* : δ(I, w) ∩ F ≠ ∅ }

    (Hopcroft and Ullman, 1979; Perrin, 1990)

    Theorem 1 (Kleene, 1965). A set is regular (or rational) iff it can be recognized by a finite automaton.


    Definitions: finite automata (2)

    [Figure 1: two finite automata (nondeterministic and deterministic) recognizing L(A) = ab*a.]


    Definitions: weighted automata (1)

    A = (Σ, Q, δ, λ, σ, ρ, I, F)

    (Σ, Q, δ, I, F) is an automaton,

    Initial output function λ,

    Output function σ: Q × Σ × Q → K,

    Final output function ρ.

    Function f: Σ* → (K, +, ·) associated with A:

    ∀u ∈ Dom(f), f(u) = Σ_{(i,q) ∈ I × (δ(i,u) ∩ F)} λ(i) · σ(i, u, q) · ρ(q)


    Definitions: weighted automata (2)

    [Figure 2: Index of t = aba — a weighted automaton whose states carry output weights (e.g. 0/4, 1/0) and whose arcs are labeled a/0, a/2, b/1, b/0.]


    Definitions: rational power series

    Power series: functions mapping Σ* to a semiring (K, +, ·)

    Notation: S = Σ_{w ∈ Σ*} (S, w) w, where the (S, w) are the coefficients

    Support: supp(S) = { w ∈ Σ* : (S, w) ≠ 0 }

    Sum: (S + T, w) = (S, w) + (T, w)

    Star: S* = Σ_{n ≥ 0} S^n

    Product: (S · T, w) = Σ_{uv = w} (S, u) · (T, v)

    Rational power series: closure under the rational operations of polynomials (polynomial power series) (Salomaa and Soittola, 1978; Berstel and Reutenauer, 1988)

    Theorem 2 (Schützenberger, 1961). A power series is rational iff it can be represented by a weighted finite automaton.


    Definitions: transducers (1)

    T = (Σ, Δ, Q, δ, σ, I, F)

    Finite alphabets Σ and Δ,

    Finite set of states Q,

    Transition function δ: Q × Σ → 2^Q,

    Output function σ: Q × Σ × Q → Δ*,

    I ⊆ Q set of initial states,

    F ⊆ Q set of final states.

    T defines a relation: R(T) = { (u, v) ∈ Σ* × Δ* : v ∈ ⋃_{q ∈ δ(I,u) ∩ F} σ(I, u, q) }


    Definitions: transducers (2)

    [Figure 3: Fibonacci normalizer, a transducer rewriting abb → baa.]


    Definitions: weighted transducers

    [Figure 4: a weighted transducer mapping aaba to bbcb along two paths, with weight sequences (0,0,1,0) and (0,1,1,0).]

    (min, +): aaba → min{0+0+1+0, 0+1+1+0} = min{1, 2} = 1

    (+, ×): aaba → (0·0·1·0) + (0·1·1·0) = 0 + 0 = 0


    Composition: Motivation (1)

    Construction of complex sets or functions from more elementary ones

    Modular (modules, distinct linguistic descriptions)

    On-the-fly expansion


    Composition: Motivation (2)

    [Figure 5: Phases of a compiler (Aho et al., 1986): source program → lexical analyzer → syntax analyzer → semantic analyzer → intermediate code generator → code optimizer → code generator → target program.]


    Composition: Motivation (3)

    [Figure 6: Complex indexation — a spell checker, inflected forms, and an index map a source text to a set of positions.]


    Composition: Example (1)

    T1: 0 -a:a-> 1 -b:ε-> 2 -c:ε-> 3 -d:d-> 4

    T2: 0 -a:d-> 1 -ε:e-> 2 -d:a-> 3

    T1 ∘ T2: (0,0) -a:d-> (1,1) -b:e-> (2,2) -c:ε-> (3,2) -d:a-> (4,3)

    Figure 7: Composition of transducers.


    Composition: Example (2)

    T1: 0 -a:a/3-> 1 -b:ε/1-> 2 -c:ε/4-> 3 -d:d/2-> 4

    T2: 0 -a:d/5-> 1 -ε:e/7-> 2 -d:a/6-> 3

    T1 ∘ T2: (0,0) -a:d/15-> (1,1) -b:e/7-> (2,2) -c:ε/4-> (3,2) -d:a/12-> (4,3)

    Figure 8: Composition of weighted transducers (+, ×).


    Composition: Algorithm (1)

    Construction of pairs of states

    Match: q1 -a:b/w1-> q1' and q2 -b:c/w2-> q2'

    Result: (q1, q2) -a:c/(w1 ⊗ w2)-> (q1', q2')

    Elimination of ε-path redundancy: filter

    Complexity: quadratic

    On-the-fly implementation
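    The matching rule above translates directly into code. The following is a minimal sketch (mine, not the AT&T implementation) of eager composition for ε-free weighted transducers; machines are dicts from state to (input, output, weight, next-state) arcs, and the weight product is passed in as a function `times`:

        from collections import deque

        def compose(A, B, a_start, b_start, times):
            """Eager composition of epsilon-free weighted transducers.
            A[state] = list of (inp, out, weight, next_state) arcs."""
            start = (a_start, b_start)
            arcs, queue, seen = {}, deque([start]), {start}
            while queue:
                s1, s2 = queue.popleft()
                merged = []
                for inp, mid, w1, n1 in A.get(s1, []):
                    for mid2, out, w2, n2 in B.get(s2, []):
                        if mid == mid2:          # match: output of A = input of B
                            nxt = (n1, n2)
                            merged.append((inp, out, times(w1, w2), nxt))
                            if nxt not in seen:
                                seen.add(nxt)
                                queue.append(nxt)
                arcs[(s1, s2)] = merged          # (q1,q2) -a:c/(w1*w2)-> (q1',q2')
            return arcs

    The number of candidate state pairs is quadratic in the worst case, matching the complexity noted above.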


    Composition: Algorithm (2)

    [Figure 9: Composition of weighted transducers with ε-transitions — machines A and B are turned into A' and B' by marking ε-transitions (ε1, ε2) so that a filter can remove redundant ε-paths.]


    Composition: Algorithm (3)

    [Figure 10: Redundancy of ε-paths — without filtering, multiple interleavings of (1:1), (2:2), and (2:1) moves connect the same state pairs.]


    Composition: Algorithm (4)

    [Figure 11: Filter for efficient composition — a three-state transducer over x:x, 1:1, 2:2, and 2:1 moves that admits exactly one canonical ε-path per state pair.]


    Composition: Theory

    Transductions (Elgot and Mezei, 1965; Eilenberg, 1974–1976; Berstel, 1979).

    Theorem 3. Let τ1 and τ2 be two (weighted) automata or transducers; then τ1 ∘ τ2 is a (weighted) automaton or transducer.

    Efficient composition of weighted transducers (Mohri, Pereira, and Riley, 1996).

    Works with any semiring

    Intersection: composition of (weighted) automata.


    Intersection: Example

    [Figure 12: Intersection of automata — the pair construction yields states (0,0), (0,1), (0,2), (1,3), (2,4), (3,5).]


    Union: Example

    [Figure 13: Union of weighted automata (min, +) — the two operands are joined, apparently via a new initial state with ε/0 transitions to each operand's initial state.]


    Determinization: Motivation (1)

    Efficiency of use (time)

    Elimination of redundancy

    No loss of information (≠ pruning)


    Determinization: Motivation (3)

    [Figure 15: Determinized language model (9 states, 11 transitions, 4 paths) over the phrases "which flights/flight leave/leaves Detroit", with arc weights such as which/69.9, flights/53.1, leaves/62.3, Detroit/101–105.]


    Determinization: Example (1)

    [Figure 16: Determinization of automata — the subset construction maps the nondeterministic states to {0}, {1,2}, {3}.]


    Determinization: Example (2)

    [Figure 17: Determinization of weighted automata (min, +) — subsets of (state, residual weight) pairs, e.g. {(0,0)}, {(1,2),(2,0)}, {(1,0),(2,3)}, {(3,0)}.]


    Determinization: Example (3)

    [Figure 18: Determinization of transducers — subsets of (state, residual string) pairs, e.g. {(0,ε)}, {(1,a),(2,ε)}, {(3,ε)}.]


    Determinization: Example (4)

    [Figure 19: Determinization of weighted transducers (min, +) — subsets of (state, residual string, residual weight) triples, e.g. {(0,ε,0)}, {(1,a,1),(2,ε,0)}, {(3,ε,0)}.]


    Determinization: Algorithm (1)

    Generalization of the classical powerset construction for automata

    Subsets made of (state, weight) or (state, string, weight) pairs

    Applies to subsequentiable weighted automata and transducers

    Time and space complexity: exponential (polynomial w.r.t. the size of the result)

    On-the-fly implementation
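    As an illustration of the subset construction with (state, weight) pairs, here is a minimal sketch (mine, not the tutorial's code) over the (min, +) semiring; it assumes the input automaton is determinizable (e.g., acyclic) and ignores final weights for brevity:

        def determinize_min_plus(arcs, start):
            """Weighted subset construction over (min, +).
            arcs[state] = list of (label, weight, next_state)."""
            init = frozenset({(start, 0.0)})
            det, stack, seen = {}, [init], {init}
            while stack:
                subset = stack.pop()
                by_label = {}
                for state, residual in subset:
                    for label, w, nxt in arcs.get(state, []):
                        by_label.setdefault(label, []).append((nxt, residual + w))
                det[subset] = {}
                for label, pairs in by_label.items():
                    best = {}                       # min residual per target state
                    for s, w in pairs:
                        best[s] = min(best.get(s, float("inf")), w)
                    w_min = min(best.values())      # weight emitted on the new arc
                    dest = frozenset((s, w - w_min) for s, w in best.items())
                    det[subset][label] = (w_min, dest)
                    if dest not in seen:
                        seen.add(dest)
                        stack.append(dest)
            return det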


    Determinization: Algorithm (2) Conditions of application

    Twin states: q and q' are twin states iff:

    If: they can be reached from the initial states by the same input string u,

    Then: cycles at q and q' with the same input string v have the same output value.

    Theorem 4 (Choffrut, 1978; Mohri, 1996a). Let τ be an unambiguous weighted automaton (transducer, weighted transducer); then τ can be determinized iff it has the twin property.

    Theorem 5 (Mohri, 1996a). The twin property can be tested in polynomial time.


    Determinization: Theory

    Determinization of automata:

    General case (Aho, Sethi, and Ullman, 1986)

    Specific case: failure functions (Mohri, 1995)

    Determinization of transducers, weighted automata, and weighted transducers:

    General description, theory, and analysis (Mohri, 1996a; Mohri, 1996b)

    Conditions of application and test algorithm

    Acyclic weighted transducers or transducers admit determinization

    Can be used with other semirings (e.g., (R, +, ·))


    Local determinization: Motivation

    Time efficiency

    Reduction of redundancy

    Control of the resulting size (flexibility)

    Equivalent function (or equal set)

    No loss of information


    Local determinization: Example

    [Figure 20: Local determinization of weighted transducers (min, +) — only states whose out-degree exceeds the threshold are determinized, yielding subset states such as {(1,a,0),(2,b,1),(3,a,2)}.]


    Local determinization: Algorithm

    Predicate, e.g.: P(q) ≡ (out-degree(q) > k), with k a threshold parameter

    Local: Dom(det) = { q : P(q) }

    Determinization only for q ∈ Dom(det)

    On-the-fly implementation

    Complexity: O(|Dom(det)| · max_{q ∈ Q} out-degree(q))


    Local determinization: Theory

    Various choices of predicate (constraint: local)

    Definition of parameters

    Applies to all automata, weighted automata, transducers, and weighted transducers

    Can be used with other semirings (e.g., (R, +, ·))


    Minimization: Motivation

    Space efficiency

    Equivalent function (or equal set)

    No loss of information (≠ pruning)


    Minimization: Motivation (2)

    [Figure 21: Determinized language model of Figure 15, before minimization.]


    Minimization: Motivation (3)

    [Figure 22: Minimized language model (6 states).]


    Minimization: Example (1)

    [Figure 23: Minimization of automata — a six-state automaton t96 reduced to an equivalent five-state automaton t97.]


    Minimization: Example (2)

    [Figure 24: Minimization of weighted automata (min, +) — weights are pushed toward the initial state, then classical minimization merges equivalent states (8 states reduced to 6).]


    Minimization: Example (3)

    [Figure 25: Minimization of transducers — output strings are pushed toward the initial state (e.g. a:A becomes a:ABCDB), then classical minimization merges states (8 states reduced to 7).]


    Minimization: Example (4)

    [Figure 26: Minimization of weighted transducers (min, +) — both output strings and weights are pushed (e.g. a:A/0 becomes a:ABCDB/15), then equivalent states are merged.]


    Minimization: Algorithm (1)

    Two steps:

    Pushing, or extraction of strings or weights towards the initial state

    Classical minimization of automata, with each (input, output) pair considered as a single label

    Algorithm for the first step:

    Transducers: specific algorithm

    Weighted automata: shortest-paths algorithms
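    For weighted automata over (min, +), the pushing step computes, for every state, the shortest distance d(q) to a final state and reweights each arc to w + d(next) − d(q). A rough sketch (mine), assuming an acyclic automaton in which every state can reach a final state:

        def push_weights(arcs, finals, topo_order):
            """arcs[q] = list of (label, weight, next_state); finals = set of final states.
            topo_order lists the states so that every arc goes forward."""
            d = {q: (0.0 if q in finals else float("inf")) for q in topo_order}
            for q in reversed(topo_order):          # shortest distance to a final state
                for _, w, nxt in arcs.get(q, []):
                    d[q] = min(d[q], w + d[nxt])
            return {q: [(label, w + d[nxt] - d[q], nxt)   # pushed arc weights
                        for label, w, nxt in arcs.get(q, [])]
                    for q in topo_order}

    After pushing, classical minimization can treat each (label, weight) pair as a single symbol, as described above.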


    Minimization: Algorithm (2) Complexity

    E: set of transitions

    S: sum of the lengths of the output strings

    P_max: the longest of the longest common prefixes of the output paths leaving each state

    Type                General                               Acyclic
    Automata            O(|E| log |Q|)                        O(|Q| + |E|)
    Weighted automata   O(|E| log |Q|)                        O(|Q| + |E|)
    Transducers         O(|Q| + |E| (log |Q| + |P_max|))      O(S + |E| + |Q| + (|E| − (|Q| − |F|)) |P_max|)


    Minimization: Theory

    Minimization of automata (Aho, Hopcroft, and Ullman, 1974; Revuz, 1991)

    Minimization of transducers (Mohri, 1994)

    Minimization of weighted automata (Mohri, 1996a)

    Minimal number of transitions

    Test of equivalence

    Standardization of power series (Schützenberger, 1961):

    Works only with fields

    Creates too many transitions


    Conclusion (1)

    Theory

    Rational power series

    Weighted automata and transducers

    Algorithms

    General (various semirings)

    Efficiency (used in practice, large sizes)


    Conclusion (2)

    Applications

    Text processing (spelling checkers, pattern-matching, indexation, OCR)

    Language processing (morphology, phonology, syntax, language modeling)

    Speech processing (speech recognition, text-to-speech synthesis)

    Computational biology (matching with errors)

    Many other applications


    PART II
    Speech Recognition

    Michael Riley, AT&T Laboratories

    [email protected]

    August 3rd, 1996


    Overview

    The speech recognition problem

    Acoustic, lexical and grammatical models

    Finite-state automata in speech recognition

    Search in finite-state automata


    Speech Recognition

    Given an utterance, find its most likely written transcription.

    Fundamental ideas:

    Utterances are built from sequences of units

    Acoustic correlates of a unit are affected by surrounding units

    Units combine into higher-level units: phones → syllables → words

    Relationships between levels can be modeled by weighted graphs; we use weighted finite-state transducers

    Recognition: find the best path in a suitable product graph


    Levels of Speech Representation


    Maximum A Posteriori Decoding

    Overall analysis [4, 57]:

    Acoustic observations: parameter vectors derived by local spectral analysis of the speech waveform at regular (e.g., 10 ms) intervals

    Observation sequence o

    Transcriptions w

    Probability P(o | w) of observing o when w is uttered

    Maximum a posteriori decoding:

    ŵ = argmax_w P(w | o) = argmax_w P(o | w) P(w) / P(o) = argmax_w P(o | w) P(w)

    where P(o | w) is the generative model and P(w) the language model.


    Generative Models of Speech

    Typical decomposition of P(o | w) into conditionally-independent mappings between levels:

    Acoustic model P(o | p): phone sequences → observation sequences. Detailed model:

    P(o | d): distributions → observation vectors (symbolic → quantitative)

    P(d | m): context-dependent phone models → distribution sequences

    P(m | p): phone sequences → model sequences

    Pronunciation model P(p | w): word sequences → phone sequences

    Language model P(w): word sequences


    Recognition Cascades: General Form

    Multistage cascade: o = s_k -> stage k -> s_{k-1} -> ... -> s_1 -> stage 1 -> w = s_0

    Find s_0 maximizing

    P(s_0, s_k) = P(s_k | s_0) P(s_0) = P(s_0) Σ_{s_1, ..., s_{k-1}} Π_{1 ≤ j ≤ k} P(s_j | s_{j-1})

    Viterbi approximation:

    Cost(s_0, s_k) = Cost(s_k | s_0) + Cost(s_0)

    Cost(s_k | s_0) ≈ min_{s_1, ..., s_{k-1}} Σ_{1 ≤ j ≤ k} Cost(s_j | s_{j-1})

    where Cost(·) = −log P(·).


    Speech Recognition Problems

    Modeling: how to describe accurately the relations between levels ⇒ modeling errors

    Search: how to find the best interpretation of the observations according to the given models ⇒ search errors


    Acoustic Modeling Feature Selection I

    Short-time spectral analysis:

    log |∫ g(τ) x(t + τ) e^(−i 2π f τ) dτ|

    [Figure: short-time (25 ms Hamming window) spectrum of /ae/, Hz vs. dB.]

    Scale selection: cepstral smoothing

    Parameter sampling (13 parameters)


    Acoustic Modeling Feature Selection II [40, 38]

    Refinements:

    Time derivatives: 1st and 2nd order

    Non-Fourier analysis (e.g., Mel scale)

    Speaker/channel adaptation: mean cepstral subtraction, vocal tract normalization, linear transformations

    Result: 39-dimensional feature vector (13 cepstra, 13 delta cepstra, 13 delta-delta cepstra) every 10 milliseconds
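    This 39-dimensional front end is easy to reproduce with modern tools; a sketch of mine using the librosa library (which long postdates the tutorial; the file name is hypothetical):

        import librosa
        import numpy as np

        y, sr = librosa.load("utterance.wav", sr=16000)
        # 13 cepstra from a 25 ms window, every 10 ms
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                    n_fft=int(0.025 * sr),
                                    hop_length=int(0.010 * sr))
        delta = librosa.feature.delta(mfcc)            # 13 delta cepstra
        delta2 = librosa.feature.delta(mfcc, order=2)  # 13 delta-delta cepstra
        features = np.vstack([mfcc, delta, delta2])    # 39 x n_frames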


    Acoustic Modeling Stochastic Distributions [4, 61, 39, 5]

    Vector quantization: find a codebook of prototypes

    Full covariance multivariate Gaussians:

    P[y] = (2π)^(−N/2) |S|^(−1/2) exp(−½ (y − μ)ᵀ S⁻¹ (y − μ))

    Diagonal covariance Gaussian mixtures

    Semi-continuous, tied mixtures
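    A direct transcription of the full-covariance density above into NumPy (a sketch of mine; the log-density is computed for numerical stability):

        import numpy as np

        def gaussian_log_density(y, mu, S):
            """log P[y] for the full-covariance Gaussian defined above."""
            n = len(mu)
            diff = y - mu
            _, logdet = np.linalg.slogdet(S)           # log |S|
            quad = diff @ np.linalg.solve(S, diff)     # (y-mu)^T S^{-1} (y-mu)
            return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)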


    Acoustic Modeling Units and Training [61, 36]

    Units:

    Phonetic (sub-word) units, e.g., cat → /k ae t/

    Context-dependent units, e.g., ae_{k,t}

    Multiple distributions ("states") per phone: left, middle, right

    Training:

    Given a segmentation, training is straightforward

    Obtain segmentation by transcription

    Iterate until convergence


    Generating Lexicons Two Steps

    Orthography → Phonemes: had → /hh ae d/, your → /y uw r/

    complex, context-independent mapping

    usually a small number of alternatives

    determined by spelling constraints; lexical facts

    large online dictionaries available

    Phonemes → Phones: /hh ae d y uw r/ → [hh ae dcl jh axr] (60% prob); /hh ae d y uw r/ → [hh ae dcl d y axr] (40% prob)

    complex, context-dependent mapping; many possible alternatives

    determined by phonological and phonetic constraints


    1. Decision Tree Splitting Rules

    Which split to take at a node? Candidate splits considered:

    Binary cuts: for continuous x (−∞ < x < ∞), consider splits of the form x ≤ k vs. x > k, ∀k.

    Binary partitions: for categorical x ∈ {1, 2, ..., n} = X, consider splits of the form x ∈ A vs. x ∈ X − A, ∀A ⊆ X.


    2. Decision Tree Stopping Rules


    When to declare a node terminal? Strategy (cost-complexity pruning):

    1. Grow an over-large tree.

    2. Form a sequence of subtrees T_0, ..., T_n ranging from the full tree to just the root node.

    3. Estimate an "honest" error rate for each subtree.

    4. Choose the tree size with minimum "honest" error rate.

    To form the sequence of subtrees, vary α from 0 (for the full tree) to ∞ (for just the root node) in:

    min_T R(T) + α |T|.

    To estimate the "honest" error rate, test on data different from the training data, e.g., grow the tree on 9/10 of the available data and test on the remaining 1/10, repeating 10 times and averaging (cross-validation).
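    Scikit-learn implements this cost-complexity pruning directly; a sketch of mine, on a synthetic dataset, forming the subtree sequence by varying α and picking the size by 10-fold cross-validation:

        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=2000, random_state=0)
        # One alpha per subtree T_0 ... T_n, from full tree toward the root
        path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
        scores = [(a, cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                                      X, y, cv=10).mean())
                  for a in path.ccp_alphas]
        best_alpha = max(scores, key=lambda s: s[1])[0]  # max CV accuracy = min "honest" error
        tree = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)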


    End of Declarative Sentence Prediction: Pruning Sequence

    [Plot: error rate (0.005–0.025) vs. number of terminal nodes (0–100); + = raw error, o = cross-validated error.]


    3. Decision Tree Node Assignment

    Which class/value to assign to a terminal node?

    Plurality vote: choose the most frequent class at that node for classification; choose the mean value for regression.


    End-of-Declarative-Sentence Prediction: Features [65]


    Prob[word with "." occurs at end of sentence]

    Prob[word after "." occurs at beginning of sentence]

    Length of word with "."

    Length of word after "."

    Case of word with ".": Upper, Lower, Cap, Numbers

    Case of word after ".": Upper, Lower, Cap, Numbers

    Punctuation after "." (if any)

    Abbreviation class of word with ".": e.g., month name, unit-of-measure, title, address name, etc.


    End of Declarative Sentence?

    [Figure: decision tree over features bprob, eprob, next (case of the following word), and type (abbreviation class), with yes/no counts at each node, e.g. 48294/52895 yes at the root and 42755/42875 yes at the largest leaf.]


    Phoneme-to-Phone Alignment


    PHONEME   PHONE   WORD
    p         p       purpose
    er        er
    p         pcl
    -         p
    ax        ix
    s         s
    ae        ax      and
    n         n
    d         -
    r         r       respect
    ih        ix
    s         s
    p         pcl
    -         p
    eh        eh
    k         kcl
    t         t


    Phoneme-to-Phone Realization: Features [66, 10, 62]


    Phonemic Context:

    Phoneme to predict

    Three phonemes to left

    Three phonemes to right

    Stress (0, 1, 2)

    Lexical Position:

    Phoneme count from start of word

    Phoneme count from end of word


    Phoneme-to-Phone Realization: Prediction Example


    Tree splits for /t/ in "your pretty red":

    PHONE    COUNT    SPLIT
    ix       182499
    n        87283    cm0: vstp,ustp,vfri,ufri,vaff,uaff,nas
    kcl+k    38942    cm0: vstp,ustp,vaff,uaff
    tcl+t    21852    cp0: alv,pal
    tcl+t    11928    cm0: ustp
    tcl+t    5918     vm1: mono,rvow,wdi,ydi
    dx       3639     cm-1: ustp,rho,n/a
    dx       2454     rstr: n/a,no


    Phoneme-to-Phone Realization: Network Example


    Phonetic network for "Don had your pretty...":

    PHONEME   PHONE1         PHONE2        CONTEXT
    d         0.91 d
    aa        0.92 aa
    n         0.98 n
    hh        0.74 hh       0.15 hv
    ae        0.73 ae       0.19 eh
    d         0.51 dcl jh   0.37 dcl d
    y         0.90 y                      (if d → dcl d)
              0.84 -        0.16 y        (if d → dcl jh)
    uw        0.48 axr      0.29 er
    r         0.99 -
    p         0.99 pcl p
    r         0.99 r
    ih        0.86 ih
    t         0.73 dx       0.11 tcl t
    iy        0.90 iy


    Acoustic Model Context Selection [92, 39]

    Statistical regression trees used to predict contexts based on distribution variance

    One tree per context-independent phone and state (left, middle, right)

    The trees were grown until the data criterion of 500 frames per distribution was met

    Trees pruned using cost-complexity pruning and cross-validation to select the best contexts

    About 44,000 context-dependent phone models

    About 16,000 distributions


    N-Grams: Basics


    Chain Rule and Joint/Conditional Probabilities:

    P[x1 x2 ... xN] = P[xN | x1 ... x(N-1)] P[x(N-1) | x1 ... x(N-2)] ... P[x2 | x1] P[x1]

    where, e.g.,

    P[xN | x1 ... x(N-1)] = P[x1 ... xN] / P[x1 ... x(N-1)]

    (First-order) Markov assumption:

    P[xk | x1 ... x(k-1)] = P[xk | x(k-1)] = P[x(k-1) xk] / P[x(k-1)]

    nth-order Markov assumption:

    P[xk | x1 ... x(k-1)] = P[xk | x(k-n) ... x(k-1)] = P[x(k-n) ... xk] / P[x(k-n) ... x(k-1)]


    N-Grams: Maximum Likelihood Estimation


    Let N be the total number of n-grams observed in a corpus and c(x1 ... xn) be the number of times the n-gram x1 ... xn occurred. Then

    P[x1 ... xn] = c(x1 ... xn) / N

    is the maximum likelihood estimate of that n-gram probability.

    For conditional probabilities,

    P[xn | x1 ... x(n-1)] = c(x1 ... xn) / c(x1 ... x(n-1))

    is the maximum likelihood estimate. With this method, an n-gram that does not occur in the corpus is assigned zero probability.
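    The estimates are just ratios of counts; a minimal bigram sketch (mine; it ignores sentence boundaries):

        from collections import Counter

        tokens = "a rose is a rose".split()
        bigrams = Counter(zip(tokens, tokens[1:]))
        unigrams = Counter(tokens)
        # Maximum likelihood conditional estimates c(x y) / c(x)
        p = {(x, y): c / unigrams[x] for (x, y), c in bigrams.items()}
        # p[("a", "rose")] == 1.0; any unseen bigram gets probability zero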


    N-Grams: Good-Turing-Katz Estimation [29, 16]


    Let n_r be the number of n-grams that occurred r times. Then

    P[x1 ... xn] = c*(x1 ... xn) / N

    is the Good-Turing estimate of that n-gram probability, where

    c*(x) = (c(x) + 1) n_(c(x)+1) / n_(c(x)).

    For conditional probabilities,

    P[xn | x1 ... x(n-1)] = c*(x1 ... xn) / c(x1 ... x(n-1)), for c(x1 ... xn) > 0,

    is Katz's extension of the Good-Turing estimate.

    With this method, an n-gram that does not occur in the corpus is assigned the backoff probability

    P[xn | x1 ... x(n-1)] = α P[xn | x2 ... x(n-1)],

    where α is a normalizing constant.
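    The adjusted counts c*(x) depend only on the count-of-counts n_r; a sketch of mine reusing the bigram counts from the previous example. As written, items with the highest observed count get c* = 0; practical estimators smooth the n_r:

        from collections import Counter

        def good_turing(counts):
            """c*(x) = (c(x)+1) * n_{c(x)+1} / n_{c(x)} for observed items."""
            n = Counter(counts.values())     # n[r] = number of items seen r times
            return {x: (c + 1) * n.get(c + 1, 0) / n[c] for x, c in counts.items()}

        tokens = "a rose is a rose".split()
        adjusted = good_turing(Counter(zip(tokens, tokens[1:])))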


    Finite-State Modeling [57]


    Our view of recognition cascades: represent the mappings between levels, the observation sequences, and the language uniformly with weighted finite-state machines:

    Probabilistic mapping P(x | y): weighted finite-state transducer. Example word pronunciation transducer:

    [Figure: transducer for "data" with arcs d:ε/1, ey:ε/0.4, ae:ε/0.6, dx:ε/0.8, t:ε/0.2, ax:"data"/1.]

    Language model P(w): weighted finite-state acceptor


    Example of Recognition Cascade


    [Diagram: observations O → A → phones → D → words → M]

    Recognition from observations o by composition:

    Observations: O(s, s) = 1 if s = o, 0 otherwise

    Acoustic-phone transducer: A(a, p) = P(a | p)

    Pronunciation dictionary: D(p, w) = P(p | w)

    Language model: M(w, w) = P(w)

    Recognition: ŵ = argmax_w (O ∘ A ∘ D ∘ M)(o, w)


    Speech Models as Weighted Automata


    Quantized observations: o1 o2 ... on at times t0 t1 ... tn

    Phone model A_p: observations → phones

    [Figure: three-state model with states s0, s1, s2; self-loops oi:ε/p00(i), oi:ε/p11(i), oi:ε/p22(i); forward arcs oi:ε/p01(i), oi:ε/p12(i); and exit arc ε:p/p2f.]

    Acoustic transducer: A = ⋃_p A_p

    Word pronunciation D_"data": phones → words

    [Figure: transducer with arcs d:ε/1, ey:ε/0.4, ae:ε/0.6, dx:ε/0.8, t:ε/0.2, ax:"data"/1.]

    Dictionary: D = ⋃_w D_w


    Sample Pronunciation Dictionary D

    Dictionary with "hostile", "battle" and "bottle" as a weighted transducer:

    [Figure: weighted transducer with phone:word/cost arcs such as hh:hostile/0.134, hv:hostile/2.635, b:battle/0.000, b:bottle/0.000, aa:-/0.055, ae:-/0.057, t:-/2.113, dx:-/0.240, l:-/0.112.]


    Sample Language Model M

    Simplified language model as a weighted acceptor:

    [Figure: weighted acceptor over "battle", "bottle", "hostile", with arc costs such as battle/6.603, hostile/9.394, bottle/11.510, battle/9.268, hostile/11.119.]

    Recognition by Composition


    From phones to words: compose the dictionary with the phone lattice to yield a word lattice with combined acoustic and pronunciation costs:

    0 -hostile/-32.900-> 1 -battle/-26.825-> 2

    Applying the language model: compose the word lattice with the language model to obtain a word lattice with combined acoustic, pronunciation, and language model costs:

    [Word lattice with states 0–3 and arcs hostile/-21.781, hostile/-19.407, battle/-17.916, battle/-15.250.]


    Context-Dependency Examples

    Context-dependent phone models: maps from CI units to CD units. Example: ae/b_d → ae_{b,d}

    Context-dependent allophonic rules: maps from baseforms to detailed phones. Example: t/V'_V → dx (flapping between vowels)

    Difficulty: cross-word contexts. Where several words enter and leave a state in the grammar, simple substitution does not apply.


    Context-Dependency Transducers


    Example triphonic context transducer for two symbols x and y:

    [Figure: four states x.x, x.y, y.x, y.y with arcs x/x_x:x, x/x_y:x, y/x_x:y, y/x_y:y, x/y_x:x, x/y_y:x, y/y_x:y, y/y_y:y.]


    On-Demand Composition [69, 53]


    Create a generalized state machine C for the composition A ∘ B:

    C.start := (A.start, B.start)

    C.final((s1, s2)) := A.final(s1) ∧ B.final(s2)

    C.arcs((s1, s2)) := Merge(A.arcs(s1), B.arcs(s2))

    Merged arcs are defined by:

    (l1, l3, x + y, (ns1, ns2)) ∈ Merge(A.arcs(s1), B.arcs(s2))

    iff

    (l1, l2, x, ns1) ∈ A.arcs(s1) and (l2, l3, y, ns2) ∈ B.arcs(s2)
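    A minimal sketch (mine, not the AT&T code) of this generalized state machine in Python; states are expanded only when the decoder asks for their arcs, so nothing is tabulated up front:

        class LazyCompose:
            """On-demand composition of machines exposing start, final(s), arcs(s)."""
            def __init__(self, A, B):
                self.A, self.B = A, B
                self.start = (A.start, B.start)

            def final(self, state):
                s1, s2 = state
                return self.A.final(s1) and self.B.final(s2)

            def arcs(self, state):              # Merge(A.arcs(s1), B.arcs(s2))
                s1, s2 = state
                for l1, l2, x, ns1 in self.A.arcs(s1):
                    for m2, l3, y, ns2 in self.B.arcs(s2):
                        if l2 == m2:
                            yield (l1, l3, x + y, (ns1, ns2))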


    State Caching


    Create a generalized state machine B for an input machine A:

    B.start := A.start

    B.final(state) := A.final(state)

    B.arcs(state) := A.arcs(state)

    Cache disciplines:

    Expand each state of A exactly once, i.e., always save in the cache (memoize).

    Cache, but forget old states using a least-recently-used criterion.

    Use instructions (ref counts) from the user (decoder) to save and forget.


    On-Demand Composition Results

    ATIS task: class-based trigram grammar, full cross-word triphonic context-dependency.

                      states      arcs
    context              762     40386
    lexicon             3150      4816
    grammar            48758    359532
    full expansion   1.6 × 10^6  5.1 × 10^6

    For the same recognition accuracy as with a static, fully expanded network, on-demand composition expands just 1.6% of the total number of arcs.


    Determinization in Large Vocabulary Recognition

    For large vocabularies, string lexicons are very non-deterministic

    Determinizing the lexicon solves this problem, but can introduce non-coaccessible states during its composition with the grammar

    Alternate solutions:

    Off-line compose, determinize, and minimize: Lexicon ∘ Grammar

    Pre-tabulate the non-coaccessible states in the composition of: Det(Lexicon) ∘ Grammar


    Search in Recognition Cascades


    Reminder: Cost ≡ −log probability

    Example recognition problem: ŵ = argmax_w (O ∘ A ∘ D ∘ M)(o, w)

    Viterbi search: approximate ŵ by the output word sequence for the lowest-cost path from the start state to a final state in O ∘ A ∘ D ∘ M; this ignores summing over multiple paths with the same output:

    [Figure: several parallel paths with outputs w1, ..., wi, ..., wn through O ∘ A ∘ D ∘ M.]

    Composition preserves acyclicity; O is acyclic ⇒ acyclic search graph


    Single-source Shortest Path Algorithms [83]


    Meta-algorithm:

    Q ← {s0}; Cost(s0) ← 0; for all other s, Cost(s) ← ∞

    While Q is not empty: s ← Dequeue(Q)

        For each s' ∈ Adj[s] such that Cost(s') > Cost(s) + cost(s, s'):

            Cost(s') ← Cost(s) + cost(s, s')

            Enqueue(Q, s')

    Specific algorithms:

    Name          Queue type    Cycles   Neg. weights   Complexity
    acyclic       topological   no       yes            O(|V| + |E|)
    Dijkstra      best-first    yes      no             O(|E| log |V|)
    Bellman-Ford  FIFO          yes      yes            O(|V| |E|)
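    The meta-algorithm specializes by queue discipline alone; a compact sketch (mine) in which "fifo" gives Bellman-Ford-style relaxation and "best" gives Dijkstra:

        import heapq
        from collections import deque

        def shortest_paths(adj, s0, queue="fifo"):
            """adj[s] = list of (s2, weight); returns Cost(s) for reachable states."""
            cost = {s0: 0.0}
            if queue == "best":                  # Dijkstra: non-negative weights
                heap = [(0.0, s0)]
                while heap:
                    c, s = heapq.heappop(heap)
                    if c > cost.get(s, float("inf")):
                        continue                 # stale queue entry
                    for s2, w in adj.get(s, []):
                        if cost.get(s2, float("inf")) > c + w:
                            cost[s2] = c + w
                            heapq.heappush(heap, (c + w, s2))
            else:                                # FIFO: Bellman-Ford style
                q = deque([s0])
                while q:
                    s = q.popleft()
                    for s2, w in adj.get(s, []):
                        if cost.get(s2, float("inf")) > cost[s] + w:
                            cost[s2] = cost[s] + w
                            q.append(s2)
            return cost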


    The Search Problem


    Obvious first approach: use an appropriate single-source shortest-path algorithm

    Problem: impractical to visit all states; can we do better?

    Admissible methods: guarantee finding the best path, but reorder the search to avoid exploring provably bad regions

    Non-admissible methods: may fail to find the best path, but may need to explore much less of the graph

    Current practical approaches:

    Heuristic cost functions

    Beam search

    Multipass search

    Rescoring


    Heuristic Cost Function A* Search [4, 56, 17]

    States in the search are ordered by cost-so-far(s) + lower-bound-to-complete(s)

    With a tight bound, states not on good paths are not explored

    With a loose lower bound, no better than Dijkstra's algorithm

    Where to find a tight bound? Full search of a composition of smaller automata (homomorphic automata with lower-bounding costs?)

    Non-admissible A* variants: use an averaged estimate of the cost-to-complete, not a lower bound


    Beam Search [35]

    Only explore states with costs within a beam (threshold) of the cost of the best comparable state

    Non-admissible

    Comparable states: states corresponding to (approximately) the same observations

    Synchronous (Viterbi) search: explore composition states in chronological observation order

    Problem with synchronous beam search: too local; some observation subsequences are unreliable and may locally put the best overall path outside the beam
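    One synchronous step of beam pruning fits in a few lines; a sketch of mine of frame-synchronous Viterbi search in which, after each observation, only hypotheses within `beam` of the best cost survive:

        def viterbi_beam(frames, expand, start, beam):
            """frames: iterable of observations; expand(state, obs) -> [(next, cost)]."""
            active = {start: 0.0}
            for obs in frames:
                nxt = {}
                for s, c in active.items():
                    for s2, w in expand(s, obs):
                        if c + w < nxt.get(s2, float("inf")):
                            nxt[s2] = c + w
                if not nxt:
                    break                        # all hypotheses pruned away
                best = min(nxt.values())
                active = {s: c for s, c in nxt.items() if c <= best + beam}
            return active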


    Beam-Search Tradeoffs [68]


    Word lattice: result of composing the observation sequence, the level transducers, and the language model.

    Beam   Word lattice error rate   Median number of edges
    4      7.3%                      86.5
    6      5.4%                      244.5
    8      4.4%                      827
    10     4.1%                      3520
    12     4.0%                      13813.5


    Multipass Search [52, 3, 68]


    Use a succession of binary compositions instead of a single n-way composition; combinable with other methods

    Prune: use a two-pass variant of composition to remove states not on any path close enough to the best

    Pruned intermediate lattices are smaller, so fewer state pairings are considered

    Approximate: use simpler models (context-independent phone models, low-order language models)

    Rescore ...


    PART III
    Finite-State Methods in Language Processing

    Richard Sproat

    Speech Synthesis Research Department

    Bell Laboratories, Lucent Technologies

    [email protected]


    Overview


    Text-analysis for Text-to-Speech (TTS) Synthesis

    A rich domain with lots of linguistic problems

    Probably the least familiar application of NLP technologies

    Syntactic analysis

    Some thoughts on text indexation


    The Nature of the TTS Problem


    This is some text: It was a dark and stormy night. Four score and seven years ago. Now is the time for all good men. Let them eat cake. Quoth the raven nevermore.

    Linguistic Analysis: text → phonemes, durations and pitch contours

    Speech Synthesis: → speech waveforms


    From Text to Linguistic Representation


    [Figure: linguistic analysis of the Mandarin sentence lao3shu3 chi1 you2 "The rat is eating the oil": part-of-speech tags N V N, tones L H H L HL, and the associated phoneme, duration, and pitch annotations.]


    Russian Percentages: The Problem

    How do you say "%" in Russian?

    Adjectival forms when modifying nouns


    20% skidka ⇒ dvadcati-procentnaja skidka
    ("20% discount")

    s 20% rastvorom ⇒ s dvadcati-procentnym rastvorom
    ("with 20% solution")

    Nominal forms otherwise:

    21% ⇒ dvadcat' odin procent

    23% ⇒ dvadcat' tri procenta

    20% ⇒ dvadcat' procentov

    s 20% ⇒ s dvadcat'ju procentami ("with 20%")


    Text Analysis Problems

    Segment text into words.


    Segment text into sentences, checking for and expanding abbreviations:

    St. Louis is in Missouri.

    Expand numbers

    Lexical and morphological analysis

    Word pronunciation

    Homograph disambiguation

    Phrasing

    Accentuation


    Desiderata for a Model of Text Analysis for TTS


    Delay decisions until we have enough information to make them

    Possibly weight various alternatives

    Weighted Finite-State Transducers offer an attractive computational model


    Overall Architectural Matters

    Example: word pronunciation in Russian


    Text form: kostra <kostra> (bonfire + genitive.singular)

    Morphological analysis: kost"{E}r{noun}{masc}{inan} + "a{sg}{gen}

    Pronunciation: /kastr"a/

    Minimal Morphologically-Motivated Annotation (MMA): kostr"a

    (Sproat, 1996)


    Overall Architectural Matters


    [Figure: overall architecture. The surface orthographic form (#KOSTRA#) is related by a lexical analysis WFST L to the morphological analysis (#KOST"{E}R{noun}{masc}{inan}+"A{sg}{gen}#) and the MMA (#KOSTR"A#), and by a phonological analysis WFST P to the pronunciation (#kastr"a#); each level-to-level mapping is an FST.]

Orthography → Lexical Representation

    A Closer Look

[Figure: the orthography-to-lexical-representation machine, built from component transducers: Words : Lexical Annotations, Lexical Annotations : Lexical Analysis, Special Symbols : Expansions, and Numerals : Expansions, joined by Punctuation : Interpretation and SPACE : Interpretation arcs.]

SPACE: white space in German, Spanish, Russian, ...; ε in Japanese, Chinese, ...

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 111

    Chinese Word Segmentation

[Sample dictionary entries; the Chinese characters were lost in extraction, so each entry is shown as its pinyin analysis with part-of-speech tag and weight (a negative log probability):]

    le0              perf(ective)         asp   4.68
    liao3jie3        'understand'         vb    8.11
    da4              'big'                vb    5.56
    da4jie1          'avenue'             nc   11.45
    bu4              'not'                adv   4.58
    zai4             'at'                 vb    4.45
    wang4            'forget'             vb   11.77
    wang4+bu4liao3   'unable to forget'   npot 12.23
    wo3              'I'                  np    4.88
    fang4            'place'              vb    8.05
    fang4da4         'enlarge'            vb   10.70
    na3li3           'where'              nc   11.02
    jie1             'avenue'             nc   10.35
    jie3fang4        'liberation'         nc   10.92
    xie4fang4da4     (personal name)      urnp 42.23

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 112

    Chinese Word Segmentation

Space = ε : #

L = Space ∪ (Dictionary (Space ∪ Punc))⁺

BestPath(Input ∘ L) = wo3 np<4.88> # wang4+bu4liao3 vb+npot<12.23> # jie3fang4 nc<10.92> # da4jie1 nc<11.45> ...

'I couldn't forget where Liberation Avenue is.'

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 113
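Conceptually, BestPath(Input ∘ L) is just a lowest-cost segmentation. The sketch below is a stand-in, not the actual WFST machinery: pinyin syllables replace the lost Chinese characters, the weights are the dictionary costs above, and the best segmentation is found by dynamic programming over prefix positions.

    # Weights are -log probabilities from the dictionary slide; pinyin
    # syllables stand in for the Chinese characters lost in extraction.
    DICT = {
        ("wo3",): 4.88,                     # 'I'
        ("wang4",): 11.77,                  # 'forget'
        ("bu4",): 4.58,                     # 'not'
        ("wang4", "bu4", "liao3"): 12.23,   # 'unable to forget'
        ("liao3", "jie3"): 8.11,            # 'understand'
        ("jie3", "fang4"): 10.92,           # 'liberation'
        ("fang4",): 8.05,                   # 'place'
        ("fang4", "da4"): 10.70,            # 'enlarge'
        ("da4",): 5.56,                     # 'big'
        ("da4", "jie1"): 11.45,             # 'avenue'
        ("jie1",): 10.35,                   # 'avenue'
    }

    def best_segmentation(syllables):
        """Lowest total cost segmentation, by DP over prefix positions."""
        n = len(syllables)
        best = [(0.0, [])] + [(float("inf"), None)] * n
        for i in range(n):
            if best[i][1] is None:          # position i not reachable
                continue
            for j in range(i + 1, n + 1):
                word = tuple(syllables[i:j])
                if word in DICT and best[i][0] + DICT[word] < best[j][0]:
                    best[j] = (best[i][0] + DICT[word], best[i][1] + [word])
        return best[n]

    print(best_segmentation(
        ["wo3", "wang4", "bu4", "liao3", "jie3", "fang4", "da4", "jie1"]))
    # -> cost 39.48 with segmentation wo3 | wang4+bu4+liao3 | jie3+fang4 | da4+jie1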

    Numeral Expansion


234   Factorization  ⇒  2 × 10^2 + 3 × 10^1 + 4

      DecadeFlop     ⇒  2 × 10^2 + 4 + 3 × 10^1

      NumberLexicon  ⇓

zwei+hundert+vier+und+dreißig

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 114
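The same pipeline can be mimicked procedurally for numbers below 1000. The following toy Python sketch is ours (the real system is a cascade of transducers); it performs factorization, the decade flop and lexicon lookup, writing '+' for the {++} morph boundary and 'ss' for ß:

    UNITS = {1: "eins", 2: "zwei", 3: "drei", 4: "vier", 5: "fuenf",
             6: "sechs", 7: "sieben", 8: "acht", 9: "neun"}
    TEENS = {0: "zehn", 1: "elf", 2: "zwoelf", 3: "dreizehn", 4: "vierzehn",
             5: "fuenfzehn", 6: "sechzehn", 7: "siebzehn", 8: "achtzehn",
             9: "neunzehn"}
    TENS = {2: "zwanzig", 3: "dreissig", 4: "vierzig", 5: "fuenfzig",
            6: "sechzig", 7: "siebzig", 8: "achtzig", 9: "neunzig"}

    def expand(n):
        """Factorize n = h*10^2 + t*10^1 + u, flop units before tens, look up."""
        assert 0 < n < 1000
        h, t, u = n // 100, n % 100 // 10, n % 10
        parts = []
        if h:
            parts += ["ein" if h == 1 else UNITS[h], "hundert"]
        if t == 1:
            parts.append(TEENS[u])                  # zehn, elf, zwoelf, ...
        elif t:
            if u:                                   # the decade flop
                parts += ["ein" if u == 1 else UNITS[u], "und"]
            parts.append(TENS[t])
        elif u:
            parts.append(UNITS[u])
        return "+".join(parts)

    print(expand(234))  # -> zwei+hundert+vier+und+dreissig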

    Numeral Expansion

[Figure: the factorization transducer. Digits pass through unchanged (0:0, 1:1, ..., 9:9) and markers for powers of ten are inserted (ε:10^2, ε:10^1) after the appropriate digit positions, so that 234 maps to 2 10^2 3 10^1 4.]

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 115

German Numeral Lexicon

/{1} : (eins{num}({masc}|{neut}){sg}{##})/
/{2} : (zwei{num}{##})/
/{3} : (drei{num}{##})/
...
/({0}{+++}{1}{10^1}) : (zehn{num}{##})/
/({1}{+++}{1}{10^1}) : (elf{num}{##})/
/({2}{+++}{1}{10^1}) : (zwölf{num}{##})/
/({3}{+++}{1}{10^1}) : (drei{++}zehn{num}{##})/
...
/({2}{10^1}) : (zwan{++}zig{num}{##})/
/({3}{10^1}) : (drei{++}ßig{num}{##})/
...
/({10^2}) : (hundert{num}{##})/
/({10^3}) : (tausend{num}{neut}{##})/

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 116

Morphology: Paradigmatic Specifications

    Paradigm A1

Paradigm {A1}
# strong inflection (e.g., after the indefinite article)

Suffix {++}er {sg}{masc}{nom}
Suffix {++}en {sg}{masc}({gen}|{dat}|{acc})
Suffix {++}e {sg}{femi}({nom}|{acc})
Suffix {++}en {sg}({femi}|{neut})({gen}|{dat})
Suffix {++}es {sg}{neut}({nom}|{acc})
Suffix {++}e {pl}({nom}|{acc})
Suffix {++}er {pl}{gen}
Suffix {++}en {pl}{dat}

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 117
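Read as data, such a paradigm is simply a list of (suffix, features) pairs that can be concatenated onto any stem assigned to {A1}. A minimal sketch (the rendering and names are ours):

    # Paradigm {A1} as (suffix, features) pairs.
    A1 = [
        ("er", "{sg}{masc}{nom}"),
        ("en", "{sg}{masc}({gen}|{dat}|{acc})"),
        ("e",  "{sg}{femi}({nom}|{acc})"),
        ("en", "{sg}({femi}|{neut})({gen}|{dat})"),
        ("es", "{sg}{neut}({nom}|{acc})"),
        ("e",  "{pl}({nom}|{acc})"),
        ("er", "{pl}{gen}"),
        ("en", "{pl}{dat}"),
    ]

    def inflect(stem):
        """Attach each strong-inflection suffix of {A1} to a stem."""
        return ["%s{++}%s %s" % (stem, suffix, feats) for suffix, feats in A1]

    for form in inflect("aal{++}glatt"):
        print(form)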

Morphology: Paradigmatic Specifications

/{A1} : (aal{++}glatt{adj})/
/{A1} : (ab{++}änder{++}lich{adj}{umlt})/
/{A1} : (ab{++}artig{adj})/
/{A1} : (ab{++}bau{++}würdig{adj}{umlt})/
...
/{A6} : (dein{adj})/
/{A6} : (euer{adj})/
/{A6} : (ihr{adj})/
/{A6} : (Ihr{adj})/
/{A6} : (mein{adj})/
/{A6} : (sein{adj})/
/{A6} : (unser{adj})/

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 119

Morphology: Paradigmatic Specifications

Project(({A6} Endings) ∘ (({A6} : Stems) ∪ Id(Σ))) ⇒

[Figure: the resulting automaton for the stem 'mein' with the A6 endings: m-e-i-n {adj} {++}, followed by the suffixes -e/-em/-en/-er/-es with their number ({sg}/{pl}), gender ({masc}/{femi}/{neut}) and case ({nom}/{gen}/{dat}/{acc}) features.]

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 120


    Morphology: Finite-State Grammar

FUGE    SECOND {++} <1.5>
FUGE    SECOND {++}s{++} ...
...
SECOND  PREFIX {Eps} <1.0>
SECOND  STEM {Eps} <2.0>
SECOND  WORD {Eps} <2.0>
...
WORD    ...

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 122

    Morphology: Finite-State Grammar

Unanständigkeitsunterstellung ('allegation of indecency')

⇓

"un{++}"an{++}stand{++}ig{++}keit{++}s{++}unter{++}stell{++}ung

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 123

    Rewrite Rule Compilation

    Context-dependent rewrite rules

General form: φ → ψ / λ __ ρ

φ, ψ, λ, ρ: regular expressions.

Constraint: ψ cannot itself be rewritten, but it can be used as a context

Example: a → b / c __ b

(Johnson, 1972; Kaplan & Kay, 1994; Karttunen, 1995; Mohri & Sproat, 1996)

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 124

    Example

a → b / c __ b

w = cab

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 125
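For this particular rule the effect of the compiled transducer is easy to check with a regular-expression sketch: a lookbehind for the left context c and a lookahead for the right context b leave the contexts unconsumed, just as the rule's contexts are not themselves rewritten. (This shortcut is only illustrative; it does not generalize to the full obligatory left-to-right semantics that the compiler implements.)

    import re

    # a -> b / c __ b, applied to w = cab
    rule = re.compile(r"(?<=c)a(?=b)")
    print(rule.sub("b", "cab"))  # -> cbb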

[Figures: the intermediate automata for this example at successive stages of the compilation, e.g. the result after the replace transducer has applied.]

    Based on the use of marking transducers

    Brackets inserted only where needed

Efficiency:

    3 determinizations + additional linear time work

    Smaller number of compositions

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 128

    Rule Compilation Method

The rule φ → ψ / λ __ ρ is compiled as the composition r ∘ f ∘ replace ∘ l1 ∘ l2, where:

r : inserts a marker > before every instance of ρ

f : inserts markers <1 and <2 before each instance of φ that is followed by >

replace : rewrites φ as ψ between <1 and >, deleting all > markers

l1 : admits only strings in which <1 is preceded by λ, and deletes <1

l2 : admits only strings in which <2 is not preceded by λ, and deletes <2

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 129

    Marking Transducers

Proposition: Let A be a deterministic automaton representing Σ*β. Then the transducer built below post-marks occurrences of β with #.

[Figure: a final state q of Id(A) with its entering and leaving transitions (a:a, b:b, c:c, d:d); after the modification, q acquires a new transition emitting the marker (ε:#), giving the marking transducer.]

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 130
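The idea is mechanical enough to sketch directly: run the deterministic automaton for Σ*β over the input and emit the marker whenever the machine is in a final state, i.e. whenever the prefix read so far ends in β. A toy Python version with β = b (state names and the alphabet {a, b, c} are ours):

    # Deterministic automaton for Sigma* b over {a, b, c} (beta = b).
    DELTA = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q0", "c"): "q0",
             ("q1", "a"): "q0", ("q1", "b"): "q1", ("q1", "c"): "q0"}
    FINAL = {"q1"}

    def post_mark(w):
        """Copy the input, emitting '#' after every prefix ending in beta."""
        out, state = [], "q0"
        for ch in w:
            state = DELTA[(state, ch)]
            out.append(ch)
            if state in FINAL:
                out.append("#")
        return "".join(out)

    print(post_mark("cabb"))  # -> cab#b#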


    The Transducers as Expressions using Marker

r  = reverse(Marker(reverse(Σ* ρ), 1, {>}, ∅))

f  = reverse(Marker(reverse((Σ ∪ {>})* φ >), 1, {<1, <2}, ∅))

l1 = Marker(Σ* λ, 2, ∅, {<1})

l2 = Marker(Σ* λ, 3, ∅, {<2})

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 132

Example: r for rule a → b / c __ b

Here ρ = b, so Σ*ρ is represented by an automaton β over {a, b, c} accepting the strings that end in b.

[Figures: reverse(β); the transducer Marker(reverse(β), 1, {>}, ∅), which inserts > (ε:>) after each accepted reversed prefix; and r = reverse(Marker(reverse(β), 1, {>}, ∅)), which inserts > before every occurrence of b.]

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 133

    The Replace Transducer

[Figure: the replace transducer. Outside a bracketed region it copies symbols and brackets (σ:σ, <2:<2) and deletes stray markers (>:ε); on reading <1 it enters a rewriting state in which φ is replaced by ψ up to the closing >.]

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 134

    Extension to Weighted Rules

    Weighted context-dependent rules:

φ → ψ / λ __ ρ

φ, λ, ρ: regular expressions,

ψ: a formal power series on the tropical semiring

Example: c → (.9 c) + (.1 t) / a __ t

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 135

    Rational power series

Functions S : Σ* → ℝ₊ ∪ {∞}: rational power series

Tropical semiring: (ℝ₊ ∪ {∞}, min, +)

Notation: S = Σ_{w ∈ Σ*} (S, w) w

Example: S = (2a)(3b)(4b)(5b) + (5a)(2b)*

(S, abbb) = min{2 + 3 + 4 + 5, 5 + 2 + 2 + 2} = min{14, 11} = 11

Theorem 6 (Schützenberger, 1961): S is rational iff it is recognizable (representable by a weighted transducer).

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 136
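Evaluating (S, w) is a min-plus computation over the paths of a weighted automaton. The sketch below encodes the (reconstructed) example above as an arc list, one path per term of the series, and evaluates abbb; the states and the encoding are ours:

    from math import inf

    # Arcs: state -> [(symbol, weight, next_state)]; weights as in the series above.
    ARCS = {
        0: [("a", 2, 1), ("a", 5, 5)],
        1: [("b", 3, 2)],
        2: [("b", 4, 3)],
        3: [("b", 5, 4)],
        5: [("b", 2, 5)],       # the (2b)* loop
    }
    FINALS = {4, 5}

    def series_weight(w):
        """(S, w): min over accepting paths of the sum of arc weights."""
        frontier = {0: 0.0}
        for ch in w:
            nxt = {}
            for q, c in frontier.items():
                for sym, wt, r in ARCS.get(q, []):
                    if sym == ch:
                        nxt[r] = min(nxt.get(r, inf), c + wt)
            frontier = nxt
        return min((c for q, c in frontier.items() if q in FINALS), default=inf)

    print(series_weight("abbb"))  # min(2+3+4+5, 5+2+2+2) = 11.0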

    Compilation of weighted rules

    Extension of the composition algorithm to the weighted case

Efficient filter for ε-transitions

    Addition of weights of matching labels

    Same compilation algorithm

Single-source shortest-paths algorithms to find the best path

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 137
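Since the tropical semiring is (min, +), extracting the best path from the result of a weighted composition is ordinary single-source shortest paths. A minimal Dijkstra-style sketch over a toy arc list (the graph representation is ours):

    import heapq

    def best_path(arcs, start, finals):
        """Dijkstra in the tropical semiring: lowest-cost accepting path."""
        heap, seen = [(0.0, start, [])], set()
        while heap:
            cost, q, labels = heapq.heappop(heap)
            if q in seen:
                continue
            seen.add(q)
            if q in finals:
                return cost, labels
            for wt, r, lab in arcs.get(q, []):
                if r not in seen:
                    heapq.heappush(heap, (cost + wt, r, labels + [lab]))
        return float("inf"), None

    arcs = {0: [(2.0, 1, "x"), (5.0, 2, "y")], 1: [(4.0, 2, "z")]}
    print(best_path(arcs, 0, {2}))  # -> (5.0, ['y'])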

    Rewrite Rules: An Example

s → z / __ ($ | #) VStop

[Figure: the compiled voicing transducer (5 states). It passes V, z, $, # and VStop through unchanged, and emits both s:s and s:z hypotheses, keeping the voiced output only when the following context (a $ or # boundary followed by a voiced segment) is confirmed.]

/mis$mo$/ ∘ Voicing ⇒ /miz$mo$/

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 138


    Russian Percentage Expansion: An example

s 5% skidkoj ('with a 5% discount')

Lexical Analysis FST

⇓

s{prep} pjat'{num}{nom}-procentn{adj}+aja{fem}{sg}{nom} skidk{fem}+oj{sg}{instr}

s{prep} pjat'i{gen}-procentn{adj}+oj{fem}{sg}{instr} skidk{fem}+oj{sg}{instr} <2.0>

s{prep} pjat'ju{instr}-procent{noun}+ami{pl}{instr} skidk{fem}+oj{sg}{instr} <4.0>

...

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 140


    Percentage Expansion: Continued

s 5% skidkoj

⇓ (best path of the lexical analysis)

s pjat'i{gen}-procentn{adj}+oj{sg}{instr} skidkoj

⇓ (L ∘ P)

s # PiT"!pr@c"Entn&y # sK"!tk&y

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 142

    Phrasing Prediction

Problem: predict intonational phrase boundaries in long unpunctuated utterances:

For his part, Clinton told reporters in Little Rock, Ark., on Wednesday ‖ that the pact can be a good thing for America ‖ if we change our economic policy ‖ to rebuild American industry here at home ‖ and if we get the kind of guarantees we need on environmental and labor standards in Mexico ‖ and a real plan ‖ to help the people who will be dislocated by it.

The Bell Labs synthesizer uses a CART-based predictor trained on labeled corpora (Wang & Hirschberg 1992).

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 143
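At run time a CART is just a chain of feature tests ending in a leaf decision. The sketch below walks such a tree; the features and splits are invented for illustration, loosely echoing the punc / j3f / syls splits in the sample tree that follows:

    def predict(node, feats):
        """Walk a CART: leaves are decisions, internal nodes are feature tests."""
        if isinstance(node, str):
            return node
        feature, test, if_true, if_false = node
        value = feats[feature]
        taken = value in test if isinstance(test, (set, frozenset)) else value < test
        return predict(if_true if taken else if_false, feats)

    # Invented tree: punctuation forces a boundary; otherwise a conjunction or
    # preposition followed by a long stretch of syllables suggests one.
    TREE = ("punc", {"YES"}, "yes",
            ("j3f", {"CC", "CS", "IN", "TO"},
             ("syls", 7.5, "no", "yes"),
             "no"))

    print(predict(TREE, {"punc": "NO", "j3f": "CC", "syls": 9.0}))  # -> yes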


    Phrasing Prediction: Sample Tree

[Figure: the full phrasing-prediction CART for read speech. The root splits on the presence of punctuation (punc); deeper nodes test function-word classes of nearby words (j1f, j3f, j4f), parts of speech (j1v, j2n, j3n, j4n), wh-words (j3w), syllable counts (syls, ssylsp), noun-phrase distance and location (npdist, nploc) and accent status (raj4); each leaf records its training counts and a yes/no boundary decision.]

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 145

    Phrasing Prediction: Results

Results for multi-speaker read speech:

    major boundaries only: 91.2%

    collapsed major/minor phrases: 88.4%

    3-way distinction between major, minor and null boundary: 81.9%

Results for spontaneous speech:

    major boundaries only: 88.2%

    collapsed major/minor phrases: 84.4%

    3-way distinction between major, minor and null boundary: 78.9%

Results for 85K words of hand-annotated text, cross-validated on training data: 95.4%.

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 146

    Tree-Based Modeling: Prosodic Phrase Prediction

[Figure: a smaller illustrative prosodic-phrase tree. Nodes split on the distance from the last punctuation (dpunc) and on the part of speech of the words to the left and right of the juncture (lpos, rpos: N, V, A, Adv, D, P); leaves give yes/no boundary decisions with counts.]

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 147

    The Tree Compilation Algorithm

    (Sproat & Riley, 1996)

Each leaf node corresponds to a single rule defining a constrained weighted mapping for the input symbol associated with the tree

Decisions at each node are stateable as regular expressions restricting the left or right context of the rule(s) dominated by the branch

The full left/right context of the rule at a leaf node is derived by intersecting the expressions traversed between the root and the leaf node

The transducer for the entire tree represents the conjunction of all the constraints expressed at the leaf nodes; it is derived by intersecting together the set of WFSTs corresponding to each of the leaves

Note that intersection is defined for transducers that express same-length relations

The alphabet is defined to be an alphabet of all correspondence pairs that were determined empirically to be possible

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 148
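Because the per-leaf WFSTs express same-length relations, each can be viewed as an acceptor over an alphabet of input:output pairs, and intersection reduces to the standard product construction on DFAs. A small sketch under that encoding (the machines and pair symbols are invented):

    # dfa = (start, finals, delta) with delta[(state, pair_sym)] = next state.
    def intersect(dfa1, dfa2):
        """Product construction: accept exactly the pair strings both DFAs accept."""
        s1, f1, d1 = dfa1
        s2, f2, d2 = dfa2
        start = (s1, s2)
        delta, finals = {}, set()
        stack, seen = [start], {start}
        while stack:
            q1, q2 = stack.pop()
            if q1 in f1 and q2 in f2:
                finals.add((q1, q2))
            for (p, sym), r1 in d1.items():
                if p != q1 or (q2, sym) not in d2:
                    continue
                r = (r1, d2[(q2, sym)])
                delta[((q1, q2), sym)] = r
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return start, finals, delta

    # One rule allows '#' to surface as '||' or '-'; another allows only '-'.
    A = (0, {0}, {(0, ("#", "||")): 0, (0, ("#", "-")): 0})
    B = (0, {0}, {(0, ("#", "-")): 0})
    print(intersect(A, B)[2])   # only the ('#', '-') arc survives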

    Interpretation of Tree as a Ruleset

The regular expressions along the path to leaf node 16 (the decisions numbered 1, 2 and 4 in the tree) are intersected to yield a single weighted rule for the juncture symbol, roughly:

# → (‖ <1.09>) + (# <0.41>) / ‖ (¬#)* (N ∪ V ∪ A ∪ Adv ∪ D) __

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 149

    Summary of Compilation Algorithm

    Each rule represents a weighted two-level surface coercion rule

Rule_L = Compile(φ → ψ_L / ⋂_{p ∈ P} λ_p __ ⋂_{p ∈ P} ρ_p)

Each tree/forest represents a set of simultaneous weighted two-level surface coercion rules:

Rule_T = ⋂_{L ∈ T} Rule_L

Rule_F = ⋂_{T ∈ F} Rule_T

BestPath(,D#N#V#Adv#D#A#N ∘ Tree) ⇒ ,D#N#V#Adv,D#A#N <2.76>

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 150

    Lexical Ambiguity Resolution

Word sense disambiguation:

    She handed down a harsh sentence. (French peine)

    This sentence is ungrammatical. (French phrase)

Homograph disambiguation:

    He plays bass. /beɪs/

    This lake contains a lot of bass. /bæs/

Diacritic restoration:

    appeler l'autre côté de l'atlantique (côté 'side')

    Côte d'Azur (côte 'coast')

(Yarowsky, 1992; Yarowsky, 1996; Sproat, Hirschberg & Yarowsky, 1992; Hearst, 1991)

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 151


    Homograph Disambiguation 2

Sort by Abs(Log(Pr(Pron1 | Collocation_i) / Pr(Pron2 | Collocation_i)))

Decision List for lead

Logprob   Evidence             Pronunciation
11.40     follow/V + lead   ⇒  lid
11.20     zinc ↔ lead       ⇒  lɛd
11.10     lead level/N      ⇒  lɛd
10.66     of lead in        ⇒  lɛd
10.59     the lead in       ⇒  lid
10.51     lead role         ⇒  lid
10.35     copper ↔ lead     ⇒  lɛd
10.28     lead time         ⇒  lid
10.16     lead poisoning    ⇒  lɛd

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 153

    Homograph Disambiguation 3: Pruning

Redundancy by subsumption

Evidence        lɛd    lid    Logprob
lead level/N    219    0      11.10
lead levels     167    0      10.66
lead level      52     0       8.93

Redundancy by association

Evidence             tɛr    tɪr
tear gas             0      1671
tear ↔ police        0      286
tear ↔ riot          0      78
tear ↔ protesters    0      71

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 154

    Homograph Disambiguation 4: Use

Choose the single best piece of matching evidence.

Decision List for lead

Logprob   Evidence             Pronunciation
11.40     follow/V + lead   ⇒  lid
11.20     zinc ↔ lead       ⇒  lɛd
11.10     lead level/N      ⇒  lɛd
10.66     of lead in        ⇒  lɛd
10.59     the lead in       ⇒  lid
10.51     lead role         ⇒  lid
10.35     copper ↔ lead     ⇒  lɛd
10.28     lead time         ⇒  lid
10.16     lead poisoning    ⇒  lɛd

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 155
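Applying a decision list at run time is a single ordered scan. A toy Python sketch (the predicates, context encoding and ASCII pronunciation labels are ours):

    # Decision list for 'lead': (|logprob|, matcher over a context dict, pron).
    DECISIONS = [
        (11.40, lambda c: c.get("prev_verb") == "follow",            "lid"),
        (11.20, lambda c: "zinc" in c.get("window", ()),             "lEd"),
        (11.10, lambda c: c.get("next") in ("level", "levels"),      "lEd"),
        (10.59, lambda c: c.get("prev") == "the" and c.get("next") == "in", "lid"),
    ]

    def disambiguate(context, default="lid"):
        """The first (strongest) matching piece of evidence decides."""
        for _, matches, pron in DECISIONS:
            if matches(context):
                return pron
        return default

    print(disambiguate({"window": {"zinc", "mine"}, "next": "ore"}))  # -> lEd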

Homograph Disambiguation: Evaluation

Word       Pron1      Pron2      Sample Size   Prior   Performance
lives      laɪvz      lɪvz       33186         .69     .98
wound      waʊnd      wund       4483          .55     .98
Nice       naɪs       nis        573           .56     .94
Begin      bɪˈgɪn     beɪgɪn     1143          .75     .97
Chi        tʃi        kaɪ        1288          .53     .98
Colon      koʊˈloʊn   ˈkoʊlən    1984          .69     .98
lead (N)   lid        lɛd        12165         .66     .98
tear (N)   tɛr        tɪr        2271          .88     .97
axes (N)   ˈæksiz     ˈæksɪz     1344          .72     .96
IV         aɪ vi      fɔɹθ       1442          .76     .98
Jan        dʒæn       jɑn        1327          .90     .98
routed     ɹutɪd      ɹaʊtɪd     589           .60     .94
bass       beɪs       bæs        1865          .57     .99
TOTAL                            63660         .67     .97

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 156

    Decision Lists: Summary

Efficient and flexible use of data.

    Easy to interpret and modify.

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 157

    Decision Lists as WFSTs

    The lead example

Construct homograph taggers H0, H1, ... that find and tag instances of a homograph set in a lexical analysis. For example, H1 is:

[Figure: the tagger H1, a transducer that passes material through (σ:σ, ##:##) and, on reading 'l e a d' followed by the tag nn, inserts the homograph tag 1 (ε:1) between the word and its part-of-speech tag.]

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 158

    Decision Lists as WFSTs

Construct an environmental classifier consisting of a pair of transducers C1 and C2, where:

C1 optionally rewrites any symbol except the word boundary or the homograph tags H0, H1, ..., as a single dummy symbol

C2 classifies contextual evidence from the decision list according to its type, and assigns a cost equal to the position of the evidence in the list; it otherwise passes the dummy symbol, word boundary and H0, H1, ... through:

## follow vb ##     →  ## V0 ##  <1>
## zinc nn ##       →  ## C1 ##  <2>
## level(s?) nn ##  →  ## R1 ##  <3>
## of pp ##         →  ## [1 ##  <2>
## in pp ##         →  ## 1] ##
...

    M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 159

    Decision Lists as WFSTs

Construct a disambiguator D from a set of optional rules of the form:

H0 → ε / V0 ...
H1 → ε / C1 ...
H1 → ε / C1 ...
H0 → ε / ## R0 ...
H1 → ε / ## R1 ...
H0 → ε / [0 #