Dynamic Time Warping and Minimum Distance Paths for Speech Recognition


Upload: kale

Post on 04-Jan-2016


Dynamic Time Warping and Minimum Distance Paths for Speech Recognition

Isolated word recognition:

• Task: build an isolated ‘word’ recogniser, e.g. for voice dialling on mobile phones

• Method:

1. Record, parameterise and store a vocabulary of reference words

2. Record the test word to be recognised and parameterise it

3. Measure the distance between the test word and each reference word

4. Choose the reference word ‘closest’ to the test word
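
The four steps above can be sketched as a nearest-reference search. This is only an illustration: the names `recognise` and `sum_abs_distance`, the one-number-per-frame representation and the vocabulary values are all assumptions, and the placeholder distance only works when both words have the same number of frames.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]  # assumption: one parameter value per frame


def sum_abs_distance(a: Features, b: Features) -> float:
    """Placeholder distance: sum of frame-by-frame absolute differences.

    Only defined here for words with the same number of frames.
    """
    if len(a) != len(b):
        raise ValueError("placeholder distance needs equal frame counts")
    return sum(abs(x - y) for x, y in zip(a, b))


def recognise(test: Features,
              references: Dict[str, Features],
              distance: Callable[[Features, Features], float] = sum_abs_distance
              ) -> str:
    """Step 3 and 4: measure every distance, return the closest label."""
    return min(references, key=lambda label: distance(test, references[label]))


# Step 1: stored reference words (made-up parameter values).
vocabulary = {"yes": [4, 7, 4], "no": [1, 1, 9]}

# Step 2: a parameterised test word (also made up).
print(recognise([5, 7, 3], vocabulary))
```

Any other distance function with the same signature can be passed in through the `distance` argument.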


Words are parameterised on a frame-by-frame basis

Choose a frame length over which speech remains reasonably stationary

Overlap frames e.g. 40ms frames, 10ms frame shift

We want to compare frames of test and reference words i.e. calculate distances between them

[Figure: overlapping frames, labelled 40ms and 20ms]
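
A minimal sketch of cutting a recording into overlapping frames, using the 40ms frame length and 10ms shift quoted above. The 8000 Hz sample rate and the function name `split_into_frames` are assumptions made for the example, and trailing samples that do not fill a whole frame are simply dropped.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[float],
                      sample_rate_hz: int = 8000,   # assumed sample rate
                      frame_ms: int = 40,
                      shift_ms: int = 10) -> List[Sequence[float]]:
    """Return overlapping frames; each starts shift_ms after the previous one."""
    frame_len = sample_rate_hz * frame_ms // 1000   # samples per frame
    shift = sample_rate_hz * shift_ms // 1000       # samples between frame starts
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += shift
    return frames


one_second = list(range(8000))          # stand-in for one second of audio
frames = split_into_frames(one_second)
print(len(frames), len(frames[0]))
```

Each frame would then be parameterised separately, giving one parameter vector per frame.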


Calculating Distances

• Easy:

Sum differences between corresponding frames

• Problem:

Number of frames won’t always correspond


• Solution 1: Linear Time Warping

Stretch shorter sound

• Problem?

Some sounds stretch more than others
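
Linear time warping can be sketched as stretching the shorter sequence uniformly and then summing frame-by-frame differences. The helper names `stretch` and `linear_warp_distance` are hypothetical, and repeating frames by integer index mapping is just one simple way to stretch; interpolation would be another.

```python
from typing import List, Sequence


def stretch(seq: Sequence[float], length: int) -> List[float]:
    """Uniformly stretch seq to the given length by repeating frames."""
    return [seq[i * len(seq) // length] for i in range(length)]


def linear_warp_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Stretch the shorter sequence, then sum absolute frame differences."""
    length = max(len(a), len(b))
    a_long, b_long = stretch(a, length), stretch(b, length)
    return sum(abs(x - y) for x, y in zip(a_long, b_long))


print(stretch([4, 7, 4], 5))
print(linear_warp_distance([5, 3, 9, 7, 3], [4, 7, 4]))
```

Every part of the shorter word is stretched by the same factor, which is exactly the weakness noted above: real speech sounds do not all stretch equally.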


• Solution 2: Dynamic Time Warping (DTW)

Test: 5 3 9 7 3

Reference: 4 7 4

Using a dynamic alignment, make the most similar frames correspond

Find the distance between the two utterances using these corresponding frames


Digression: Dynamic Programming

• The shortest route from Dublin to Limerick goes through:

– Kildare

– Monasterevin

– Portlaoise

– Mountrath

– Roscrea

– Nenagh

• Now consider the shortest route from Dublin to Nenagh

– What towns does the route go through?
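
The idea behind the question is that any leading part of a shortest route is itself a shortest route. The sketch below checks this on a small made-up road map; the node names and costs are purely illustrative and are not real towns or distances, and Dijkstra's algorithm is used here only as a convenient shortest-path routine.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]


def shortest_path(graph: Graph, start: str, goal: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm; returns (total cost, nodes along the route)."""
    queue: List[Tuple[float, str, List[str]]] = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, step in graph[node].items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + step, neighbour, path + [neighbour]))
    raise ValueError("no route from start to goal")


# Made-up undirected road map (illustrative names and costs only).
roads: Graph = {
    "A": {"B": 2, "X": 5},
    "B": {"A": 2, "C": 2, "X": 4},
    "C": {"B": 2, "D": 3},
    "D": {"C": 3, "E": 1, "X": 9},
    "E": {"D": 1},
    "X": {"A": 5, "B": 4, "D": 9},
}

_, full_route = shortest_path(roads, "A", "E")
_, part_route = shortest_path(roads, "A", "D")
print(full_route)
print(part_route)   # the same towns as full_route, minus the final stop
```

Dynamic programming exploits this property: the best route to a point can be built from the best routes to the points just before it.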


Intercity Example


Place distance between frame r of Test and frame c of Reference in cell(r,c) of distance matrix

Test   r |  c=1  c=2  c=3
  3    5 |   1    4    1
  7    4 |   3    0    3
  9    3 |   5    2    5
  3    2 |   1    4    1
  5    1 |   1    2    1
Reference:   4    7    4

Compute the minimum distance to each point and place it in the mindist matrix:

mindist(5,3) = min{1 + mindist(5,2), 1 + mindist(4,2), 1 + mindist(4,3)} = min{1 + 8, 1 + 4, 1 + 7} = 5

Test   r |  c=1  c=2  c=3
  3    5 |  11    8    5
  7    4 |  10    4    7
  9    3 |   7    4    9
  3    2 |   2    5    4
  5    1 |   1    3    4
Reference:   4    7    4

We can also find the path through the grid that minimizes the total cost of the path
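
The grid calculation on this slide can be sketched as follows, using the slide's Test (5 3 9 7 3) and Reference (4 7 4) values. The function names are hypothetical, the frame distance is assumed to be the absolute difference of the two values, and indices are 0-based in the code whereas the slide counts from 1.

```python
from typing import List, Sequence, Tuple


def distance_matrix(test: Sequence[float], ref: Sequence[float]) -> List[List[float]]:
    """dist[r][c] = |test[r] - ref[c]| for every test/reference frame pair."""
    return [[abs(t - x) for x in ref] for t in test]


def mindist_matrix(dist: List[List[float]]) -> List[List[float]]:
    """Cheapest accumulated cost of reaching each cell from cell (0, 0)."""
    rows, cols = len(dist), len(dist[0])
    mindist = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            preds = []
            if c > 0:
                preds.append(mindist[r][c - 1])       # move along the reference
            if r > 0:
                preds.append(mindist[r - 1][c])       # move along the test word
            if r > 0 and c > 0:
                preds.append(mindist[r - 1][c - 1])   # diagonal move
            mindist[r][c] = dist[r][c] + (min(preds) if preds else 0)
    return mindist


def best_path(mindist: List[List[float]]) -> List[Tuple[int, int]]:
    """Trace back from the final cell, always stepping to the cheapest predecessor."""
    r, c = len(mindist) - 1, len(mindist[0]) - 1
    path = [(r, c)]
    while (r, c) != (0, 0):
        candidates = []
        if r > 0 and c > 0:
            candidates.append((mindist[r - 1][c - 1], r - 1, c - 1))
        if r > 0:
            candidates.append((mindist[r - 1][c], r - 1, c))
        if c > 0:
            candidates.append((mindist[r][c - 1], r, c - 1))
        _, r, c = min(candidates)
        path.append((r, c))
    return path[::-1]


dist = distance_matrix([5, 3, 9, 7, 3], [4, 7, 4])
mindist = mindist_matrix(dist)
print(mindist[-1][-1])                                   # total DTW distance
print([(r + 1, c + 1) for r, c in best_path(mindist)])   # 1-based, as on the slide
```

The value in the final cell of `mindist` is the distance between the two words; the traced path lists which test frame was matched with which reference frame.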


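Examples so far are uni-dimensional

Speech is multi-dimensional

e.g. two dimensions, using points (4,3) and (5,2)

[Figure: the two points plotted on x and y axes, each running from 1 to 5]

Distance equation for 2 dimensions:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

Distance equation for multi-dimensional:

$$d = \sqrt{(x_{t1} - x_{r1})^2 + (x_{t2} - x_{r2})^2 + \cdots + (x_{tn} - x_{rn})^2}$$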
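
A short sketch of a multi-dimensional frame distance, assuming the intended measure is the ordinary Euclidean distance between a test frame and a reference frame. The function name `euclidean_distance` is an assumption; the points (4,3) and (5,2) are the ones used on the slide.

```python
import math
from typing import Sequence


def euclidean_distance(test_frame: Sequence[float],
                       ref_frame: Sequence[float]) -> float:
    """Square root of the summed squared differences over all dimensions."""
    if len(test_frame) != len(ref_frame):
        raise ValueError("frames must have the same number of dimensions")
    return math.sqrt(sum((t - r) ** 2 for t, r in zip(test_frame, ref_frame)))


# The two-dimensional example: sqrt(1 + 1), roughly 1.414.
print(euclidean_distance((4, 3), (5, 2)))
```

In a DTW grid this function would replace the absolute difference used in the uni-dimensional examples.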


Constraints

• Global

– Endpoint detection

– Path should be close to diagonal

• Local

– Must always travel upwards or eastwards

– No jumps

– Slope weighting

– Consecutive moves upwards/eastwards


Global Constraints
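
One common way to keep the path close to the diagonal is to allow only cells inside a fixed-width band around it. The sketch below assumes that kind of band; the function name `within_band`, the `width` parameter and the band shape are assumptions, not necessarily the constraint region the slide intended. Indices are 0-based.

```python
def within_band(r: int, c: int, n_rows: int, n_cols: int, width: float) -> bool:
    """True if cell (r, c) lies within `width` columns of the grid diagonal."""
    # Column where the straight diagonal crosses row r.
    diagonal_c = r * (n_cols - 1) / (n_rows - 1) if n_rows > 1 else 0.0
    return abs(c - diagonal_c) <= width


# Show the allowed cells for a 5 x 3 grid, top row printed first.
for r in reversed(range(5)):
    print(" ".join("x" if within_band(r, c, 5, 3, 0.5) else "." for c in range(3)))
```

Cells outside the band would be skipped when filling in the mindist matrix, which also reduces the amount of computation.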


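Local Constraints

[Figure: cell mindist(r,c) with its three allowed predecessors mindist(r,c-1), mindist(r-1,c) and mindist(r-1,c-1); the incoming moves are labelled with the weights 1, 1 and 2]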
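
A sketch of slope weighting in the mindist recurrence. It assumes the weight multiplies the local distance of the cell being entered, and that the diagonal move is the one weighted 2; which move carries which weight is an assumption here, as are the function and parameter names.

```python
from typing import List


def weighted_mindist(dist: List[List[float]],
                     w_horizontal: float = 1.0,
                     w_vertical: float = 1.0,
                     w_diagonal: float = 2.0) -> float:   # assumed assignment
    """Accumulated cost of the best path when each move type has its own weight."""
    rows, cols = len(dist), len(dist[0])
    inf = float("inf")
    mindist = [[inf] * cols for _ in range(rows)]
    mindist[0][0] = dist[0][0]
    for r in range(rows):
        for c in range(cols):
            if r == 0 and c == 0:
                continue
            d = dist[r][c]
            best = inf
            if c > 0:
                best = min(best, mindist[r][c - 1] + w_horizontal * d)
            if r > 0:
                best = min(best, mindist[r - 1][c] + w_vertical * d)
            if r > 0 and c > 0:
                best = min(best, mindist[r - 1][c - 1] + w_diagonal * d)
            mindist[r][c] = best
    return mindist[-1][-1]


# Distance matrix for Test 5 3 9 7 3 against Reference 4 7 4 (rows = test frames).
dist = [[1, 2, 1], [1, 4, 1], [5, 2, 5], [3, 0, 3], [1, 4, 1]]
print(weighted_mindist(dist, 1.0, 1.0, 1.0))   # all weights 1: plain recurrence
print(weighted_mindist(dist))                  # diagonal moves weighted 2
```

Setting all three weights to 1 reduces this to the unweighted recurrence used earlier.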


Points to Note

• DTW really only suitable for small vocabularies and/or speaker dependent recognition

• Should normalise for reference length

• Can use multiple utterances and cluster them

• Poor performance if recording environment changes

• High computation cost


Evaluation

• Performance of designs only comparable by evaluation

• Use a test set

• For single word recognition we can simply quote % accuracy:

$$\text{Accuracy} = \frac{\text{No. of words correct}}{\text{No. of test words}} \times 100\%$$

In error analysis, it can be helpful to use a confusion matrix


Confusion Matrix

                      references
                      yes    no
test tokens    yes     24     2
               no       3    21
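
Percentage accuracy can be read off a confusion matrix by dividing the diagonal (matching labels) by the total number of test words. The sketch below uses the counts from the matrix above; the nested-dictionary layout and the name `accuracy_percent` are assumptions, and the result does not depend on which axis holds the references.

```python
from typing import Dict


def accuracy_percent(confusion: Dict[str, Dict[str, int]]) -> float:
    """100 * (words on the diagonal) / (all test words)."""
    correct = sum(confusion[label][label] for label in confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    return 100.0 * correct / total


confusion = {
    "yes": {"yes": 24, "no": 2},
    "no": {"yes": 3, "no": 21},
}
print(accuracy_percent(confusion))   # (24 + 21) / 50 words correct
```

The off-diagonal cells show which words are being confused with which, something a single accuracy figure hides.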