DTW for QBSH
J.-S. Roger Jang (張智星)
http://mirlab.org/jang
MIR Lab, CSIE Dept.
National Taiwan University
-2-
Dynamic Time Warping (DTW)
Goal: Allow comparison with high tolerance to tempo variation
Characteristics:
  Robust to irregular tempo variations
  Trial-and-error for dealing with key transposition
  Expensive in computation
  Does not conform to triangle inequality
  Some indexing algorithms do exist
-3-
Type-1 DTW
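t: input pitch vector (8 sec)
r: reference pitch vector
Local paths: 27-45-63 degrees
3-step formula for type-1 DTW (with anchored beginning):
1. $D(i, j)$: DTW distance between $t(1{:}i)$ and $r(1{:}j)$
2. Recurrent formula for $D(i, j)$:
$$D(i, j) = |t(i) - r(j)| + \min\{D(i-2, j-1),\ D(i-1, j-1),\ D(i-1, j-2)\}, \qquad D(1, 1) = |t(1) - r(1)|$$
3. Answer: $\min_j D(m, j)$
[Figure: DTW table with axes i and j, showing the local paths into cell D(i, j) at t(i), r(j), with t(i-1) and r(j-1) marking the neighboring positions]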
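The 3-step formula for type-1 DTW can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the function name `dtw_type1` is hypothetical, `m` is assumed to be the length of `t`, and cells that no local path can reach are assumed to hold infinity.

```python
import math

def dtw_type1(t, r):
    """Type-1 DTW distance with anchored beginning and free end (a sketch)."""
    m, n = len(t), len(r)
    # 1-based table; row 0 and column 0 stay at infinity and act as padding.
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    D[1][1] = abs(t[0] - r[0])  # anchored beginning: D(1,1) = |t(1) - r(1)|
    for i in range(2, m + 1):
        for j in range(2, n + 1):
            D[i][j] = abs(t[i - 1] - r[j - 1]) + min(
                D[i - 2][j - 1],  # advance two input frames, one reference frame
                D[i - 1][j - 1],  # advance one frame on each axis (45 degrees)
                D[i - 1][j - 2],  # advance one input frame, two reference frames
            )
    # Step 3: the end position is free, so take the best cell in the last row.
    return min(D[m][1:])

# An input sung twice as fast as the reference still aligns at zero cost.
print(dtw_type1([1, 2, 3, 4], [1, 1, 2, 2, 3, 3, 4, 4]))
```

Because every local path advances both indices, the input and reference speeds can differ by at most a factor of two along any path.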
-4-
Type-2 DTW
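t: input pitch vector (8 sec)
r: reference pitch vector
Local paths: 0-45-90 degrees
3-step formula for type-2 DTW (with anchored beginning):
1. $D(i, j)$: DTW distance between $t(1{:}i)$ and $r(1{:}j)$
2. Recurrent formula for $D(i, j)$:
$$D(i, j) = |t(i) - r(j)| + \min\{D(i-1, j),\ D(i-1, j-1),\ D(i, j-1)\}, \qquad D(1, 1) = |t(1) - r(1)|$$
3. Answer: $\min_j D(m, j)$
[Figure: DTW table with axes i and j, showing the local paths into cell D(i, j) at t(i), r(j), with t(i-1) and r(j-1) marking the neighboring positions]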
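A matching sketch for type-2 DTW is shown below, under the same assumptions as before: the name `dtw_type2` is hypothetical, `m` is taken to be the length of `t`, and unreachable cells hold infinity.

```python
import math

def dtw_type2(t, r):
    """Type-2 DTW distance (0-45-90 degree local paths), anchored beginning."""
    m, n = len(t), len(r)
    # 1-based table; row 0 and column 0 stay at infinity and act as padding.
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(t[i - 1] - r[j - 1])
            if i == 1 and j == 1:
                D[i][j] = cost  # anchored beginning
            else:
                D[i][j] = cost + min(D[i - 1][j], D[i - 1][j - 1], D[i][j - 1])
    # Free end: best cell in the last row.
    return min(D[m][1:])

# Repeated frames on either side are absorbed by the 0- and 90-degree paths.
print(dtw_type2([1, 1, 2, 3], [1, 2, 2, 3]))
```

Unlike type 1, the horizontal and vertical steps place no bound on the speed ratio between input and reference.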
-5-
Local Path Constraints
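Type 1: 27-45-63 local paths
$$D(i, j) = |t(i) - r(j)| + \min\{D(i-2, j-1),\ D(i-1, j-1),\ D(i-1, j-2)\}$$
Type 2: 0-45-90 local paths
$$D(i, j) = |t(i) - r(j)| + \min\{D(i-1, j),\ D(i-1, j-1),\ D(i, j-1)\}$$
[Figure: local path diagrams. Type 1: $D_{i,j}$ reached from $D_{i-1,j-2}$, $D_{i-1,j-1}$, $D_{i-2,j-1}$. Type 2: $D_{i,j}$ reached from $D_{i,j-1}$, $D_{i-1,j-1}$, $D_{i-1,j}$]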
-6-
Path Penalty
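Goal: To avoid paths that deviate from 45 degrees
Path penalty:
  Small/no penalty for the 45-degree path
  Large penalty for paths that deviate from 45 degrees
$$D(i, j) = |t(i) - r(j)| + \min\{D(i-2, j-1),\ D(i-1, j-1),\ D(i-1, j-2)\}$$
[Figure: local paths into D(i, j) from D(i-1, j-2), D(i-2, j-1), and D(i-1, j-1), with a penalty value of 0 shown in the diagram]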
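One way to realize a path penalty is to add a constant to the two off-diagonal candidates inside the minimum. The sketch below assumes exactly that; the function name `dtw_type1_penalized`, the parameter `eta`, and its default value are hypothetical and do not come from the slides.

```python
import math

def dtw_type1_penalized(t, r, eta=0.5):
    """Type-1 DTW with a penalty `eta` (assumed) on non-45-degree local paths."""
    m, n = len(t), len(r)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    D[1][1] = abs(t[0] - r[0])  # anchored beginning
    for i in range(2, m + 1):
        for j in range(2, n + 1):
            D[i][j] = abs(t[i - 1] - r[j - 1]) + min(
                D[i - 2][j - 1] + eta,  # off-diagonal step: penalized
                D[i - 1][j - 1],        # 45-degree step: no penalty
                D[i - 1][j - 2] + eta,  # off-diagonal step: penalized
            )
    return min(D[m][1:])

# Three off-diagonal steps are needed here, so the cost is 3 * eta.
print(dtw_type1_penalized([1, 2, 3, 4], [1, 1, 2, 2, 3, 3, 4, 4], eta=0.5))
```

Setting `eta=0` recovers the unpenalized type-1 recurrence, so the penalty can be tuned without changing the rest of the code.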
-8-
DTW Paths of “Anchored Beginning”
Anchored beginning: the end position is free to move.
Assumption: The speed of a user’s acoustic input falls between 1/2 and 2 times that of the intended song.
DTW table size for 8-sec query = 250 x 375
  250 = 31.25*8
  375 = 250*1.5
[Figure: DTW paths in the i-j plane]
-9-
DTW Paths of “Anchored Anywhere”
Anchored anywhere: both ends are free to move.
DTW table size for 8-sec query against 3-min song = 250 x 5625
  250 = 31.25*8
  5625 = 31.25*180
[Figure: DTW paths in the i-j plane]
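With both ends free, the only change to the table is its first row: every reference position is allowed as a starting point. The sketch below assumes type-1 local paths (the slide does not say which type is used) and a hypothetical function name `dtw_anchored_anywhere`; it returns the best distance together with the 1-based reference index where the match ends.

```python
import math

def dtw_anchored_anywhere(t, r):
    """DTW with free start and free end along the reference (a sketch)."""
    m, n = len(t), len(r)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        D[1][j] = abs(t[0] - r[j - 1])  # free start: any r(j) may begin the match
    for i in range(2, m + 1):
        for j in range(2, n + 1):
            D[i][j] = abs(t[i - 1] - r[j - 1]) + min(
                D[i - 2][j - 1], D[i - 1][j - 1], D[i - 1][j - 2]
            )
    # Free end: pick the best cell in the last row and report where it ends.
    end_j = min(range(1, n + 1), key=lambda j: D[m][j])
    return D[m][end_j], end_j

# The query matches the middle of the reference and ends at r(5).
print(dtw_anchored_anywhere([5, 6, 7], [1, 2, 5, 6, 7, 3]))
```

The price of this flexibility is the table width, which grows with the full song length rather than with the query length.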
-13-
Implementation Issues
To save memory
  Use 2-column table for type-1 DTW
  Use 1-column table for type-2 DTW
To avoid too many if-then statements
  Pad type-1 DTW with two-layer padding
  Pad type-2 DTW with one-layer padding
To find a suitable path
  Minimizing total distance
  Minimizing average distance
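The memory and padding ideas for type-2 DTW can be combined as sketched below. This is one possible reading of the slide, not the author's code: a single array is overwritten in place for each input frame, and one extra padding cell holding infinity removes the boundary checks. The name `dtw_type2_one_column` is hypothetical.

```python
import math

def dtw_type2_one_column(t, r):
    """Type-2 DTW (anchored beginning, free end) using one padded array."""
    n = len(r)
    # col[j] holds D(i, j) for the current i; col[0] is one layer of padding.
    col = [math.inf] * (n + 1)
    col[1] = abs(t[0] - r[0])  # anchored beginning
    for j in range(2, n + 1):
        col[j] = abs(t[0] - r[j - 1]) + col[j - 1]  # first row: D(1, j-1) only
    for x in t[1:]:
        diag = col[0]  # D(i-1, 0): padding, always infinity
        for j in range(1, n + 1):
            prev = col[j]  # D(i-1, j), saved before it is overwritten
            # col[j - 1] already holds D(i, j-1) for the current i.
            col[j] = abs(x - r[j - 1]) + min(prev, diag, col[j - 1])
            diag = prev
    return min(col[1:])

print(dtw_type2_one_column([1, 1, 2, 3], [1, 2, 2, 3]))
```

This version returns only the distance; recovering the mapping path would require keeping the full table or storing back-pointers.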
-14-
Other Variants
Local constraints
Flexible start/end positions
-15-
DTW Path of “Anchored Beginning”
-16-
DTW Path of “Anchored Anywhere”
-17-
Another Two Views of DTW Path of “Anchored Anywhere”
-19-
Key Transposition (1/2)
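Goal: Allow user input in different keys
Method 1: Mean shift and heuristic modification
5 DTW computations when comparing against each song
[Figure: key shifts on an axis from -4 to 4 around the mean; evaluate t, t-2, and t+2, take the best as t', then evaluate t'-1 and t'+1]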
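Method 1 can be sketched as a coarse-then-fine search around the mean-shifted input, which accounts for the 5 DTW computations per song. The code below is an illustration under assumptions: the names `dtw_type2` and `key_shift_search` are hypothetical, the reference mean is taken over the whole vector `r`, and type-2 DTW is used only to keep the example short.

```python
import math

def dtw_type2(t, r):
    """Type-2 DTW distance, anchored beginning, free end."""
    m, n = len(t), len(r)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(t[i - 1] - r[j - 1])
            if i == 1 and j == 1:
                D[i][j] = cost
            else:
                D[i][j] = cost + min(D[i - 1][j], D[i - 1][j - 1], D[i][j - 1])
    return min(D[m][1:])

def key_shift_search(t, r, dtw=dtw_type2):
    """Mean shift followed by a 5-evaluation heuristic search over key shifts."""
    base = sum(r) / len(r) - sum(t) / len(t)  # shift t to the mean of r

    def dist(shift):
        return dtw([x + base + shift for x in t], r)

    scores = {s: dist(s) for s in (0, -2, 2)}  # coarse stage: 3 DTW runs
    center = min(scores, key=scores.get)       # best coarse shift (t')
    for s in (center - 1, center + 1):         # fine stage: 2 more DTW runs
        scores[s] = dist(s)
    best = min(scores, key=scores.get)
    return base + best, scores[best]           # total shift and its distance

# The query is the reference transposed down by 3.
print(key_shift_search([60, 62, 64, 62], [63, 65, 67, 65]))
```

The search covers shifts from -3 to 3 around the mean with five evaluations instead of seven, at the risk of missing the best shift when the coarse stage picks the wrong neighborhood.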
-20-
Key Transposition (2/2)
Method 2: Fixed-point iteration
  Step 1: DTW alignment
  Step 2: Stop if the mapping path is fixed
  Step 3: Shift to the same mean based on the alignment
  Step 4: Go back to step 1
Characteristics
  DTW distance monotonically non-increasing to guarantee convergence
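The fixed-point iteration can be sketched as below, re-running the alignment after each shift until the mapping path stops changing. Several details are assumptions rather than the author's method: type-2 DTW is used for the alignment, the names `dtw_path` and `fixed_point_key_shift` are hypothetical, and `max_iter` is a safety cap added for the example.

```python
import math

def dtw_path(t, r):
    """Type-2 DTW (anchored beginning, free end); returns (distance, path)."""
    m, n = len(t), len(r)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c = abs(t[i - 1] - r[j - 1])
            if i == 1 and j == 1:
                D[i][j] = c
            else:
                D[i][j] = c + min(D[i - 1][j], D[i - 1][j - 1], D[i][j - 1])
    j = min(range(1, n + 1), key=lambda k: D[m][k])
    dist, i, path = D[m][j], m, []
    while True:
        path.append((i, j))
        if i == 1 and j == 1:
            break
        # Step back to the predecessor with the smallest accumulated distance.
        i, j = min([(i - 1, j), (i - 1, j - 1), (i, j - 1)],
                   key=lambda p: D[p[0]][p[1]])
    return dist, path[::-1]

def fixed_point_key_shift(t, r, max_iter=20):
    """Alternate DTW alignment and mean shift until the path is unchanged."""
    t = list(t)
    dist, path = dtw_path(t, r)                # Step 1: DTW alignment
    for _ in range(max_iter):
        # Step 3: shift t so the aligned pairs share the same mean.
        offset = sum(r[j - 1] - t[i - 1] for i, j in path) / len(path)
        t = [x + offset for x in t]
        dist, new_path = dtw_path(t, r)        # realign after the shift
        if new_path == path:                   # Step 2: stop if path is fixed
            break
        path = new_path
    return dist, t

print(fixed_point_key_shift([63, 65, 67, 65], [63, 65, 67, 65]))
```

The iteration stops at a fixed point of the alignment, which need not be the globally best key shift, so the result depends on the starting key of the input.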
-24-
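Type-3 DTW: Frame to Note Alignment
DP-based method for filling the table:
Recurrent formula:
$$D(i, j) = |t(i) - r(j)| + \min\{D(i-1, j),\ D(i-1, j-1)\}$$
Local constraint: $D_{i,j}$ is reached from $D_{i-1,j}$ or $D_{i-1,j-1}$
[Figure: frame-level pitch vector aligned against notes; note labels shown: 67, 64, 65, 62, 65]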
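The frame-to-note recurrence can be sketched as follows. The boundary handling is an assumption carried over from types 1 and 2 (anchored beginning, free end), and the function name `dtw_type3` and the frame values in the example are made up for illustration; the note values are the labels from the figure.

```python
import math

def dtw_type3(frames, notes):
    """Type-3 DTW: align a frame-level pitch vector to a note sequence."""
    m, n = len(frames), len(notes)
    # 1-based table; row 0 and column 0 stay at infinity and act as padding.
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    D[1][1] = abs(frames[0] - notes[0])  # assumed anchored beginning
    for i in range(2, m + 1):
        for j in range(1, n + 1):
            D[i][j] = abs(frames[i - 1] - notes[j - 1]) + min(
                D[i - 1][j],      # next frame stays on the same note
                D[i - 1][j - 1],  # next frame moves on to the next note
            )
    # Assumed free end: the query may stop at any note.
    return min(D[m][1:])

# Eight frames covering the first three notes align at zero cost.
print(dtw_type3([67, 67, 67, 64, 64, 65, 65, 65], [67, 64, 65, 62, 65]))
```

Each frame consumes at most one new note, so a note can stretch over any number of frames, and no note duration enters the computation.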
-25-
Type-3 DTW
Characteristics
  Frame-based query input vs. note-based music database
  Note duration unused
  More efficient, less effective
  Heuristics for key transposition
Mapping path
-26-
Type-3 DTW: Effects of Key Transposition
Rough key transposition
Fine key transposition
Please refer to the online tutorial page for playback.