improving melodic similarity in indian art music using...



Page 1: Improving Melodic Similarity In Indian Art Music Using ...mtg.upf.edu/system/files/publications/2015_ISMIR_poster_sankalp.pdf · Improving Melodic Similarity In Indian Art Music Using

Results

Conclusions
Duration truncation of steady melodic regions significantly improves melodic similarity in IAM.
Complexity weighting considering inflection points improves melodic similarity in Carnatic music.
Tetra-chord normalization improves retrieval accuracy in Carnatic music.

References
X. Serra, "A multicultural approach to music information research," in Proc. of 12th International Society for Music Information Retrieval Conference, pp. 151–156, 2011.

S. Gulati, J. Serrà and X. Serra, "An Evaluation of Methodologies for Melodic Similarity in Audio Recordings of Indian Art Music," in Proc. of ICASSP, pp. 678–682, 2015.

G. E. Batista, X. Wang, and E. J. Keogh, "A complexity-invariant distance measure for time series," in SDM, volume 11, pp. 699–710, 2011.

Methodology

Improving Melodic Similarity In Indian Art Music Using Culture Specific Melodic Characteristics
Sankalp Gulati*, Joan Serrॠand Xavier Serra*

[email protected], [email protected] and [email protected]

*Music Technology Group, UPF, ¥Telefonica Research, Barcelona, Spain

Goals & Challenges
Improved computational model for melodic similarity in IAM
A top-down approach utilizing culture-specific characteristics
Variability in the overall duration
Large non-linear timing variations
Added melodic ornamentations

Audio -> Predominant melody estimation -> Melody abstraction and normalization -> Distance computation
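To make the normalization step of the processing chain more concrete, here is a minimal sketch of two pitch normalizations of the kind named in the results (tonic and mean normalization). It assumes the predominant melody is available as a list of pitch values in Hz and that the tonic is known. The function names, the example pitch values and the example tonic are hypothetical and not taken from the poster.

```python
import math


def hz_to_cents(pitch_hz, ref_hz):
    """Convert pitch values in Hz to cents relative to ref_hz (1200 cents per octave)."""
    return [1200.0 * math.log2(f / ref_hz) for f in pitch_hz]


def normalize_tonic(pitch_hz, tonic_hz):
    """Tonic normalization (assumed reading of N_tonic): cents relative to the tonic."""
    return hz_to_cents(pitch_hz, tonic_hz)


def normalize_mean(pitch_cents):
    """Mean normalization (assumed reading of N_mean): subtract the phrase's mean pitch."""
    mean = sum(pitch_cents) / len(pitch_cents)
    return [p - mean for p in pitch_cents]


# Hypothetical phrase and tonic, for illustration only.
phrase_hz = [220.0, 440.0, 330.0]
cents = normalize_tonic(phrase_hz, tonic_hz=220.0)
centered = normalize_mean(cents)
```

Working in cents rather than Hz makes equal musical intervals correspond to equal numeric differences, which is what a distance measure over pitch contours needs.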

Music collection
Over 5 hours of polyphonic recordings: 23 Carnatic (CMD), 9 Hindustani (HMD)
625 phrase instances, 10 phrase categories, 6 rāgas, 21 artists and different forms
Annotations: two performing musicians with over 15 years of music training

Indian Art Music
Hindustani (North-Indian), Carnatic (South-Indian) music
Rāga melody framework, Tāla rhythm framework
Rāga: Svaras, Aroh-Avroh, Characteristic phrases
Oral pedagogy, essentially audio music repertoire
Practically no written music (descriptive) scores

[Figure: evaluation setup. Labels: N annotated phrases; 100N noise candidates; queries; top 1000 nearest neighbors; Mean Average Precision.]
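The evaluation measure named in this setup, mean average precision (MAP), can be sketched as follows. This is a generic textbook formulation, not the authors' evaluation code. The ranked relevance lists in the example are made up for illustration.

```python
def average_precision(ranked_labels, num_relevant):
    """Average precision for one query.

    ranked_labels: booleans in rank order, True where the retrieved item is relevant.
    num_relevant: total number of relevant items that exist for this query.
    """
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_labels, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / num_relevant


def mean_average_precision(queries):
    """Mean of the per-query average precision values."""
    scores = [average_precision(labels, n_rel) for labels, n_rel in queries]
    return sum(scores) / len(scores)


# Two hypothetical queries: (ranked relevance flags, number of relevant items).
queries = [
    ([True, False, True, False], 2),
    ([False, True, False, False], 1),
]
map_score = mean_average_precision(queries)
```

In a setup like the one labeled above, each ranked list would be the nearest neighbors of one query phrase, truncated at a fixed depth.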

[Figure: pitch contour plot. x-axis: Time (s); y-axis: Frequency (cents). Labels: P1, P2, P3; C1 to C5; H1 to H5.]

[Figure: MAP as a function of δ (s) for HMD and CMD. δ values: 0.1, 0.3, 0.5, 0.75, 1.0, 1.5, 2.0.]

Proposed approach
Partial transcription -> duration truncation (M_DT)
Complexity weighting (M_CW): $D_{final} = \alpha \, D_{DTW}$
Tetrachord normalization (N_tetra)
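As an illustration of the duration truncation idea, the sketch below shortens every steady region of a pitch contour to at most δ seconds. The poster does not say how steady regions are detected. Here a steady region is assumed to be a run of consecutive samples that stay within a tolerance of the run's first sample. The function name, the tolerance of 35 cents and the example contour are all hypothetical.

```python
def truncate_steady_regions(pitch_cents, hop_s, delta_s, tol_cents=35.0):
    """Keep at most delta_s seconds of each steady run in a pitch contour.

    pitch_cents: pitch samples in cents, one every hop_s seconds.
    tol_cents: assumed tolerance that defines a steady run (hypothetical value).
    """
    max_len = max(1, int(round(delta_s / hop_s)))  # samples allowed per steady run
    out = []
    i = 0
    n = len(pitch_cents)
    while i < n:
        # Extend the run while samples stay close to the first sample of the run.
        j = i + 1
        while j < n and abs(pitch_cents[j] - pitch_cents[i]) <= tol_cents:
            j += 1
        out.extend(pitch_cents[i:j][:max_len])  # short runs are kept whole
        i = j
    return out


# Hypothetical contour: a long held note, a glide, then another held note.
contour = [0.0] * 10 + [100.0, 200.0, 300.0] + [400.0] * 8
truncated = truncate_steady_regions(contour, hop_s=0.1, delta_s=0.3)
```

Transitory segments such as the glide pass through unchanged, so only the length of held notes is reduced before the distance computation.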

[Figure: pitch contour plot. x-axis: Time (s); y-axis: Frequency (cents).]

Experimental setup


HMD
Norm      M_B           M_DT          M_CW1         M_CW2
N_tonic   0.45 (0.25)   0.52 (0.24)   -             -
N_mean    0.25 (0.20)   0.31 (0.23)   -             -
N_tetra   0.40 (0.23)   0.47 (0.23)   -             -

CMD
Norm      M_B           M_DT          M_CW1         M_CW2
N_tonic   0.39 (0.29)   0.42 (0.29)   0.41 (0.28)   0.41 (0.29)
N_mean    0.39 (0.26)   0.45 (0.28)   0.43 (0.27)   0.45 (0.27)
N_tetra   0.45 (0.26)   0.50 (0.27)   0.49 (0.28)   0.51 (0.27)

$$\alpha = \frac{\max(C_i, C_j)}{\min(C_i, C_j)}; \qquad C_i = \sqrt[2]{\sum_{i=1}^{N-1} (p_i - p_{i+1})^2}$$
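The complexity weight α and the complexity estimate C defined above can be sketched in a few lines. The DTW routine here is a plain, unconstrained textbook version with an absolute-difference local cost. The poster does not state which DTW variant or constraints were used, so treat it as an assumption. The small `eps` guard against division by zero and the example contours are likewise illustrative choices.

```python
import math


def complexity(p):
    """Complexity estimate: square root of the summed squared successive differences."""
    return math.sqrt(sum((p[k] - p[k + 1]) ** 2 for k in range(len(p) - 1)))


def complexity_weight(p, q, eps=1e-12):
    """alpha = max(C_p, C_q) / min(C_p, C_q); eps is an assumed guard for flat contours."""
    c_p, c_q = complexity(p), complexity(q)
    return (max(c_p, c_q) + eps) / (min(c_p, c_q) + eps)


def dtw_distance(p, q):
    """Unconstrained DTW with absolute-difference local cost (assumed variant)."""
    inf = float("inf")
    n, m = len(p), len(q)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(p[i - 1] - q[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def weighted_distance(p, q):
    """Final distance: DTW distance scaled by the complexity weight."""
    return complexity_weight(p, q) * dtw_distance(p, q)


# Hypothetical contours in cents: one plain step, one oscillating figure.
simple = [0.0, 0.0, 100.0, 100.0]
ornate = [0.0, 100.0, 0.0, 100.0]
alpha = complexity_weight(simple, ornate)
```

Because α is never below 1, the weighting can only increase the distance between two contours whose complexities differ, and it leaves pairs of equal complexity unchanged.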

16th International Society for Music Information Retrieval Conference, Málaga, Spain, 2015

"The distance between pairs of complex time series is frequently greater than the distance between pairs of simple time series" - Batista et al.

[Figure: time series of increasing shape complexity, with distances d1, d2, d3 where d1 < d3 < d2.]


would help, and indeed our accuracy jumped to 86.72%.

However, rather than stopping here, we decided to test the universal phasing assumption. Suppose we ignored it and tested DTW for all possible alignments/shifts. After testing the phase-invariant version of DTW [13], we found that the accuracy increased to an impressive 91.4%. Clearly, the current universal phasing algorithm does not produce perfect alignments.

Finally, we note the obvious, that in some cases invariances can decrease accuracy. An obvious example is in the classification of shapes converted to time series (as in Figure 6). If we wanted to discriminate between the shapes 'p' and 'd', this would be trivial, but adding phase (rotation) invariance would make it impossible.

The complexity invariance that we are proposing in this work is not a special case of any of the above or any combination of the above, but rather a new invariance whose need has escaped the attention of the community.

3. COMPLEXITY-INVARIANT DISTANCE FOR TIME SERIES

We are finally in a position to introduce the core contribution of this work. As we have seen, the research community has proposed a diverse set of invariances for time series in the last two decades. However, there is one invariance that has been missed, complexity invariance (for the moment, the reader's natural intuition of complex as having many peaks and values will suffice.) In many (perhaps most) domains, different classes may have widely varying complexities. We can most readily see this on time series derived from shapes (as in Figure 6). For example, as we show in Figure 7, leaves range in complexity from a simple pointed ellipse (ovate), shown bottom right, to a jagged-edged familiar maple leaf (palmate), shown bottom left.

Figure 7: Examples of objects from a single domain, which have different shape complexities

The reason why this matters is that pairs of complex objects, even those which subjectively may seem very similar to the human eye, tend to be further apart under current distance measures than pairs of simple objects. This fact introduces errors in nearest neighbor classification, because complex objects are incorrectly assigned to a simpler class.

We first illustrate the necessity for a complexity-invariant distance for time series using a synthetic data set of figures with different shape complexities. Note that our use of synthetic data here is for clarity; as we hinted at in Figure 7 and will show later, the complexity problem occurs in many real data sets. Figure 8 presents some geometric figures with increasing shape complexity. Each figure is labeled with its number of edges for reference. As in Figure 6, the two-dimensional shapes are converted to a single-dimensional "time" series by calculating the distance between the central point and the figure contour. Figure 8 also presents the z-normalized time series created for each figure.

Figure 8: A set of geometric figures with increasing shape complexity and the respective "time" series extracted by calculating the distance between the central point and its contour

We calculated distances between every pair of time series in Figure 8 using Euclidean distance and present the results in Table 1.

Table 1: Euclidean distance matrix for the geometric figures data set

        4      5      6      7      10     12     24     32
4              1.000  1.122  1.231  1.181  1.048  1.155  1.170
5                     1.318  1.068  1.103  1.153  1.165  1.180
6                            1.088  1.097  1.103  1.186  1.200
7                                   1.217  1.199  1.198  1.191
10                                         1.263  1.195  1.214
12                                                1.135  1.199
24                                                       1.191
32
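A pairwise matrix of this kind can be produced with a short routine such as the one below, which z-normalizes each series and then takes Euclidean distances between every pair. This is only a sketch of the general procedure. The excerpt does not give the series lengths or any scaling applied, so the code is not expected to reproduce the values in Table 1. The input series and their names are made up.

```python
import math


def z_normalize(series):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0.0:
        return [0.0] * n  # a constant series carries no shape information
    return [(x - mean) / std for x in series]


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(series_by_name):
    """Upper-triangular distance matrix as a dict keyed by (name_a, name_b)."""
    names = list(series_by_name)
    normed = {name: z_normalize(s) for name, s in series_by_name.items()}
    return {
        (a, b): euclidean(normed[a], normed[b])
        for idx, a in enumerate(names)
        for b in names[idx + 1:]
    }


# Hypothetical series, for illustration only.
series = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 3.0, 2.0, 1.0],
}
dm = distance_matrix(series)
```

Series "a" and "b" differ only in scale, so their distance after z-normalization is essentially zero, which is the amplitude invariance that the normalization is meant to provide.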

We can use the numbers in Table 1 (and the dendrogram created from them, as shown in Figure 10.left) to explore the most similar shapes according
