Music genre classification with multilinear and sparse techniques

Constantine Kotropoulos∗†, Yannis Panagakis∗, and Gonzalo R. Arce†

∗ Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54124, GREECE

† Department of Electrical & Computer Engineering, University of Delaware, Newark, DE 19716, USA

Greek Signal Processing Jam, Athens, October 17th, 2009

Music genre classification with multilinear and sparse techniques 1/79

Outline

1 Introduction

2 Auditory Spectro-temporal Modulations

3 Ensemble Discriminant Sparse Projections
    Overcomplete Dictionaries for Sparse Representations
    Dual Linear Discriminant Analysis of Sparse Representations
    Experimental Assessment

4 Sparse Representation-based Classification (SRC)
    Experimental Assessment

5 Locality Preserving Non-negative Tensor Factorization within SRC
    Locality Preserving Non-negative Tensor Factorization
    Experimental Assessment

6 Outlook
    Comparison with the State of the Art
    Conclusions-Future Work

Introduction

Music Genre
The most popular description of music content despite the lack of a commonly agreed definition.
Depends on cultural, artistic, or market factors, etc.

Problem Definition
To classify music recordings into distinguishable genres using information extracted from the audio signal.

Music Genre Classification Algorithms
Model the music signals by the long-term statistics of short-time features, such as timbral texture, rhythmic, pitch content-related, or their combinations.

Introduction

Motivation
The appealing properties of slow temporal and spectro-temporal modulations from the human perceptual point of view [a];
The strong theoretical foundations of sparse representations [b, c].

[a] K. Wang and S. A. Shamma, “Spectral shape analysis in the central auditory system,” IEEE Trans. Speech and Audio Processing, vol. 3, no. 5, pp. 382-396, 1995.
[b] E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Information Theory, vol. 52, no. 2, pp. 489-509, February 2006.
[c] D. L. Donoho, “Compressed sensing,” IEEE Trans. Information Theory, vol. 52, no. 4, pp. 1289-1306, April 2006.

Introduction

First approach: Ensemble Discriminant Sparse Projections (1)
Each music recording is represented by its slow temporal modulations, the so-called auditory temporal modulation representation.
Given a training set of auditory temporal modulations, the dictionary that best represents each member of the training set under sparsity constraints is extracted by means of the K-SVD algorithm [a].

[a] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Processing, vol. 54, no. 11, pp. 4311-4322, Nov. 2006.

Introduction

First approach: Ensemble Discriminant Sparse Projections (2)
Discriminant Sparse Projections: The most discriminating features (MDF) [a] are extracted by applying dual linear discriminant analysis (LDA) [b] to the two principal subspaces of the within-class and between-class covariance matrices of the sparse coefficient vectors.
Classifier Ensemble: Majority voting is applied to the decisions taken by multiple individual dual LDA classifiers.

[a] D. L. Swets and J. Weng, “Using discriminant eigenfeatures for image retrieval,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, August 1996.
[b] X. Wang and X. Tang, “Dual space linear discriminant analysis for face recognition,” in Proc. IEEE Computer Society Conf. CVPR, 2004, vol. 2, pp. 564-569.
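
The classifier-ensemble step can be illustrated with a minimal sketch of majority voting. The function name `majority_vote`, the tie-breaking rule, and the example labels are assumptions made for illustration; the slides do not say how ties are resolved or how many classifiers the ensemble contains.

```python
from collections import Counter

def majority_vote(decisions):
    """Return the label chosen by the largest number of classifiers.

    `decisions` holds one predicted genre label per ensemble member.
    Ties go to the tied label that appears first in the list, which is
    an arbitrary choice made for this sketch.
    """
    counts = Counter(decisions)
    best = max(counts.values())
    for label in decisions:
        if counts[label] == best:
            return label

# Hypothetical decisions of five individual classifiers for one recording.
votes = ["jazz", "blues", "jazz", "rock", "jazz"]
winner = majority_vote(votes)
```

Here `winner` is the label shared by three of the five hypothetical classifiers.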

Introduction

Second approach: Sparse Representation-based Classifier
Each music recording is again represented by its slow auditory temporal modulations.
The vectorized training auditory temporal modulations form a dictionary of basis signals for music genres.
Any test representation is expressed as a compact linear combination of the dictionary atoms for the genre to which it belongs.
Classification is performed by the sparse representation-based classifier (SRC) [a].

[a] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210-227, Feb. 2009.
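
The SRC decision rule can be sketched on toy data as follows. This is an illustration under stated assumptions, not the authors' implementation: the SRC of Wright et al. obtains the sparse code by ℓ1 minimization, whereas this sketch swaps in a small greedy orthogonal matching pursuit (`omp`) to stay self-contained. The function names, the toy dictionary, and the sparsity level are all hypothetical.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy sparse coding of y over the columns of D (stand-in for l1 minimization)."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    x[support] = coef
    return x

def src_classify(D, labels, y, n_nonzero=3):
    """Assign y to the class whose atoms alone reconstruct it with the smallest residual."""
    x = omp(D, y, n_nonzero)
    residuals = {}
    for c in np.unique(labels):
        x_c = np.where(labels == c, x, 0.0)   # keep only the class-c coefficients
        residuals[int(c)] = float(np.linalg.norm(y - D @ x_c))
    return min(residuals, key=residuals.get)

# Toy dictionary: columns are "training recordings" of two hypothetical genres.
s = 1.0 / np.sqrt(2.0)
D = np.array([[1, 0, s, 0, 0, 0],
              [0, 1, s, 0, 0, 0],
              [0, 0, 0, 1, 0, s],
              [0, 0, 0, 0, 1, s]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])

pred_a = src_classify(D, labels, np.array([0.6, 0.8, 0.0, 0.0]))
pred_b = src_classify(D, labels, np.array([0.0, 0.0, 0.8, 0.6]))
```

Each test vector lies in the span of one class's atoms, so the residual computed from that class's coefficients is near zero while the other class leaves the whole vector unexplained.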

Introduction

Third approach: Locality preserving non-negative tensor factorization within a sparse representation-based classifier (1)

Cortical representation: A given music recording is mapped to a three-dimensional (3D) representation of its slow spectral and temporal modulations [a].
Each cortical representation is modeled as a sparse weighted sum of the basis elements (atoms) of an overcomplete dictionary, which stems from the cortical representations associated with training music recordings whose genre is known.

[a] I. Panagakis, E. Benetos, and C. Kotropoulos, “Music genre classification: A multilinear approach,” in Proc. 7th Int. Symp. Music Information Retrieval, Philadelphia, USA, 2008.

Introduction

Third approach: Locality preserving non-negative tensor factorization within a sparse representation-based classifier (2)

By vectorizing a typical 3D cortical representation of 6 scales, 10 rates, and 128 frequency bands, one obtains a vector of 7680 dimensions.
Multilinear dimensionality reduction techniques do not guarantee that two data points, which are close in the intrinsic geometry of the original space, are also close in the data space after multilinear dimensionality reduction.
A novel algorithm is proposed, where the geometrical information of the original data space is incorporated into the objective function optimized by non-negative tensor factorization (NTF).

Auditory Spectro-temporal Modulations

Computational Auditory Model
The computational auditory model is inspired by psychoacoustical and neurophysiological investigations in the early and central stages of the human auditory system.

[Figure: block diagram with blocks labeled "Early auditory model", "Central auditory model", "Auditory Spectrogram", "Auditory Temporal Modulations", and "Auditory Spectro-Temporal Modulations (Cortical Representation)"]

Auditory Spectro-temporal Modulations

Early Auditory System
Auditory Spectrogram: time-frequency distribution of energy along a tonotopic (logarithmic frequency) axis.

[Figure: panels labeled "Early auditory model" and "Auditory Spectrogram"]

Auditory Spectro-temporal Modulations

Central Auditory System - Temporal Modulations

[Figure: panels labeled "Auditory Spectrogram" and "Auditory Temporal Modulations", with axis ω (Hz)]

Auditory Spectro-temporal Modulations

Temporal Modulation Parameters
Temporal Modulations: ω ∈ {2, 4, 8, 16, 32, 64, 128, 256} (Hz).
96 frequency channels covering 4 octaves.

Auditory Spectro-temporal Modulations

Auditory Temporal Modulations across 10 Music Genres

[Figure: one panel per genre, labeled Blues, Classical, Country, Disco, Hiphop, Jazz, Metal, Rock, Pop, Reggae]

Auditory Spectro-temporal Modulations

Central Auditory System - Spectro-temporal Modulations

[Figure: panels labeled "Auditory Spectrogram" and "Auditory Spectro-Temporal Modulations", with axes ω (Hz) and Ω (c/o)]

Auditory Spectro-temporal Modulations

Cortical Representation
A bank of 2D spectrotemporal filters is applied to the auditory spectrogram. The filters are selective to different spectrotemporal modulation parameters, ranging from slow to fast rates temporally (in Hz) and from narrow to broad scales spectrally (in Cycles/Octave).
Each point in the auditory spectrogram has a 2D (hidden) rate-scale representation, which indicates the modulation strength for all rates and scales for that channel and time instant.

Auditory Spectro-temporal Modulations

[Figure: 3D cortical representation with axes Temporal modulations ω (Hz), Frequency channels f, and Spectral modulations Ω (c/o)]

Cortical Representation Parameters
Spectral Modulations: Ω ∈ {0.25, 0.5, 1, 2, 4, 8} (Cycles/Octave).
Temporal Modulations: Positive and negative ω ∈ {2, 4, 8, 16, 32} (Hz).
128 frequency channels covering 5 1/3 octaves.
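
As a quick arithmetic check of these parameters, the sketch below counts the entries of one cortical representation. It assumes the octave span reads as 5 1/3 and that "positive and negative" rates means each listed rate appears with both signs; the variable names are illustrative only.

```python
from fractions import Fraction

scales = [0.25, 0.5, 1, 2, 4, 8]                        # Ω in cycles/octave
rates_pos = [2, 4, 8, 16, 32]                           # ω in Hz
rates = [-w for w in reversed(rates_pos)] + rates_pos   # negative and positive rates
n_channels = 128
octaves = Fraction(16, 3)                               # 5 1/3 octaves (assumed reading)

shape = (len(scales), len(rates), n_channels)           # scales x rates x channels
n_features = shape[0] * shape[1] * shape[2]             # length after vectorization
channels_per_octave = n_channels / octaves
```

Under these assumptions the shape is 6 × 10 × 128, the vectorized length is 7680 (the figure quoted in the introduction), and the channel density is 24 per octave, the same density as 96 channels over 4 octaves in the temporal-modulation setup.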

Overcomplete Dictionaries for Sparse Representations

Mathematical Modeling of Auditory Temporal Modulations
The auditory temporal modulations of a set of music recordings (i.e., a dataset) are represented by a 3rd-order nonnegative real-valued tensor $\mathcal{Y} \in \mathbb{R}_+^{N_\omega \times N_f \times N_s}$, where $N_\omega = 8$, $N_f = 96$, and $N_s$ denotes the number of music recordings.

Let $\mathbf{Y}_{(3)} \in \mathbb{R}_+^{N_s \times (N_f \cdot N_\omega)}$ be the 3rd mode matrix unfolding of $\mathcal{Y}$.

Data matrix: $\mathbf{Y} = \mathbf{Y}_{(3)}^T = [\mathbf{y}_1 | \mathbf{y}_2 | \cdots | \mathbf{y}_{N_s}]$, where $T$ denotes matrix transposition.

$\mathbf{y}_j \in \mathbb{R}_+^{768}$, $j = 1, 2, \ldots, N_s$, is downsampled to yield a vector of size $M \in \{12, 48, 85, 192\}$.
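
The unfolding and transposition can be sketched in NumPy as below. The dataset size of 5 recordings and the random nonnegative data are made up, and the ordering of the 768 entries within each column depends on the unfolding convention, which the slides do not specify; the downsampling step is omitted because its method is not stated.

```python
import numpy as np

N_omega, N_f, N_s = 8, 96, 5                  # N_s = 5 is a made-up dataset size
rng = np.random.default_rng(0)
Y_tensor = rng.random((N_omega, N_f, N_s))    # nonnegative stand-in data

# Mode-3 unfolding: one row per recording, N_f * N_omega columns.
# The column ordering below (rate-major) is one possible convention.
Y3 = np.moveaxis(Y_tensor, 2, 0).reshape(N_s, N_omega * N_f)

# Data matrix: transposing turns the recordings into columns y_1, ..., y_Ns.
Y = Y3.T
```

Each column of `Y` is then the 768-dimensional vectorized temporal-modulation representation of one recording.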

Overcomplete Dictionaries for Sparse Representations

Sparse Approximation

A downsampled representation of auditory temporal modulations $\mathbf{y}_j \in \mathbb{R}_+^M$, $j = 1, 2, \ldots, N_s$, admits a sparse approximation over a dictionary $\mathbf{D} \in \mathbb{R}^{M \times K}$, when $\mathbf{y}_j = \mathbf{D}\mathbf{x}_j$ or $\|\mathbf{y}_j - \mathbf{D}\mathbf{x}_j\|_p \le \gamma$, where $\|\cdot\|_p$ denotes the $\ell_p$ vector norm for $p = 1, 2$ and $\infty$.

To learn $\mathbf{D}$ with a fixed number of atoms $K$, the K-SVD is used.

K-SVD iteratively alternates between sparse coding of the training samples based on the current dictionary and dictionary updating to better fit the training set.
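The approximation condition can be checked numerically for each of the three norms. The snippet below is only an illustration: the function name `is_sparse_approximation`, the tiny dictionary, and the tolerance values are hypothetical and not taken from the slides.

```python
import numpy as np

def is_sparse_approximation(y, D, x, gamma, p=2):
    """True if ||y - D x||_p <= gamma, for p in {1, 2, np.inf}."""
    return np.linalg.norm(y - D @ x, ord=p) <= gamma

# Toy overcomplete dictionary: M = 2, K = 3, unit-norm atoms.
D = np.array([[1.0, 0.0, 0.6],
              [0.0, 1.0, 0.8]])
x = np.array([0.0, 0.0, 2.0])       # a single nonzero coefficient
y = np.array([1.25, 1.5])
residual = y - D @ x                # approximately [0.05, -0.1]
```

The same residual may satisfy the bound for one norm and violate it for another, which is why the choice of $p$ matters when $\gamma$ is fixed.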

K-SVD (1)

The following problem is solved: $\min_{\mathbf{x}_j, \mathbf{D}} \sum_{j=1}^{N_{st}} \|\mathbf{y}_j - \mathbf{D}\mathbf{x}_j\|_2^2$ subject to $\|\mathbf{x}_j\|_0 \le L$, where $N_{st} < N_s$ is the number of training samples and $\|\cdot\|_0$ counts the number of nonzero vector elements.

For a given $\mathbf{D}$ with $K$ atoms, where $M \le K \le N_{st}$, the optimal sparse coefficient vector of the $j$th training sample is found by solving $\mathbf{x}_j^* \triangleq \arg\min_{\mathbf{x}} \|\mathbf{y}_j - \mathbf{D}\mathbf{x}\|_2^2$ subject to $\|\mathbf{x}\|_0 \le L$ with any pursuit algorithm, e.g. OMP [a] or FOCUSS [b].

Let $\mathbf{Y} = [\mathbf{y}_1 | \mathbf{y}_2 | \cdots | \mathbf{y}_{N_{st}}]$, where $\mathbf{y}_j$ is a compact notation for $\mathbf{y}_{:j}$.

[a] G. Davis, S. Mallat, and Z. Zhang, “Adaptive time-frequency decompositions,” Optical Engineering, vol. 33, no. 7, pp. 2183-2191, July 1994.

[b] I. F. Gorodnitsky and B. D. Rao, “Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm,” IEEE Trans. Signal Processing, vol. 45, no. 3, pp. 600-616, March 1997.
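The sparse coding step can be sketched with a basic orthogonal matching pursuit. This is a textbook-style greedy approximation written for illustration, not the authors' implementation. The function name `omp`, the stopping thresholds, and the assumption of unit-norm atoms are choices made here.

```python
import numpy as np

def omp(D, y, L):
    """Greedy sketch of: min_x ||y - D x||_2^2  subject to  ||x||_0 <= L."""
    x = np.zeros(D.shape[1])
    support, coef = [], np.zeros(0)
    residual = y.astype(float)
    for _ in range(L):
        if np.linalg.norm(residual) < 1e-12:
            break                                   # y is already represented exactly
        k = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if k in support:
            break                                   # no new atom can be added
        support.append(k)
        # least-squares refit of y on all atoms selected so far
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    if support:
        x[support] = coef
    return x

rng = np.random.default_rng(0)
D = rng.standard_normal((12, 30))
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms, M = 12, K = 30
y = 1.5 * D[:, 3] - 0.7 * D[:, 17]        # toy signal built from two atoms
x_hat = omp(D, y, L=2)                    # at most L = 2 nonzero coefficients
```

Greedy pursuit does not guarantee the optimal support in general, so `x_hat` should be read as an approximate solution of the constrained problem.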

K-SVD (2)

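Let also $\mathbf{x}_{k:}^T = [x_{k1}, x_{k2}, \ldots, x_{kN_{st}}]$ be the $k$th row of $\mathbf{X} \in \mathbb{R}^{K \times N_{st}}$, and let $\mathbf{d}_\kappa$ denote the $\kappa$th atom (column) of $\mathbf{D}$.

If $\|\cdot\|_F$ denotes the Frobenius norm of a matrix,

$$\|\mathbf{Y} - \mathbf{D}\mathbf{X}\|_F^2 = \Big\| \underbrace{\Big( \mathbf{Y} - \sum_{\kappa = 1,\, \kappa \neq k}^{K} \mathbf{d}_\kappa \mathbf{x}_{\kappa:}^T \Big)}_{\mathbf{E}_k} - \mathbf{d}_k \mathbf{x}_{k:}^T \Big\|_F^2 = \|\mathbf{E}_k - \mathbf{d}_k \mathbf{x}_{k:}^T\|_F^2 .$$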

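K-SVD (3)

Let $J_k$ be the set of training recordings that include the $k$th atom in their approximation, i.e. $J_k = \{ j \mid x_{kj} \neq 0,\ j = 1, 2, \ldots, N_{st} \}$.

Let $|J_k|$ denote the cardinality of the set $J_k$.

Let $\boldsymbol{\Omega}_k$ be the $N_{st} \times |J_k|$ indicator matrix with ones at the entries $(j, \xi(j))$, $j \in J_k$, and zeros elsewhere, where $\xi(j) \in [1, |J_k|]$ is the position of $j$ in $J_k$.

$$\Big\| \underbrace{\mathbf{E}_k \boldsymbol{\Omega}_k}_{\mathbf{E}_k^R} - \mathbf{d}_k \underbrace{\mathbf{x}_{k:}^T \boldsymbol{\Omega}_k}_{[\mathbf{x}_{k:}^R]^T} \Big\|_F^2 = \|\mathbf{E}_k^R - \mathbf{d}_k [\mathbf{x}_{k:}^R]^T\|_F^2 .$$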

K-SVD (4)

Let $\mathbf{E}_k^R = \boldsymbol{\Upsilon} \boldsymbol{\Delta} \mathbf{V}^T$ be the Singular Value Decomposition (SVD) of $\mathbf{E}_k^R$.

The updated $k$th dictionary atom is the first column of matrix $\boldsymbol{\Upsilon}$.

The updated coefficient vector $\mathbf{x}_{k:}^R$ corresponds to the first column of matrix $\mathbf{V}$ multiplied by $\Delta_{11}$.

The projections to the principal component analysis subspace that precede LDA (e.g. in face recognition [a]) are replaced by the sparse approximations over the overcomplete dictionary $\mathbf{D}^*$.

[a] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. fisherfaces: Recognition using class specific linear projection,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.
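The dictionary update of K-SVD (2) to (4) can be sketched for a single atom as follows. The function name `ksvd_update_atom`, the in-place update style, and the handling of unused atoms are assumptions of this sketch. The restriction to the columns in $J_k$ is done by indexing rather than by forming the indicator matrix explicitly.

```python
import numpy as np

def ksvd_update_atom(Y, D, X, k):
    """Update atom k of D and row k of X in place via a rank-1 SVD fit."""
    J_k = np.flatnonzero(X[k, :])            # samples whose code uses atom k
    if J_k.size == 0:
        return                               # unused atom: left unchanged in this sketch
    # Restricted error matrix: residual on J_k without the contribution of atom k.
    E_kR = Y[:, J_k] - D @ X[:, J_k] + np.outer(D[:, k], X[k, J_k])
    U, s, Vt = np.linalg.svd(E_kR, full_matrices=False)
    D[:, k] = U[:, 0]                        # first left singular vector
    X[k, J_k] = s[0] * Vt[0, :]              # first right singular vector times Delta_11

rng = np.random.default_rng(0)
M, K, N = 6, 8, 20
D = rng.standard_normal((M, K))
D /= np.linalg.norm(D, axis=0)
X = rng.standard_normal((K, N)) * (rng.random((K, N)) < 0.3)   # toy sparse codes
Y = rng.standard_normal((M, N))
support_before = X != 0
err_before = np.linalg.norm(Y - D @ X) ** 2
ksvd_update_atom(Y, D, X, k=0)
err_after = np.linalg.norm(Y - D @ X) ** 2
```

Because only the columns in $J_k$ are touched, zero coefficients stay zero, and the rank-1 fit cannot increase the squared Frobenius error.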

Dual Linear Discriminant Analysis of Sparse Representations

Definitions

Let the training set contain $N_g$ genres and each genre class $\mathcal{Y}_i$ have $n_i$ samples whose sample mean vector is denoted by $\mathbf{m}_i$, $i = 1, 2, \ldots, N_g$.

The within-class sample covariance matrix is defined as $\mathbf{S}_w = \frac{1}{N_{st}} \sum_{i=1}^{N_g} \sum_{\mathbf{y}_j \in \mathcal{Y}_i} (\mathbf{y}_j - \mathbf{m}_i)(\mathbf{y}_j - \mathbf{m}_i)^T$.

The between-class sample covariance matrix is given by $\mathbf{S}_b = \frac{1}{N_{st}} \sum_{i=1}^{N_g} n_i (\mathbf{m}_i - \mathbf{m})(\mathbf{m}_i - \mathbf{m})^T$, where $\mathbf{m}$ is the gross sample mean vector of the whole training set.
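The two covariance matrices can be computed directly from their definitions. The sketch below uses toy random data, and the function name `scatter_matrices` and the label encoding are hypothetical.

```python
import numpy as np

def scatter_matrices(Y, labels):
    """Return (S_w, S_b) for samples stored as the columns of Y."""
    n_total = Y.shape[1]
    m = Y.mean(axis=1, keepdims=True)                 # gross sample mean
    S_w = np.zeros((Y.shape[0], Y.shape[0]))
    S_b = np.zeros_like(S_w)
    for g in np.unique(labels):
        Yg = Y[:, labels == g]                        # samples of class g
        m_g = Yg.mean(axis=1, keepdims=True)          # class mean
        S_w += (Yg - m_g) @ (Yg - m_g).T
        S_b += Yg.shape[1] * (m_g - m) @ (m_g - m).T
    return S_w / n_total, S_b / n_total

rng = np.random.default_rng(0)
Y = rng.standard_normal((4, 30))                      # toy data: 4 features, 30 samples
labels = np.repeat(np.arange(3), 10)                  # 3 classes, 10 samples each
S_w, S_b = scatter_matrices(Y, labels)
```

With this normalization the two matrices add up to the total sample covariance of the training set, and $\mathbf{S}_b$ has rank at most $N_g - 1$.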

Discriminant Sparse Projections (1)

The most discriminant features (MDFs) are obtained by projecting the training samples on the columns of the matrix $\mathbf{W}^* = \arg\max_{\mathbf{W}} \frac{|\mathbf{W}^T \mathbf{S}_b \mathbf{W}|}{|\mathbf{W}^T \mathbf{S}_w \mathbf{W}|}$.

We propose to apply LDA in the space of the sparse representations defined by the matrix $\mathbf{D}^*$.

Let $\tilde{\mathbf{m}}_i = \frac{1}{n_i} \sum_{j:\, \mathbf{y}_j \in \mathcal{Y}_i} \mathbf{x}_j$ be the sample mean vector of the sparse coefficients associated to the training samples that belong to the $i$th class.

Discriminant Sparse Projections (2)

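$$\mathbf{S}_w \approx \mathbf{D}^* \Big[ \frac{1}{N_{st}} \sum_{i=1}^{N_g} \sum_{\mathbf{y}_j \in \mathcal{Y}_i} (\mathbf{x}_j - \tilde{\mathbf{m}}_i)(\mathbf{x}_j - \tilde{\mathbf{m}}_i)^T \Big] [\mathbf{D}^*]^T = \mathbf{D}^* \tilde{\mathbf{S}}_w [\mathbf{D}^*]^T ,$$

where $\tilde{\mathbf{m}}_i$ is the sample mean vector of the sparse coefficients of the $i$th class and $\tilde{\mathbf{S}}_w$ is the within-class sample covariance matrix of the sparse coefficients.

$\mathbf{S}_b \approx \mathbf{D}^* \tilde{\mathbf{S}}_b [\mathbf{D}^*]^T$, where $\tilde{\mathbf{S}}_b$ is the between-class sample covariance matrix of the sparse coefficients.

Let $\tilde{\mathbf{W}} \triangleq [\mathbf{D}^*]^T \mathbf{W}$. The optimization problem can be recast as

$$\max_{\tilde{\mathbf{W}}} \frac{|\tilde{\mathbf{W}}^T \tilde{\mathbf{S}}_b \tilde{\mathbf{W}}|}{|\tilde{\mathbf{W}}^T \tilde{\mathbf{S}}_w \tilde{\mathbf{W}}|} .$$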
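The recast criterion follows by substitution. A short worked step, writing $\tilde{\mathbf{S}}_b$ and $\tilde{\mathbf{S}}_w$ for the covariance matrices of the sparse coefficients and $\tilde{\mathbf{W}} = [\mathbf{D}^*]^T \mathbf{W}$ (the tilde is notation assumed here for coefficient-domain quantities):

```latex
\begin{align*}
\mathbf{W}^T \mathbf{S}_b \mathbf{W}
  &\approx \mathbf{W}^T \mathbf{D}^* \tilde{\mathbf{S}}_b [\mathbf{D}^*]^T \mathbf{W}
   = \tilde{\mathbf{W}}^T \tilde{\mathbf{S}}_b \tilde{\mathbf{W}}, \\
\mathbf{W}^T \mathbf{S}_w \mathbf{W}
  &\approx \mathbf{W}^T \mathbf{D}^* \tilde{\mathbf{S}}_w [\mathbf{D}^*]^T \mathbf{W}
   = \tilde{\mathbf{W}}^T \tilde{\mathbf{S}}_w \tilde{\mathbf{W}}.
\end{align*}
```

Taking the ratio of the two determinants gives the criterion in terms of $\tilde{\mathbf{W}}$ alone.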

Discriminant Sparse Projections (3)

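If $\tilde{\mathbf{W}}^*$ is the solution of the optimization problem, the solution of the original LDA problem is then $\mathbf{W}^* = \big[ [\mathbf{D}^*]^\dagger \big]^T \tilde{\mathbf{W}}^*$, where $\dagger$ denotes the pseudoinverse. This suggests that $\mathbf{z}_j = [\mathbf{W}^*]^T \mathbf{y}_j = [\tilde{\mathbf{W}}^*]^T [\mathbf{D}^*]^\dagger \mathbf{y}_j = [\tilde{\mathbf{W}}^*]^T \mathbf{x}_j$, i.e. LDA is applied to the coefficients of the sparse representation.

Most expressive features (MEFs): $\mathbf{x}_j$. MDFs: $\mathbf{z}_j$.

Discriminant Sparse Projection: cascade of sparse representation and LDA.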

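Dual LDA: Principal subspace of $\tilde{\mathbf{S}}_w$

The within-class covariance matrix of the sparse coefficients, $\tilde{\mathbf{S}}_w$, is a real symmetric matrix decomposable as $\boldsymbol{\Phi}_w \mathbf{G}_w \boldsymbol{\Phi}_w^T$. It has rank $\rho_w \le \min(K, N_{st} - N_g) \le K$. It is assumed that $\mathbf{G}_w$ is the diagonal matrix confined to the $\rho_w$ largest eigenvalues of $\tilde{\mathbf{S}}_w$. The principal subspace of $\tilde{\mathbf{S}}_w$ is defined by the eigenvectors associated to its $\rho_w$ largest eigenvalues, $\boldsymbol{\Phi}_w = [\boldsymbol{\phi}_1 | \boldsymbol{\phi}_2 | \cdots | \boldsymbol{\phi}_{\rho_w}]$.

The between-class covariance matrix of the sparse coefficients, $\tilde{\mathbf{S}}_b$, is transformed to $\mathbf{Q}_b = \mathbf{G}_w^{-1/2} \boldsymbol{\Phi}_w^T \tilde{\mathbf{S}}_b \boldsymbol{\Phi}_w \mathbf{G}_w^{-1/2}$.

$\mathbf{Q}_b$ is a real symmetric matrix of size $\rho_w \times \rho_w$ having rank $N_g - 1$, decomposable as $\mathbf{Q}_b = \boldsymbol{\Psi}_b \mathbf{H}_b \boldsymbol{\Psi}_b^T$. Let $\boldsymbol{\Psi}_b$ have as columns the $N_g - 1$ eigenvectors associated to the non-zero eigenvalues of $\mathbf{Q}_b$.

The $N_g - 1$ discriminative vectors in the principal subspace of $\tilde{\mathbf{S}}_w$ are the columns of $\mathbf{U}_F^* = \boldsymbol{\Phi}_w \mathbf{G}_w^{-1/2} \boldsymbol{\Psi}_b$.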
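The construction above can be sketched with two symmetric eigendecompositions. The function name `dual_lda_principal_sw`, the relative tolerance used to decide which eigenvalues count as non-zero, and the toy covariance matrices are assumptions of this sketch.

```python
import numpy as np

def dual_lda_principal_sw(S_w, S_b, n_classes, tol=1e-10):
    """Discriminant vectors in the principal subspace of S_w (as columns)."""
    g, Phi = np.linalg.eigh(S_w)                     # eigenvalues in ascending order
    keep = g > tol * g.max()                         # rho_w numerically non-zero eigenvalues
    Phi_w = Phi[:, keep]
    G_inv_sqrt = np.diag(g[keep] ** -0.5)
    # Between-class matrix after whitening the within-class scatter.
    Q_b = G_inv_sqrt @ Phi_w.T @ S_b @ Phi_w @ G_inv_sqrt
    h, Psi = np.linalg.eigh(Q_b)
    top = np.argsort(h)[::-1][: n_classes - 1]       # N_g - 1 largest eigenvalues
    return Phi_w @ G_inv_sqrt @ Psi[:, top]

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 40))
B = rng.standard_normal((5, 2))
S_w = A @ A.T / 40            # toy full-rank within-class matrix, K = 5
S_b = B @ B.T                 # toy rank-2 between-class matrix (N_g = 3)
U_F = dual_lda_principal_sw(S_w, S_b, n_classes=3)
```

By construction the returned vectors whiten the within-class matrix, i.e. `U_F.T @ S_w @ U_F` is the identity up to rounding, while `U_F.T @ S_b @ U_F` is diagonal.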

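Dual LDA: Principal subspace of $\tilde{\mathbf{S}}_b$

The between-class covariance matrix of the sparse coefficients, $\tilde{\mathbf{S}}_b$, is a real symmetric matrix decomposable as $\tilde{\mathbf{S}}_b = \boldsymbol{\Phi}_b \mathbf{G}_b \boldsymbol{\Phi}_b^T$. It has rank $\rho_b \le \min(K, N_g - 1) = N_g - 1$. Let $\mathbf{G}_b$ be the diagonal matrix of the $\rho_b$ non-zero eigenvalues of $\tilde{\mathbf{S}}_b$. The principal subspace of $\tilde{\mathbf{S}}_b$ is defined by the eigenvectors associated to the $\rho_b$ non-zero eigenvalues of $\tilde{\mathbf{S}}_b$, i.e. $\boldsymbol{\Phi}_b = [\boldsymbol{\phi}_1 | \boldsymbol{\phi}_2 | \cdots | \boldsymbol{\phi}_{\rho_b}]$.

The within-class covariance matrix of the sparse coefficients, $\tilde{\mathbf{S}}_w$, is transformed to $\mathbf{Q}_w = \mathbf{G}_b^{-1/2} \boldsymbol{\Phi}_b^T \tilde{\mathbf{S}}_w \boldsymbol{\Phi}_b \mathbf{G}_b^{-1/2}$.

$\mathbf{Q}_w$ is a real symmetric matrix of size $\rho_b \times \rho_b$ decomposable as $\mathbf{Q}_w = \boldsymbol{\Psi}_w \mathbf{H}_w \boldsymbol{\Psi}_w^T$.

The $N_g - 1$ discriminative vectors in the principal subspace of $\tilde{\mathbf{S}}_b$ are the columns of $\hat{\mathbf{U}}_F^* = \boldsymbol{\Phi}_b \mathbf{G}_b^{-1/2} \boldsymbol{\Psi}_w \mathbf{H}_w^{-1/2}$.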
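A matching sketch for the principal subspace of the between-class matrix is given below. As before, the function name `dual_lda_principal_sb`, the eigenvalue tolerance, and the toy matrices are hypothetical. The sketch assumes the within-class matrix is positive definite on the retained subspace, so that the inverse square root of the eigenvalues of $\mathbf{Q}_w$ exists.

```python
import numpy as np

def dual_lda_principal_sb(S_w, S_b, tol=1e-10):
    """Discriminant vectors in the principal subspace of S_b (as columns)."""
    g, Phi = np.linalg.eigh(S_b)
    keep = g > tol * g.max()                         # rho_b non-zero eigenvalues
    Phi_b = Phi[:, keep]
    G_inv_sqrt = np.diag(g[keep] ** -0.5)
    # Within-class matrix expressed in the whitened between-class subspace.
    Q_w = G_inv_sqrt @ Phi_b.T @ S_w @ Phi_b @ G_inv_sqrt
    h, Psi_w = np.linalg.eigh(Q_w)
    return Phi_b @ G_inv_sqrt @ Psi_w @ np.diag(h ** -0.5)

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 40))
B = rng.standard_normal((5, 2))
S_w = A @ A.T / 40            # toy full-rank within-class matrix, K = 5
S_b = B @ B.T                 # toy rank-2 between-class matrix (N_g = 3)
U_F_hat = dual_lda_principal_sb(S_w, S_b)
```

The final scaling by the inverse square root of the eigenvalues of $\mathbf{Q}_w$ makes `U_F_hat.T @ S_w @ U_F_hat` equal to the identity up to rounding.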

Music genre classification with multilinear and sparse techniques 33/79

Page 78: Music genre classification with multilinear and sparse techniques … · 2009-10-19 · Introduction Music Genre The most popular description of music content despite the lack of

Dual Linear Discriminant Analysis of SparseRepresentations

Dual LDA: Principal subspace of Sb

Sb is a real symmetric matrix decomposable as Sb = ΦbGbΦTb . It

has rank ρb ≤ min(K ,Ng − 1) = Ng − 1. Let Gb be the diagonalmatrix of the ρb non-zero eigenvalues of Sb. The principalsubspace of Sb is defined by the eigenvectors, which areassociated to the ρb non-zero eigenvalues of Sb, i.e.Ψb = [ψ1|ψ2| · · · |ψρb

].

Sw is transformed to Qw = G− 1

2b ΦT

b Sw Φb G− 1

2b .

Qw is real symmetric matrix of size ρb × ρb decomposable asQw = Ψw Hw ΨT

w .The Ng − 1 discriminative vectors in the principal subspace of Sb

are the columns of U∗F

= Φb G− 1

2b Ψw H

− 12

w .


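Dual Linear Discriminant Analysis of Sparse Representations

Dual LDA Classifier

At the test stage, the sparse coefficient vector x of any test sample y and the class centers $m_i$ are projected onto the discriminant vectors in the two principal subspaces, and the distance

$$D(y, m_i) = \| [U_F^{*}]^T (x - m_i) \|_2 + \varrho \, \| [\bar{U}_F^{*}]^T (x - m_i) \|_2$$

is computed, where $U_F^{*}$ and $\bar{U}_F^{*}$ are the projection matrices associated with the two principal subspaces,

$$\varrho = \frac{\mathrm{tr}\left[ U_F^{*} [U_F^{*}]^T \right]}{\mathrm{tr}\left[ \bar{U}_F^{*} [\bar{U}_F^{*}]^T \right]},$$

and tr[ ] stands for the trace of the matrix enclosed in brackets.

The test sample y is classified to genre $i^{*} = \arg\min_i D(y, m_i)$.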
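A minimal sketch of this decision rule, assuming the two projection matrices are already available and that the norm is the plain Euclidean norm (not its square). The names `U_a`, `U_b`, `dual_lda_distance`, and `classify`, as well as the toy data, are hypothetical.

```python
import numpy as np

def dual_lda_distance(x, m, U_a, U_b):
    """Weighted sum of the distances measured in the two projected subspaces."""
    # Trace ratio that balances the two terms.
    rho = np.trace(U_a @ U_a.T) / np.trace(U_b @ U_b.T)
    d = x - m
    return np.linalg.norm(U_a.T @ d) + rho * np.linalg.norm(U_b.T @ d)

def classify(x, centers, U_a, U_b):
    """Index of the class center with the smallest distance to x."""
    distances = [dual_lda_distance(x, m, U_a, U_b) for m in centers]
    return int(np.argmin(distances))

# Toy example: 4-dimensional coefficient vectors and two class centers.
U_a = np.eye(4)[:, :2]   # stand-in for the first projection matrix
U_b = np.eye(4)[:, 2:]   # stand-in for the second projection matrix
centers = [np.zeros(4), np.full(4, 5.0)]
x = np.array([4.5, 5.0, 5.2, 4.9])
label = classify(x, centers, U_a, U_b)
```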


Dual Linear Discriminant Analysis of Sparse Representations

Classifier Ensemble

Classifier combination has been an active research topic in Machine Learning and Pattern Recognition [a].

The training dataset is split into 10 folds by stratified 10-fold cross-validation. In each training dataset fold τ = 1, 2, ..., 10, an overcomplete dictionary $[D^{*}]_\tau$ and the projection matrices $[U_F^{*}]_\tau$ and $[\bar{U}_F^{*}]_\tau$ are learned.

For each test sample, a vote is taken among the classification labels assigned to it by the resulting 10 discriminant sparse projections.

The test sample is assigned to the class that receives the most votes.

[a] J. J. Rodríguez, L. I. Kuncheva, and C. J. Alonso, "Rotation forest: A new classifier ensemble method," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1619-1630, October 2006.
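The voting step can be illustrated with a few lines of Python. The labels below are made up for the example, and the tie-breaking behaviour (the label seen first among equally frequent ones wins) is an assumption, since the slides do not say how ties are resolved.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label that occurs most often among the per-fold decisions."""
    # most_common(1) yields [(label, count)] for the most frequent label.
    return Counter(labels).most_common(1)[0][0]

# Hypothetical decisions of the 10 per-fold classifiers for one test sample.
fold_labels = ["Jazz", "Blues", "Jazz", "Jazz", "Rock",
               "Jazz", "Blues", "Jazz", "Jazz", "Rock"]
predicted = majority_vote(fold_labels)
```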



Experimental Assessment

GTZAN dataset

1000 audio recordings, each 30 seconds long [a];
10 genre classes: Blues, Classical, Country, Disco, HipHop, Jazz, Metal, Pop, Reggae, and Rock;
Each genre class contains 100 audio recordings.
The recordings are converted to monaural wave format at a 16 kHz sampling rate with 16 bits and normalized, so that they have zero mean amplitude and unit variance.

[a] G. Tzanetakis and P. Cook, "Musical genre classification of audio signals," IEEE Trans. Speech and Audio Processing, vol. 10, no. 5, pp. 293-302, July 2002.
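The amplitude normalization mentioned above might look like the following sketch. It assumes the recording has already been loaded as a mono sample array at 16 kHz; the function name and the synthetic test signal are hypothetical.

```python
import numpy as np

def normalize_waveform(signal):
    """Scale a mono waveform to zero mean amplitude and unit variance."""
    signal = np.asarray(signal, dtype=np.float64)
    centered = signal - signal.mean()
    std = centered.std()
    if std == 0.0:
        raise ValueError("a constant signal cannot be scaled to unit variance")
    return centered / std

# Synthetic stand-in for one second of audio sampled at 16 kHz.
t = np.arange(16000) / 16000.0
raw = 0.3 + 0.5 * np.sin(2.0 * np.pi * 440.0 * t)
normalized = normalize_waveform(raw)
```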


Experimental Assessment

Feature Extraction

The auditory temporal modulations representation is computed over a segment of 30 sec duration, vectorized, and normalized to unit length.
Each fold delivers a raw training pattern matrix of size 768 × 900 and a raw test pattern matrix of size 768 × 100, which undergo downsampling with ratios 1/8, 1/4, 1/3, 1/2 in the rate-frequency domain. The downsampled training pattern matrix is $Y \in \mathbb{R}^{M \times N_{st}}$, with $M \in \{12, 48, 85, 192\}$ and $N_{st} = 900$.

Multidimensional scaling (MDS)

MDS with locality preserving indexing [a]

[a] D. Cai, X. He, and J. Han, "Document clustering using locality preserving indexing," IEEE Trans. Knowledge and Data Engineering, vol. 17, no. 12, pp. 1624-1637, December 2005.


Experimental Assessment

Auditory temporal modulations for the 1st test fold for M = 192

[Figure: 1st MDS coordinate (horizontal axis) vs. 2nd MDS coordinate (vertical axis). Legend: Test, Blues, Classical, Country, Disco, Hiphop, Jazz, Metal, Pop, Reggae, Rock.]


Experimental Assessment

Sparse coefficients for K = 400 and L = 20

[Figure: 1st MDS coordinate (horizontal axis) vs. 2nd MDS coordinate (vertical axis). Legend: Test, Blues, Classical, Country, Disco, Hiphop, Jazz, Metal, Pop, Reggae, Rock.]


Experimental Assessment

Statistics of the number of non-zero coefficients, when $[D^{*}]_\tau$, τ = 1, 2, ..., 10, are employed in OMP

[Figure: test sample index j (horizontal axis) vs. $\min \|x_j\|_0$, $E \|x_j\|_0$, and $\max \|x_j\|_0$ (vertical axis).]


Experimental Assessment

Projections of the sparse coefficient vectors to the principal subspaces of $[S_w]_1$ and $[S_b]_1$

[Figure: two panels, each showing the 1st MDS coordinate (horizontal axis) vs. the 2nd MDS coordinate (vertical axis). Legend: Test, Blues, Classical, Country, Disco, Hiphop, Jazz, Metal, Pop, Reggae, Rock.]


Experimental Assessment

Classifier ensemble decisions

[Figure: test sample index j (horizontal axis, 10 to 100) vs. classifier ensemble decision (vertical axis: Blues, Classical, Country, Disco, Hiphop, Jazz, Metal, Pop, Reggae, Rock). Legend: the same ten genres.]


Experimental Assessment

Average classification accuracy (in %)

Classifier                                   Downsampling ratio
                                             1/8      1/4      1/3      1/2
dual LDA                                     34.2     42.3     49.3     54.3
ensemble dual LDA                            34.8     44.1     52.4     57.5
DKL                                          34.4     42.3     49.3     54.4
ensemble dual DKL                            34.9     44.1     52.4     57.5
discriminant sparse projections              42.2     43.03    55.03    57.64
ensemble discriminative sparse projection    44.64    59.9     75.33    84.96


Experimental Assessment

95% confidence interval of the average classification accuracy (in %)

Classifier                                   Downsampling ratio
                                             1/8     1/4     1/3     1/2
dual LDA/DKL                                 2.95    3.07    3.11    3.10
ensemble dual LDA/DKL                        2.96    3.09    3.11    3.07
discriminant sparse projections              3.07    3.08    3.10    3.07
ensemble discriminant sparse projections     3.09    3.05    2.68    2.22
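The slides do not state how these intervals were computed. One plausible reading, offered here only as an assumption, is the half-width of a normal-approximation binomial interval over the 1000 GTZAN recordings. The sketch below (the function name and the value z = 1.96 are choices made for illustration) gives about 2.2 for an accuracy of 84.96%, close to the tabulated value.

```python
import math

def ci_half_width_percent(accuracy_percent, n_samples, z=1.96):
    """Half-width (in %) of a normal-approximation binomial confidence interval."""
    p = accuracy_percent / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n_samples)

# Ensemble discriminant sparse projections at ratio 1/2, assuming 1000 test decisions.
half_width = ci_half_width_percent(84.96, 1000)
```

Under the same assumption, the interval is widest for accuracies near 50% and narrows as the accuracy grows, which matches the trend in the last row of the table.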


Experimental Assessment

Cumulative Confusion Matrix (in %)

Genre      Blues  Classical  Country  Disco  Hiphop  Jazz  Metal  Pop  Reggae  Rock
Blues         91          1        2      0       5     0      0    0       1     0
Classical      0         96        0      1       0     0      1    0       0     2
Country        2          0       88      1       0     0      0    1       3     5
Disco          0          0        2     89       2     0      0    4       0     3
Hiphop         0          0        0      9      78     0      3    9       0     1
Jazz           3          0        2      1       0    92      0    0       0     2
Metal          0          3        3      0       0     0     88    0       0     6
Pop            0          0        1      6       2     0      0   86       1     4
Reggae         4          1        0      5      11     1      0    2      70     6
Rock           6          0        1      5       3     2      8    3       2    70
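As a quick consistency check, the overall accuracy implied by the confusion matrix can be read off its diagonal. The sketch below copies the entries from the slide; the variable names are hypothetical. Each row sums to 100, which is consistent with rows indexing the true genre (100 recordings each), although the slide does not state the orientation explicitly.

```python
import numpy as np

genres = ["Blues", "Classical", "Country", "Disco", "Hiphop",
          "Jazz", "Metal", "Pop", "Reggae", "Rock"]
confusion = np.array([
    [91,  1,  2,  0,  5,  0,  0,  0,  1,  0],
    [ 0, 96,  0,  1,  0,  0,  1,  0,  0,  2],
    [ 2,  0, 88,  1,  0,  0,  0,  1,  3,  5],
    [ 0,  0,  2, 89,  2,  0,  0,  4,  0,  3],
    [ 0,  0,  0,  9, 78,  0,  3,  9,  0,  1],
    [ 3,  0,  2,  1,  0, 92,  0,  0,  0,  2],
    [ 0,  3,  3,  0,  0,  0, 88,  0,  0,  6],
    [ 0,  0,  1,  6,  2,  0,  0, 86,  1,  4],
    [ 4,  1,  0,  5, 11,  1,  0,  2, 70,  6],
    [ 6,  0,  1,  5,  3,  2,  8,  3,  2, 70],
])

# Overall accuracy: correctly classified recordings over all recordings.
overall_accuracy = 100.0 * np.trace(confusion) / confusion.sum()
# Genres with the lowest diagonal entries (hardest to classify).
hardest = [genres[k] for k in np.argsort(np.diag(confusion))[:2]]
```

The diagonal sums to 848 out of 1000, i.e. 84.8%, close to the 84.96% reported for the ensemble at ratio 1/2.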


Sparse Representation-based Classification (SRC)

Main Idea (1)

Let $A_i = [a_{i,1} | a_{i,2} | \ldots | a_{i,n_i}] \in \mathbb{R}_+^{768 \times n_i}$ denote the (sub)dictionary whose columns (i.e., atoms) are the $n_i$ auditory modulation representations stemming from the $i$th genre.
Given a test auditory representation $y \in \mathbb{R}_+^{768}$ that belongs to the $i$th genre, it can be expressed as $y = A_i c_i$, where $c_i = [c_{i,1}, c_{i,2}, \ldots, c_{i,n_i}]^T \in \mathbb{R}^{n_i}$.


Sparse Representation-based Classification (SRC)

Main Idea (2)

Let $D = [A_1 | A_2 | \ldots | A_N] \in \mathbb{R}_+^{768 \times n}$ be formed by concatenating the n auditory modulation representations distributed across N genres.
The test auditory representation y can be equivalently rewritten as $y = D c$, where $c = [0^T | \ldots | 0^T | c_i^T | 0^T | \ldots | 0^T]^T$.
c contains information about the genre the test auditory representation y belongs to.
We can find such a c by seeking the sparsest solution to the linear system of equations $y = D c$.
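The block structure of c can be made concrete with a small NumPy sketch. The sub-dictionary sizes and the random non-negative atoms are invented for illustration; only the dimension 768 comes from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 768
atoms_per_genre = [4, 3, 5]           # hypothetical n_i for N = 3 genres

# One non-negative sub-dictionary A_i per genre, concatenated into D.
sub_dictionaries = [rng.random((dim, n_i)) for n_i in atoms_per_genre]
D = np.hstack(sub_dictionaries)

# A test vector generated from the atoms of genre i only: y = A_i c_i.
i = 1
c_i = rng.random(atoms_per_genre[i])
y = sub_dictionaries[i] @ c_i

# The same vector written with the full dictionary: c is zero outside block i.
offsets = np.concatenate(([0], np.cumsum(atoms_per_genre)))
c = np.zeros(D.shape[1])
c[offsets[i]:offsets[i + 1]] = c_i
```

Here the position of the non-zero block of `c` identifies the genre, which is the property the classifier exploits.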


Sparse Representation-based Classification (SRC)

Problem formulation

Given D and y, solve for

$$c^{*} = \arg\min_c \|c\|_0 \quad \text{subject to} \quad D c = y.$$

This problem is NP-hard due to the nature of the underlying combinatorial optimization.
An approximate solution can be obtained by replacing the $\ell_0$ norm with the $\ell_1$ norm: $c^{*} = \arg\min_c \|c\|_1$ subject to $D c = y$.
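The $\ell_1$ problem can be recast as a linear program by splitting c into non-negative parts, c = u − v, and minimizing the sum of all entries of u and v. The sketch below uses SciPy's `linprog` with the HiGHS solver; it is one standard way to solve the problem, not necessarily the solver used by the authors, and the dictionary and sparse vector are synthetic.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(D, y):
    """Solve min ||c||_1 subject to D c = y as a linear program."""
    n = D.shape[1]
    cost = np.ones(2 * n)              # sum(u) + sum(v) equals ||c||_1 at the optimum
    A_eq = np.hstack([D, -D])          # D u - D v = y
    result = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    u, v = result.x[:n], result.x[n:]
    return u - v

# Synthetic example: 20-dimensional observations, 50 atoms, 3 active atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
c_true = np.zeros(50)
c_true[[3, 17, 41]] = [1.5, -2.0, 0.7]
y = D @ c_true
c_hat = basis_pursuit(D, y)
```

The returned vector always satisfies the constraint and has an $\ell_1$ norm no larger than that of `c_true`; whether it coincides with `c_true` depends on the dictionary and the sparsity level.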


Sparse Representation-based Classification (SRC)

Dimensionality Reduction

For overcomplete dictionaries derived from the auditory temporal modulation representations, the dimensionality of the atoms must be much smaller than min(768, n). Thus, we reformulate the optimization problem under study as:

$$c^{*} = \arg\min_c \|c\|_1 \quad \text{subject to} \quad W^T D c = W^T y$$

where $W \in \mathbb{R}^{768 \times k}$ with $k \ll \min(768, n)$ is a projection matrix. W is obtained by

non-negative matrix factorization (NMF);
principal component analysis (PCA);
independently sampling its entries from a zero-mean normal distribution and normalizing each column to unit length.

Downsampling the entries of D is another option.
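The random-projection option is the simplest of the three to sketch. In the example below, the target dimension k = 64, the seed, and the random stand-in dictionary are assumptions made for illustration; only the ambient dimension 768 and the 900 training atoms per fold come from the slides.

```python
import numpy as np

def random_projection_matrix(dim, k, seed=0):
    """Gaussian projection matrix whose columns are scaled to unit length."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((dim, k))            # zero-mean normal entries
    return W / np.linalg.norm(W, axis=0, keepdims=True)

W = random_projection_matrix(768, 64)

# Stand-in dictionary (768 x 900) and test vector; real features would go here.
D = np.random.default_rng(1).random((768, 900))
y = D[:, 0]

# Reduced constraint W^T D c = W^T y used in the l1 problem.
D_reduced = W.T @ D
y_reduced = W.T @ y
```

After projection the system has 64 equations and 900 unknowns, so the reduced dictionary is overcomplete by construction.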


Sparse Representation-based Classification (SRC)

Reasons for Dimensionality Reduction

The computational cost of linear programming solvers is reduced.
The creation of a redundant dictionary from the training auditory temporal modulation representations is facilitated.

Why a Redundant Dictionary?

It enables treatment of
missing data
outliers
noise.


Sparse Representation-based Classification (SRC)

Reasons for Dimensionality ReductionThe computational cost of linear programming solvers is reduced.The creation of a redundant dictionary from the training auditorytemporal modulation representations is facilitated.

Why a Redundant Dictionary?Enables treatment of

missing dataoutliersnoise.

Music genre classification with multilinear and sparse techniques 51/79

Page 126: Music genre classification with multilinear and sparse techniques … · 2009-10-19 · Introduction Music Genre The most popular description of music content despite the lack of

Sparse Representation-based Classification (SRC)

Classification

A test auditory modulation representation $y$ is classified as follows.

1. $y$ is projected onto the reduced-dimension space: $\tilde{y} = W^T y$.
2. $c^* = \arg\min_{c} \|c\|_1$ subject to $W^T D\, c = \tilde{y}$.
3. Ideally, $c^*$ contains non-zero entries associated only with the columns of $W^T D$ stemming from a single genre.
4. Due to modeling errors, $c^*$ also contains small non-zero entries associated with multiple genres.
5. Each auditory modulation representation is assigned to the genre that minimizes the $\ell_2$-norm residual between $\tilde{y}$ and $\hat{y}_i = W^T D\, \vartheta_i(c^*)$, where $\vartheta_i(c^*) \in \mathbb{R}^n$ is a new vector whose non-zero entries are those of $c^*$ associated with the $i$-th genre.
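The classification steps can be sketched in a few lines. This is a hedged illustration rather than the authors' implementation: the function name `src_classify` is hypothetical, the inputs are assumed to be already projected, and the $\ell_1$ problem is solved with SciPy's general-purpose `linprog` after splitting the coefficient vector into non-negative parts.

```python
import numpy as np
from scipy.optimize import linprog

def src_classify(D_red, y_red, labels):
    """Sparse representation-based classification (sketch).

    D_red  : k x n projected dictionary (one column per training sample)
    y_red  : length-k projected test vector
    labels : length-n genre label of each dictionary column
    Returns (predicted genre, coefficient vector, per-genre residuals).
    """
    labels = np.asarray(labels)
    k, n = D_red.shape
    # l1 minimization as a linear program: c = u - v with u, v >= 0.
    cost = np.ones(2 * n)
    A_eq = np.hstack([D_red, -D_red])
    res = linprog(cost, A_eq=A_eq, b_eq=y_red, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    c = res.x[:n] - res.x[n:]
    # Keep only the coefficients of one genre at a time and measure the residual.
    genres = np.unique(labels)
    residuals = np.array([
        np.linalg.norm(y_red - D_red @ np.where(labels == g, c, 0.0))
        for g in genres
    ])
    return genres[int(np.argmin(residuals))], c, residuals
```

With noisy data the equality constraint may be infeasible or too strict; relaxing it is a separate design choice not covered by this sketch.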

Sparse Representation-based Classification (SRC)

Figure: Sparse coefficients and residuals for a test auditory temporal modulations representation of the blues genre.

Experimental Assessment

Datasets

1. GTZAN dataset
2. ISMIR 2004 Genre dataset
   - 1458 full audio recordings;
   - 6 genre classes: Classical (640), Electronic (229), Jazz Blues (52), MetalPunk (90), RockPop (203), World (244).

Protocol

- GTZAN dataset: stratified 10-fold cross-validation. Each training set consists of 900 audio recordings, yielding a training matrix $A_{\text{GTZAN}} \in \mathbb{R}_{+}^{768 \times 900}$.
- ISMIR 2004 Genre dataset: the ISMIR2004 Audio Description Contest protocol defines training and evaluation sets, which consist of 729 audio files each.
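A stratified 10-fold split can be sketched as below. The function name `stratified_folds` is hypothetical, and the usage assumes the usual GTZAN composition of 10 genres with 100 recordings each, which is not stated on the slide but is consistent with 900 training recordings per fold.

```python
import numpy as np

def stratified_folds(labels, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs in which every class is spread
    as evenly as possible over the folds."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    fold_of = np.empty(len(labels), dtype=int)
    for g in np.unique(labels):
        # Shuffle the samples of one class, then deal them out round-robin.
        idx = rng.permutation(np.flatnonzero(labels == g))
        fold_of[idx] = np.arange(len(idx)) % n_folds
    for f in range(n_folds):
        yield np.flatnonzero(fold_of != f), np.flatnonzero(fold_of == f)

# Assumed GTZAN layout: 10 genres x 100 recordings.
labels = np.repeat(np.arange(10), 100)
folds = list(stratified_folds(labels))
```

Each fold then holds out 100 recordings (10 per genre) and trains on the remaining 900.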

Experimental Assessment

Parameters

- $W \in \mathbb{R}^{768 \times k}$ is derived from $A_{\text{GTZAN}}$ and $A_{\text{ISMIR}}$ by employing
  - NMF or PCA with $k \in \{12, 48, 85, 192\}$;
  - a random projection matrix for the same $k$.
- Downsampling the auditory temporal modulations with ratios 1/8, 1/4, 1/3, and 1/2.

Classifiers

- SRC
- linear SVMs
- Nearest Neighbor (NN) with cosine similarity measure (CSM)
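The nearest-neighbor baseline with the cosine similarity measure can be sketched as follows. The function name `nn_cosine` and the column-per-sample layout are assumptions made for this illustration.

```python
import numpy as np

def nn_cosine(train, train_labels, test):
    """Nearest-neighbor classification with cosine similarity.

    train : d x n matrix, one training sample per column
    test  : d x m matrix, one test sample per column
    Columns are assumed to be non-zero so that normalization is defined.
    """
    train_n = train / np.linalg.norm(train, axis=0, keepdims=True)
    test_n = test / np.linalg.norm(test, axis=0, keepdims=True)
    similarity = train_n.T @ test_n          # n x m cosine similarities
    nearest = np.argmax(similarity, axis=0)  # most similar training column
    return np.asarray(train_labels)[nearest]
```

The same function applies unchanged to projected features, for example to the columns of $W^T D$.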

Experimental Assessment

Figure: SRC accuracy on the GTZAN and ISMIR2004 datasets. Two panels plot classification accuracy (%), from 20 to 100, against feature dimension, from 0 to 200, for NMF, PCA, random projection, and downsampling.

Experimental Assessment

Figure: Linear SVM accuracy on the GTZAN and ISMIR2004 datasets. Two panels plot classification accuracy (%), from 20 to 100, against feature dimension, from 0 to 200, for NMF, PCA, random projection, and downsampling.

Experimental Assessment

Figure: NN accuracy on the GTZAN and ISMIR2004 datasets. Two panels plot classification accuracy (%), from 20 to 100, against feature dimension, from 0 to 200, for NMF, PCA, random projection, and downsampling.

Locality Preserving Non-negative Tensor Factorization

Multilinear Dimensionality Reduction Techniques

- Unsupervised: Non-Negative Tensor Factorization (NTF) [a], Multilinear Principal Component Analysis (MPCA) [b], Non-negative MPCA (NMPCA) [c];
- Supervised: General Tensor Discriminant Analysis (GTDA) [d], Discriminant Non-Negative Tensor Factorization (DNTF) [e].

[a] E. Benetos and C. Kotropoulos, “A tensor-based approach for automatic music genre classification,” in Proc. XVI European Signal Processing Conf., Lausanne, Switzerland, 2008.
[b] H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, “MPCA: Multilinear principal component analysis of tensor objects,” IEEE Trans. Neural Networks, vol. 19, no. 1, pp. 18-39, 2008.
[c] Y. Panagakis, C. Kotropoulos, and G. R. Arce, “Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification,” IEEE Trans. Audio, Speech, and Language Processing, to appear.
[d] D. Tao, X. Li, X. Wu, and S. J. Maybank, “General tensor discriminant analysis and Gabor features for gait recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1700-1715, 2007.
[e] S. Zafeiriou, “Discriminant nonnegative tensor factorization algorithms,” IEEE Trans. Neural Networks, vol. 20, no. 2, pp. 217-235, 2009.

Locality Preserving Non-negative Tensor Factorization

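Local structure in a nonlinear manifold

Let $\{\boldsymbol{\mathcal{A}}_q\}_{q=1}^{Q}$ be a set of $Q$ non-negative tensors of order $N$, which lie in a nonlinear manifold $\mathcal{A}$ embedded into the tensor space. The set can be represented as an $(N+1)$-order tensor $\boldsymbol{\mathcal{A}} \in \mathbb{R}_{+}^{I_1 \times I_2 \times \dots \times I_N \times I_{N+1}}$ with $I_{N+1} = Q$.

The local structure of $\mathcal{A}$ can be modeled by the nearest neighbor graph $G$ whose weight matrix $S$ has elements

$$
s_{qp} =
\begin{cases}
\exp\left(-\dfrac{\|\boldsymbol{\mathcal{A}}_q - \boldsymbol{\mathcal{A}}_p\|^2}{\tau}\right) & \text{if } \boldsymbol{\mathcal{A}}_q \text{ and } \boldsymbol{\mathcal{A}}_p \text{ belong to the same class} \\
0 & \text{otherwise}
\end{cases}
$$

with $\|\cdot\|$ denoting the tensor norm [a].

The Laplacian matrix is $L = \Gamma - S$, where $\Gamma$ is a diagonal matrix with elements $\gamma_{qq} = \sum_p s_{qp}$, i.e., the column sums of $S$.

[a] T. Kolda and B. W. Bader, “Tensor decompositions and applications,” SIAM Review, vol. 51, no. 3, to appear.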
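The weight matrix and the Laplacian can be computed directly from the definition. The sketch below is illustrative only: the function name `graph_laplacian` is hypothetical, the sample mode is placed on the first array axis for convenience (the slides place it last), and the tensor norm is taken to be the Euclidean norm of the vectorized tensor.

```python
import numpy as np

def graph_laplacian(tensors, labels, tau=1.0):
    """Class-aware heat-kernel weights S, degree matrix Gamma, and L = Gamma - S.

    tensors : array of shape (Q, I1, ..., IN), one sample per leading index
    labels  : length-Q class labels
    """
    labels = np.asarray(labels)
    Q = tensors.shape[0]
    X = tensors.reshape(Q, -1)                 # vectorize each sample
    sq = np.sum(X ** 2, axis=1)
    # Squared pairwise distances, clipped at zero against rounding errors.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    same_class = labels[:, None] == labels[None, :]
    S = np.where(same_class, np.exp(-d2 / tau), 0.0)
    Gamma = np.diag(S.sum(axis=0))             # column sums of S
    return Gamma - S, S, Gamma
```

Because $S$ is symmetric, its row and column sums coincide, and the resulting $L$ is symmetric positive semidefinite.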

Locality Preserving Non-negative Tensor Factorization

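Problem statement

Let $Z^{(i)} = U^{(N+1)} \odot \dots \odot U^{(i+1)} \odot U^{(i-1)} \odot \dots \odot U^{(1)}$, where $\odot$ denotes the Khatri-Rao product. Subject to $U^{(i)} \geq 0$, $i = 1, 2, \dots, N+1$, minimize

$$
f_{\text{LPNTF}}\left(\{U^{(i)}\}_{i=1}^{N+1}\right) = \left\|A_{(i)} - U^{(i)} [Z^{(i)}]^T\right\|^2 + \lambda\, \operatorname{tr}\left\{[U^{(N+1)}]^T L\, U^{(N+1)}\right\}
$$

where $A_{(i)}$ is the mode-$i$ unfolding of the data tensor, $U^{(i)} \in \mathbb{R}_{+}^{I_i \times k}$, $k$ is the desired number of rank-1 tensors approximating the data tensor when linearly combined, and $\lambda > 0$.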
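To make the objective concrete, the sketch below evaluates it with NumPy. All function names are hypothetical. It assumes the unfolding convention of Kolda and Bader (earlier modes vary fastest along the columns), a column-wise Kronecker (Khatri-Rao) product for the matrices combined in $Z^{(i)}$, and the Frobenius norm for the fitting term.

```python
import numpy as np
from functools import reduce

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x k) and B (J x k): (I*J) x k."""
    return np.einsum("ir,jr->ijr", A, B).reshape(-1, A.shape[1])

def unfold(T, mode):
    """Mode-`mode` unfolding; the earliest remaining mode varies fastest."""
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order="F")

def z_matrix(factors, mode):
    """Khatri-Rao product of all factors except `mode`, highest mode first."""
    others = [U for m, U in enumerate(factors) if m != mode]
    return reduce(khatri_rao, reversed(others))

def lpntf_objective(T, factors, L, lam, mode=0):
    """Fitting error of one unfolding plus the Laplacian penalty on the
    last (sample-mode) factor matrix."""
    fit = np.linalg.norm(unfold(T, mode) - factors[mode] @ z_matrix(factors, mode).T) ** 2
    penalty = lam * np.trace(factors[-1].T @ L @ factors[-1])
    return fit + penalty
```

For a tensor built exactly from its factor matrices, every unfolding satisfies `unfold(T, m) == factors[m] @ z_matrix(factors, m).T`, so the fitting term vanishes and only the penalty remains.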

Locality Preserving Non-negative Tensor Factorization

Gradients

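$$
\nabla_{U^{(i)}} f_{\text{LPNTF}} =
\begin{cases}
\underbrace{U^{(i)} [Z^{(i)}]^T Z^{(i)}}_{\nabla^{+}_{U^{(i)}} f_{\text{LPNTF}}}
- \underbrace{A_{(i)} Z^{(i)}}_{\nabla^{-}_{U^{(i)}} f_{\text{LPNTF}}}
& \text{for } i = 1, 2, \dots, N \\[2ex]
\underbrace{U^{(N+1)} [Z^{(N+1)}]^T Z^{(N+1)} + \lambda\, \Gamma\, U^{(N+1)}}_{\nabla^{+}_{U^{(N+1)}} f_{\text{LPNTF}}}
- \underbrace{\left(A_{(N+1)} Z^{(N+1)} + \lambda\, S\, U^{(N+1)}\right)}_{\nabla^{-}_{U^{(N+1)}} f_{\text{LPNTF}}}
& \text{for } i = N + 1.
\end{cases}
$$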
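A short derivation shows where the two parts come from. It assumes that the norm in the objective is the Frobenius norm of the unfolding and uses the symmetry of $S$ (hence of $L$), which follows from its definition. Writing $U$, $Z$, and $A$ for $U^{(i)}$, $Z^{(i)}$, and $A_{(i)}$:

$$
\nabla_{U} \left\|A - U Z^T\right\|_F^2 = 2\left(U Z^T Z - A Z\right),
\qquad
\nabla_{U} \operatorname{tr}\left(U^T L\, U\right) = (L + L^T)\, U = 2\left(\Gamma U - S U\right).
$$

The trace term depends only on $U^{(N+1)}$, so it contributes only for $i = N + 1$. Adding the two terms with weight $\lambda$ and dropping the common factor 2 gives the expressions above. Both $\nabla^{+}$ and $\nabla^{-}$ are entrywise non-negative because $U^{(i)}$, $Z^{(i)}$, $A_{(i)}$, $S$, and $\Gamma$ are, which is the property a multiplicative update relies on.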

Locality Preserving Non-negative Tensor Factorization

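Robust multiplicative update rules

Extending (Lin, 2007) [a], it is proven:

$$
U^{(i)}_{[t+1]} = U^{(i)}_{[t]} - \frac{\bar{U}^{(i)}_{[t]}}{\nabla^{+}_{U^{(i)}_{[t]}} f_{\text{LPNTF}} + \delta} \ast \nabla_{U^{(i)}_{[t]}} f_{\text{LPNTF}}
$$

$$
\bar{U}^{(i)}_{[t]} =
\begin{cases}
U^{(i)}_{[t]} & \text{if } \nabla_{U^{(i)}_{[t]}} f_{\text{LPNTF}} \geq 0 \\
\sigma & \text{otherwise}
\end{cases}
$$

for $\sigma$, $\delta$ small positive numbers, typically $10^{-8}$. The division and the product $\ast$ are elementwise, and $t$ denotes the iteration index.

[a] C.-J. Lin, “On the convergence of multiplicative update algorithms for nonnegative matrix factorization,” IEEE Trans. Neural Networks, vol. 18, no. 6, pp. 1589-1596, 2007.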
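One update step can be sketched as below. This follows the rule as stated on the slide and is not a verified reproduction of the authors' code: the function names are hypothetical, and the helper covers only the sample mode $i = N + 1$ with its unfolding and $Z$ matrix supplied by the caller.

```python
import numpy as np

def robust_update(U, grad_plus, grad_minus, sigma=1e-8, delta=1e-8):
    """One robust multiplicative step; all operations are elementwise.

    grad_plus and grad_minus are the non-negative parts of the gradient,
    with gradient = grad_plus - grad_minus.
    """
    grad = grad_plus - grad_minus
    # Use the current entry where the gradient is non-negative, sigma elsewhere.
    U_bar = np.where(grad >= 0, U, sigma)
    return U - U_bar / (grad_plus + delta) * grad

def last_mode_gradient_parts(A_unf, Z, U, S, lam):
    """Gradient parts for the sample mode, including the Laplacian terms."""
    Gamma = np.diag(S.sum(axis=0))
    grad_plus = U @ (Z.T @ Z) + lam * Gamma @ U
    grad_minus = A_unf @ Z + lam * S @ U
    return grad_plus, grad_minus
```

Where the gradient is non-negative, the step reduces to the familiar ratio update scaled by $(\nabla^{-} + \delta) / (\nabla^{+} + \delta)$, so non-negative entries stay non-negative.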

Locality Preserving Non-negative Tensor Factorization

Impact on the SRC

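- Data tensor $\boldsymbol{\mathcal{A}} \in \mathbb{R}_{+}^{I_1 \times I_2 \times I_3 \times I_4}$, where $I_1 = I_{\text{scales}} = 6$, $I_2 = I_{\text{rates}} = 10$, $I_3 = I_{\text{frequencies}} = 128$, and $I_4 = I_{\text{samples}}$.
- SRC problem: $c^* = \arg\min_{c} \|c\|_1$ subject to $W D\, c = W y$, where $W \in \mathbb{R}^{k \times 7680}$ with $k \ll \min(7680, I_{\text{samples}})$ is a projection matrix.
- When LPNTF, NTF, or DNTF is applied to $\boldsymbol{\mathcal{A}}$, four factor matrices $U^{(i)} \in \mathbb{R}_{+}^{I_i \times k}$, $i = 1, 2, 3, 4$, are obtained, which are associated with the scale, rate, frequency, and sample modes, respectively. Then $W = (U^{(3)} \odot U^{(2)} \odot U^{(1)})^T$ or $W = (U^{(3)} \odot U^{(2)} \odot U^{(1)})^{\dagger}$, and $D = A_{(4)}^T = W^T [U^{(4)}]^T$.
- For MPCA or GTDA, three factor matrices $U^{(i)} \in \mathbb{R}^{I_i \times J_i}$, with $J_i < I_i$, $i = 1, 2, 3$, are obtained. Then $W = (U^{(3)} \otimes U^{(2)} \otimes U^{(1)})^T$ or $W = (U^{(3)} \otimes U^{(2)} \otimes U^{(1)})^{\dagger}$. The columns of $D$ are obtained by applying $W$ to the vectorized training tensors $\operatorname{vec}(\boldsymbol{\mathcal{A}}_q)$.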
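The two ways of building the projection matrix can be sketched with NumPy. The function names, the random factor matrices, and the choices k = 20 and (J1, J2, J3) = (3, 5, 16) are assumptions for illustration. The sketch also assumes a column-wise Kronecker (Khatri-Rao) product for the factorization-based methods and column-major vectorization, which is what makes the Kronecker form agree with mode-wise projection.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x k) and B (J x k): (I*J) x k."""
    return np.einsum("ir,jr->ijr", A, B).reshape(-1, A.shape[1])

def cp_projection(U1, U2, U3, use_pinv=False):
    """W for LPNTF/NTF/DNTF-style factors: transpose or pseudo-inverse
    of the Khatri-Rao product U3, U2, U1 (k x 7680 for 6 x 10 x 128 data)."""
    Z = khatri_rao(khatri_rao(U3, U2), U1)
    return np.linalg.pinv(Z) if use_pinv else Z.T

def tucker_projection(V1, V2, V3):
    """W for MPCA/GTDA-style factors: transposed Kronecker product."""
    return np.kron(np.kron(V3, V2), V1).T

rng = np.random.default_rng(0)
k = 20
U1, U2, U3 = rng.random((6, k)), rng.random((10, k)), rng.random((128, k))
W_cp = cp_projection(U1, U2, U3)                    # 20 x 7680
W_cp_pinv = cp_projection(U1, U2, U3, use_pinv=True)

V1, V2, V3 = rng.random((6, 3)), rng.random((10, 5)), rng.random((128, 16))
W_tucker = tucker_projection(V1, V2, V3)            # 240 x 7680

# A training tensor is vectorized in column-major order before projection.
A_q = rng.random((6, 10, 128))
feature = W_tucker @ A_q.reshape(-1, order="F")
```

With the Kronecker form, the number of features equals the product J1 J2 J3, which matches how k is chosen for the factorization-based methods in the experiments.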

Experimental Assessment

Experimental setup

The training tensor is constructed by stacking the cortical representations: $\mathcal{A}_{\text{GTZAN}} \in \mathbb{R}_+^{6 \times 10 \times 128 \times 900}$; $\mathcal{A}_{\text{ISMIR}} \in \mathbb{R}_+^{6 \times 10 \times 128 \times 729}$.

$\lambda = 0.5$ and $\tau = 1$ (heat kernel) are empirically set for LPNTF; need for cross-validation.

To determine the dimensionality of the factor matrices, the ratio of the sum of eigenvalues retained over the sum of all eigenvalues for each mode-$n$ tensor unfolding is employed.

The same $J_1 = J_{\text{scales}}$, $J_2 = J_{\text{rates}}$, and $J_3 = J_{\text{frequencies}}$ are used in MPCA and GTDA; $k = J_1 J_2 J_3$ for LPNTF, NTF, DNTF, and random projection.
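The eigenvalue-ratio rule for choosing the mode dimensions can be sketched as follows. This is a minimal sketch under stated assumptions: the scatter of each mode is taken as the uncentred product of the mode-n unfolding with its transpose (the slides do not say whether the data are centred first), a small random tensor replaces the real cortical representations, and the helper names `unfold` and `retained_components` are hypothetical.

```python
import numpy as np


def unfold(T, mode):
    """Mode-n unfolding: rows index the chosen mode, columns all remaining modes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def retained_components(T, mode, ratio):
    """Smallest J whose top-J eigenvalues reach `ratio` of the eigenvalue sum."""
    X = unfold(T, mode)
    # Eigenvalues of the (uncentred, assumed) mode-n scatter, in descending order.
    eigvals = np.clip(np.linalg.eigvalsh(X @ X.T)[::-1], 0.0, None)
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    J = int(np.searchsorted(cumulative, ratio)) + 1
    return min(J, X.shape[0])


# Toy tensor (assumption): scales x rates x frequencies x samples, much smaller
# than the 6 x 10 x 128 x 900 GTZAN training tensor.
rng = np.random.default_rng(0)
A = rng.random((6, 10, 16, 30))

J = [retained_components(A, n, ratio=0.90) for n in range(3)]  # J1, J2, J3
k = int(np.prod(J))  # feature dimension used for LPNTF, NTF, DNTF, random projection
```

Sweeping `ratio` over a range such as 0.78 to 0.94 gives one triple (J1, J2, J3), and hence one feature dimension k, per retained-scatter level.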

Experimental Assessment

Total number of retained principal components in each mode for the GTZAN and ISMIR2004 datasets

[Figure: two plots of the number of principal components (y-axis, 2 to 12) against the portion of total scatter retained (x-axis, 78% to 94%), each with curves for the rate subspace, the scale subspace, and the frequency subspace.]

Experimental Assessment

Feature dimension for the GTZAN and ISMIR2004 datasets

[Figure: two plots of the feature dimension (y-axis, up to 220) against the portion of total scatter retained (x-axis, 78% to 94%).]

Experimental Assessment

SRC accuracy on the GTZAN and ISMIR2004 datasets

[Figure: two plots of the classification accuracy in % (y-axis, 30 to 100) against the portion of total scatter retained (x-axis, 78% to 94%), each with curves for LPNTF, NTF, DNTF, MPCA, GTDA, and random projection.]

Experimental Evaluation of SRC on Cortical Representations

Linear SVM accuracy on the GTZAN and ISMIR2004 datasets

[Figure: two plots of the classification accuracy in % (y-axis, 30 to 100) against the portion of total scatter retained (x-axis, 78% to 94%), each with curves for LPNTF, NTF, DNTF, MPCA, GTDA, and random projection.]

Comparison with the State of the Art

GTZAN dataset

Method                                               Accuracy (in %)
Topology preserving NTF + SRC                        93.7
LPNTF + SRC (a)                                      92.4 ± 2
NMF + SRC (b)                                        91 ± 1.76
Ensemble discriminant sparse projections             84.96 ± 2.22
NMPCA + SVM-RBF (c)                                  84.3
AdaBoost (d)                                         82.5
Daubechies wavelet coefficient histograms + SVM (e)  78.5
Daubechies wavelet coefficient histograms + LDA      71.3

(a) Y. Panagakis, C. Kotropoulos, and G. R. Arce, “Music genre classification using locality preserving non-negative tensor factorization and sparse representations,” in Proc. 2009 Int. Conf. Music Information Retrieval, Kobe, Japan, October 2009.

(b) Y. Panagakis, C. Kotropoulos, and G. R. Arce, “Music genre classification via sparse representations of auditory temporal modulations,” in Proc. 17th European Signal Processing Conf., Glasgow, August 2009.

(c) Y. Panagakis, C. Kotropoulos, and G. R. Arce, “Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification,” IEEE Trans. Audio, Speech, and Language Processing, to appear.

(d) J. Bergstra, N. Casagrande, D. Erhan, D. Eck, and B. Kegl, “Aggregate features and AdaBoost for music classification,” Machine Learning, vol. 65, no. 2-3, pp. 473-484, 2006.

(e) T. Li, M. Ogihara, and Q. Li, “A comparative study on content-based music genre classification,” in Proc. 26th Int. ACM SIGIR Conf. Research and Development in Information Retrieval, Toronto, Canada, 2003, pp. 282-289.

Comparison with the State of the Art

ISMIR2004 dataset

Method                            Accuracy (in %)
Topology preserving NTF + SRC     94.93
NTF + SRC (ISMIR 2009)            94.38 ± 1.68
LPNTF + SRC (ISMIR 2009)          94.25 ± 1.70
PCA + SRC (EUSIPCO 2009)          93.56 ± 1.79
NMF + GMM (a)                     83.5
NTF + SVM-RBF (IEEE TSLP 2009)    83.15
AdaBoost                          82.3
NMPCA + SVM-RBF (IEEE TSLP 2009)  82.19

(a) A. Holzapfel and Y. Stylianou, “Musical genre classification using nonnegative matrix factorization-based features,” IEEE Trans. Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 424-434, 2008.

Conclusions-Future Work

Summary

A robust music genre classification framework has been proposed that takes into account the properties of human auditory perception.

2D auditory temporal modulations and 3D cortical representations yield rich tensors for feature extraction, while sparse concepts have been employed for feature selection and classification.

The best classification accuracies reported exceed any accuracy previously obtained by state-of-the-art music genre classification algorithms on either the GTZAN or the ISMIR2004 Genre dataset.

Conclusions-Future Work

Future Work

The dependence of the SRC on the dimensionality reduction technique deserves further research.

The design of discriminant overcomplete dictionaries within the classifier ensemble and/or the substitution of the dual LDA by sparse LDA, in order to enforce sparsity on the columns of W, could be pursued.

Efficient implementations using incremental update rules are needed.

In many commercial and private applications, the number of available audio recordings per genre is limited. Thus, it is desirable for the music genre classification algorithm to perform well on such small sample sets.

Thank You! Questions?
