Project 5: Music Genre Recognition Using PCA
Development of Music Genre Recognition Algorithm using
Windowed Fourier Transform and SVD/PCA techniques
Author
Jimin Kim
Abstract
When we listen to a song, its genre is immediately recognizable to us.
This implies that our brain is capable of processing auditory information and
categorizing it into the genres we are familiar with. This paper attempts to develop a simple
algorithm that mimics the brain's processing and categorization of a sound signal by
applying the SVD/PCA technique. By constructing a training set for each musical category using
spectrograms and the PCA technique, one can build a statistical testing algorithm that is
capable of identifying the genre of a given music sample. The objective of this paper is to
introduce the development of such an algorithm, implement it in MATLAB, and
test its classification performance on realistic music samples from various genres.
Introduction/Overview
The paper first introduces the theoretical development of the algorithm using
spectrograms and the PCA technique. Following the theoretical background, the
implementation of the algorithm in MATLAB is described. Once the algorithm is fully
implemented in MATLAB, it is applied to three different trials. The first trial tests the
algorithm's ability to classify songs by three bands from three different genres. The
second trial uses the same methodology as the first, except that the three bands are
from the same genre. Finally, in the third trial, the algorithm is tested on general
classification of musical genres such as Jazz, Rock, and Electronics.
Theoretical background
The theoretical framework behind the algorithm consists of two main concepts:
Windowed Fourier transform and Principal Component Analysis. When it comes to the
development of music genre recognition algorithm, the most important process is to pick up
the unique features that define the songs such as timbre, tempo and beats. One of the most
effective ways of capturing these features is by using the Windowed Fourier transform.
Mathematically, the Windowed Fourier transform is the Fourier transform with a slight
modification. Recall that the Fourier transform states

    F(k) = (1/sqrt(2π)) ∫ f(x) e^{-ikx} dx        (1)

where k is the frequency domain and x is the position (or time) domain. The
Windowed Fourier transform adds a time-translation kernel, for example the Gaussian

    g(τ - t) = e^{-a(τ - t)^2}        (2)

into the Fourier transform, which then becomes

    G[f](t, k) = ∫ f(τ) g(τ - t) e^{-ikτ} dτ        (3)

Here, the term g(τ - t) induces the time localization of the Fourier integral around
τ = t. Therefore, as t varies over the given time interval, g sweeps through the signal and
picks up the frequency information at each point in time, just as shown in the top picture of
Figure 1. By introducing the translational kernel, the windowed Fourier transform enables
one to investigate both time and frequency information of a given signal with some trade-off
from both domains. Therefore, when this technique is applied to a sound signal such as a
song, one can construct a spectrogram that holds both time and frequency information. This is
precisely what one should look for when developing a music genre recognition
algorithm. Because the spectrogram not only provides information about the song's overall
frequency range but also holds information about time-dependent features such as tempo and
rhythm, it serves as an excellent quantitative representation of the song of interest.
Figure 2 shows an example of such a spectrogram.
Figure 1. This figure describes how the Windowed Fourier transform is performed. The top picture shows
the overlap between the translational kernel function (red) and the signal. The middle picture shows
the filtered signal at the given timestamp. The bottom picture shows the FFT of the filtered
signal.
However, there is one problem to overcome. Spectrograms do offer a good amount of
the information one needs for defining and classifying a song's genre, but one still needs a way to
identify the song's genre by extracting the key features from the spectrogram and comparing
them with the defined features of the various genres. This is where PCA/SVD techniques come in very
handy. Recall that SVD decomposes any matrix into three matrices that hold the principal
component axes, the corresponding singular values, and the matrix's original coordinate vectors.
Knowing that the spectrogram is an m*n matrix, where m is the frequency range of
the song at a given time point and n is the number of samplings, we can treat the song's
spectrogram as a matrix X and perform PCA to obtain its principal modes and corresponding
singular values. Recall that SVD can diagonalize any matrix by introducing an appropriate
pair of bases U and V:

    X = U Σ V^T        (4)

One can then transform the spectrogram matrix X into its principal component basis form

    Y = U^T X        (5)

where U is the unitary matrix associated with the SVD. Then one can compute the
covariance matrix of Y

    C_Y = (1/(n-1)) Y Y^T = (1/(n-1)) Σ^2        (6)

where Σ is the diagonal matrix associated with the SVD. In this new basis, the
principal components of the song are the columns of U and the corresponding singular values
are the diagonal elements of Σ. Therefore, by applying PCA to a song, one can perform
dimensional reduction of the song's spectrogram matrix, thereby successfully
representing the song as a 1D matrix with fewer than 20 features. This representation of a song
can then be compared to the principal components of a specific musical genre.

Figure 2. An example of a song's spectrogram (Handel's Messiah) using the Gaussian kernel. The
spectrogram holds both time and frequency information of the sound signal.
Obtaining the principal components of a specific musical genre follows
the identical methodology, but uses multiple songs instead of a single song. Assuming
that songs within the same genre share similar musical features, one can construct a
representation matrix of a genre using multiple songs from it. This can be done
by combining the spectrograms of the songs into a single m*n metadata matrix X, where m is
now the number of songs that define the genre and each row of length n is a song's entire
spectrogram reshaped into a row vector. Using equations 5 and 6, one can then obtain the principal components
that represent the specific genre defined by the multiple songs. This process is called constructing
the training set for a certain genre. Figure 3 shows a simple diagram of the
metadata matrix X for a genre and for a song.
Once one obtains the principal modes and singular values of both the musical genres and the
song of interest, one can finally compare the song's singular values to those of the different
genres to classify the song's genre.
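The whole pipeline described above can be sketched in miniature. The following Python snippet (illustrative only; the paper's implementation is in MATLAB, and the sizes, names, and random "spectrograms" here are made up) builds a small training matrix per genre, extracts its leading singular values, and classifies a sample by the nearest row of singular values:

```python
import numpy as np

rng = np.random.default_rng(1)

def genre_modes(X, n_modes=3):
    """Leading squared singular values of a mean-subtracted data matrix (rows = songs)."""
    X = X - X.mean(axis=1, keepdims=True)
    s = np.linalg.svd(X / np.sqrt(X.shape[1] - 1), compute_uv=False)
    return s[:n_modes] ** 2

# Two toy "genres", five songs each, 40 spectrogram elements per song.
genre_a = rng.standard_normal((5, 40))
genre_b = 3.0 * rng.standard_normal((5, 40))   # a higher-energy genre

classifier = np.vstack([genre_modes(genre_a), genre_modes(genre_b)])
names = ['A', 'B']

# A "song" drawn from the higher-energy genre, represented the same way.
song = 3.0 * rng.standard_normal((5, 40))
sample = genre_modes(song)

# Nearest row of the classifier gives the predicted genre.
pred = names[int(np.argmin(np.linalg.norm(classifier - sample, axis=1)))]
```

The key idea, as in the paper, is that the singular-value spectrum acts as a compact energy signature of a genre.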
Algorithm implementation
The implementation of the musical genre classification algorithm in MATLAB is
discussed in this section. The section is divided into two parts: constructing the training
sets and building the classification algorithm.
Construction of the training sets
1. Select the songs of certain genre and convert them into WAV format
Figure 3. Simplified representation of the metadata X for a specific genre (left) and a song (right).
Notice that for a genre each row is a reshaped spectrogram, whereas for a song X is its spectrogram
itself, where each row represents the frequency range at a given time point.
In order to define a specific genre, one needs songs that are a subset of that genre. In
this paper, 30 songs were used to build the training set for each genre, but the larger the
number of songs, the better one is likely to represent the genre. Once one
obtains all the songs for the genres of interest (in this case, three), one should convert
them into WAV format, since it is the format supported by all platforms in
MATLAB. However, if one is using Windows 7 or newer, MATLAB can also read
MP3 files.
2. Construct the time and frequency domains
Before loading the songs into MATLAB, one should construct appropriate time
and frequency domains for the songs. In this paper, a 5-second portion of each song
was used, so one can set the length L = 5; since typical digital music has
44100 frames per second, one should set n = 5*44100 = 220500. Using L and n,
construct a linear space t for the time domain and k for the frequency domain. Do not
forget to scale k by 2π/L, since MATLAB's FFT assumes a periodic domain, and apply
'fftshift' to k so that the plots come out in the natural order.
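As an illustration, the domain construction above can be sketched in Python (the paper uses MATLAB; the variable names mirror its conventions but are otherwise illustrative):

```python
import numpy as np

# 5-second sample at 44100 frames per second.
L = 5
Fs = 44100
n = L * Fs  # 220500 samples

# Time domain: n points on [0, L), dropping the periodic endpoint.
t = np.linspace(0, L, n + 1)[:-1]

# Frequency domain: scaled by 2*pi/L for the periodic interval,
# listed in FFT order, then shifted for plotting.
k = (2 * np.pi / L) * np.concatenate((np.arange(0, n // 2), np.arange(-n // 2, 0)))
ks = np.fft.fftshift(k)
```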
3. Load the songs into MATLAB
Once all the songs are prepared in WAV format and the domains are defined, one can
load them into MATLAB so that each song is represented as a matrix. One will notice
that when MATLAB reads a song, it returns an n*2 matrix, where n is the total number of
frames of the song and 2 is the number of channels. As mentioned
in step 2, a digital song has 44100 frames per second, resulting in 220500 frames for a 5-second
sample and therefore a 220500*2 matrix for each sample. This is a large
amount of data for a single song. In order to decrease the size of the matrix, one can
average the two channels into one. This effectively halves the size of the
matrix while retaining the essential information of the song.
4. Select the 5-second portion of the song
Once the song is converted into a matrix and its two channels are averaged into one,
one should select a 5-second portion that represents the song. Any part of the song
can be chosen, but in this paper a 5-second portion from the middle of the song was
used. Simply find the sample index that divides the song in half, and add 5 seconds
(220500 frames) to find the end point.
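Steps 3 and 4 can be sketched as follows in Python; since this is only an illustration, a synthetic stereo array stands in for a decoded WAV file (the paper uses MATLAB's audioread):

```python
import numpy as np

Fs = 44100
rng = np.random.default_rng(0)
stereo = rng.standard_normal((60 * Fs, 2))  # a 60-second, 2-channel "song"

# Average the two channels into one to halve the data.
mono = stereo.mean(axis=1)

# Select a 5-second portion starting at the middle of the song.
start = len(mono) // 2
clip = mono[start:start + 5 * Fs]
```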
5. Define a dummy spectrogram matrix and the sampling rate
Now that the 5-second portion of the song is ready, one should define a few
components needed for the windowed Fourier transform. First, define an empty
matrix for the spectrogram that will be filled with frequency information
during the windowed Fourier transform loop. Next, define the sampling interval for the 5-second
portion of the song. This is a crucial part of the development of the
algorithm, since too fine a sampling interval can make the spectrogram matrix too big to
be processed, while too coarse an interval will not capture the amount of
information needed by the algorithm. Therefore, it is important to find an optimal
interval that gives both a manageable matrix size and a sufficient
amount of information about the song. In this paper, a sampling interval of 0.1 seconds was
used, giving 51 rows in the generated spectrogram.
6. Perform windowed Fourier transform
Now that the sampling interval has been defined and the empty spectrogram matrix
is prepared, one should perform the windowed Fourier transform on the 5-second
sample of the song. Simply construct a 'for' loop that sweeps from the beginning of
the sample to the end. Define a translational kernel that is multiplied with the
signal at each time point. In this paper, a Gaussian kernel with width parameter a = 20 was
used. Once the kernel function is defined, multiply the signal by the kernel
and perform the Fourier transform to convert the filtered signal into a range of frequencies.
7. Resample the signal
We are still inside the 'for' loop. Once the signal is converted into frequencies, one
should resample the frequencies so that the spectrogram has a manageable size.
Because the Fourier transform of a real signal is symmetric, half of it is redundant;
one can therefore keep only the positive half of the frequency signal by selecting the
columns from the center to the end of the matrix. This halves the size of the
spectrogram matrix, and further truncation can be done by resampling the signal. The
resampling rate can be any number, but in this paper 1/10 was used. This way, one can
further reduce the size of the spectrogram matrix by a factor of 10 without losing much
information.
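Steps 5 through 7 can be sketched as the following loop, shown in Python for illustration (the paper uses a MATLAB 'for' loop with the Gaussian kernel and 'resample'; the random clip and the exact index bookkeeping here are assumptions):

```python
import numpy as np
from scipy.signal import resample_poly

Fs = 44100
L_sec = 5
n = L_sec * Fs
t = np.linspace(0, L_sec, n + 1)[:-1]
rng = np.random.default_rng(0)
clip = rng.standard_normal(n)  # stands in for the 5-second song sample

tslide = np.linspace(0, L_sec, 51)  # 0.1-second sampling interval -> 51 rows
rows = []
for tau in tslide:
    g = np.exp(-20.0 * (t - tau) ** 2)             # Gaussian window centered at tau
    Sgt = np.fft.fft(g * clip)                      # FFT of the windowed signal
    half = np.abs(np.fft.fftshift(Sgt))[n // 2 - 1:]  # keep the positive half
    rows.append(resample_poly(half, 1, 10))        # downsample by a factor of 10
spectrogram = np.array(rows)                       # 51 x 11026, as in the paper
```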
8. Reshape the spectrogram matrix into 1D
Now that the 'for' loop has generated a spectrogram of manageable size, one
should reshape the spectrogram matrix into 1D so that it can be incorporated into the
metadata matrix X mentioned in the theory section. Simply use the 'reshape' function to
transform the spectrogram into a row vector.
9. Merge the spectrograms into a single matrix X
Once steps 1 through 8 have been repeated for all songs within the genre (30 in this case),
one should merge all their spectrograms into a single matrix X. Simply define a
zero matrix of size m*n, where m is the number of songs in the genre and n is
the total number of elements in each spectrogram matrix (51 * 11026 = 562326 in this
case), and fill each row with the reshaped spectrogram of one song. Make sure
that the matrix X has a manageable size to be processed.
10. Perform PCA on X
Now that the matrix X holds all the information of the songs within a genre, one
should perform PCA to obtain the principal modes that represent the genre. First,
compute the size of X and the mean of each row. Then, subtract the mean to
normalize the matrix. Perform the 'economy' SVD in order to get the unitary matrix U of X;
make sure to use the 'economy' version of SVD to save processing power.
Once the SVD is successfully carried out, define the matrix Y = U'X, the
principal component projection of X, and compute the covariance matrix
Cy = (1/(n-1))YY'. Finally, obtain the principal modes of X by extracting the
diagonal values of Cy.
11. Create a row vector of principal modes
The diagonal values of Cy hold the principal modes of the genre, ordered from the
largest singular value to the smallest. The final task is to take the transpose of this
vector so that it can be incorporated into the classification algorithm.
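Steps 9 through 11 amount to the following computation, sketched here in Python (the paper uses MATLAB's svd(..., 'econ'); a small random matrix stands in for the 30 reshaped spectrograms, whose real size would be 30 x 562326):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 5000
X = rng.standard_normal((m, n))  # rows = reshaped spectrograms (illustrative)

# Subtract the row means to normalize the data.
X = X - X.mean(axis=1, keepdims=True)

# Economy SVD of the scaled data matrix.
U, s, Vt = np.linalg.svd(X / np.sqrt(n - 1), full_matrices=False)

# Principal component projection and its covariance matrix.
Y = U.T @ X
Cy = (Y @ Y.T) / (n - 1)

# The diagonal of Cy holds the variance along each principal mode,
# sorted from largest to smallest.
modes = np.diag(Cy)
```

Note that in this basis Cy is diagonal, so its diagonal entries equal the squared singular values of the scaled data matrix.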
Development of the classification algorithm
1. Load the principal modes row vector for each genre
Once the principal modes for all three genres are obtained, load these row vectors
into a new script that implements the classification algorithm.
2. Select the number of modes that represent the genres
Since PCA produces as many modes as there are rows in the matrix X, one can extract
only the modes that hold most of the information. Usually,
20 modes are enough to capture more than 90% of the genre's information.
3. Create a classifier matrix
Once the number of modes used to evaluate a song has been chosen, one can construct a
classifier matrix of size m*n, where m is the number of genres and n is the number of
feature modes. In this paper, the classifier matrix had size 3*20.
4. Obtain the spectrogram of a sample song
Now that the classifier matrix is ready, the only task left is to obtain the spectrogram of
the song to be evaluated and compute its principal feature modes. This can be done by
simply repeating steps 2 through 7 of the previous section. Make sure NOT to reshape
the spectrogram this time, since the PCA will be performed on the 2D spectrogram
itself.
5. Perform PCA and obtain principal feature modes
Once the spectrogram matrix is generated, simply repeat step 10 of the previous
section to obtain the principal feature modes. Do not forget to match the number of
modes to the number chosen for the classifier matrix. Once the principal
modes are obtained, take the transpose to form a row vector and call it the
sample song matrix.
6. Define the groups
Now that both the classifier matrix and the sample song matrix are prepared, one
should define the genre name for each row of the classifier matrix. For example, if
the genres were 'Rock', 'Electronics' and 'Jazz', define the first row of the classifier
matrix as 'Rock', the second row as 'Electronics' and the third row as 'Jazz'.
7. Use k nearest neighbor algorithm to classify the song
Once the sample song matrix, the classifier matrix and the group names are defined, one
can use the 'k nearest neighbor' algorithm to determine the song's genre. Simply use
MATLAB's built-in function 'knnclassify' with three parameters representing the
sample, the classifier, and the group names. The function then compares the
sample song matrix with each row of the classifier matrix and declares the genre of the
row closest to the sample song matrix as the song's genre. Test with
multiple songs in order to estimate the accuracy of the algorithm. Also, do not
forget to cross-validate the algorithm by composing the training set and the sample set
from different songs. In this paper, 6 cross-validations were performed and the mean
accuracy was used as the final accuracy rate.
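The nearest-neighbor step can be sketched as follows, in Python for illustration (the paper uses MATLAB's knnclassify; the numbers here are invented to show the mechanics only):

```python
import numpy as np

classifier = np.array([
    [9.0, 4.0, 2.0],   # row 0: 'Rock' (top 3 feature modes, illustrative)
    [6.0, 3.0, 1.0],   # row 1: 'Electronics'
    [2.0, 1.0, 0.5],   # row 2: 'Jazz'
])
groups = ['Rock', 'Electronics', 'Jazz']

sample = np.array([5.5, 2.8, 1.1])  # principal modes of the song under test

# Declare the genre of the training row closest to the sample (1-NN).
dists = np.linalg.norm(classifier - sample, axis=1)
genre = groups[int(np.argmin(dists))]
```

With these numbers the sample sits closest to the 'Electronics' row, so that label is declared.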
Results/analysis
Test 1: Band Classification
Figure 4. Principal modes of each artist in Test 1. Notice that, overall, Giraffage has the highest range of
energy in each mode, while Radiohead and Miles Davis have the middle and low ranges of energy,
respectively.
The goal of Test 1 was to classify three different bands from three different genres. In
this paper, the artists Giraffage, Radiohead and Miles Davis were used to represent
Future Beats Electronics, Alternative Rock and Jazz, respectively. For constructing a
training set, 30 songs from each artist were used. For each evaluation, 10 new songs from
each artist (30 songs in total) were tested to calculate the accuracy rate. The
algorithm was tested with 6 cross-validations by changing the training set and the samples for
each validation; the same method was used for Tests 2 and 3. Figure 4 shows the
principal modes of each artist and their characteristics. The results for Test 1 are as follows:
Overall accuracy: 48.33%
Standard deviation: 5.48%
Mean accuracy for Giraffage: 83.33%
Mean accuracy for Radiohead: 11.67%
Mean accuracy for Miles Davis: 51.67%
Notice that the algorithm performed very well on the artists who occupy the high and
low energy ranges of the principal modes, but performed rather poorly on the artist
who occupies the middle energy range. This is primarily a consequence of the way 'k nearest
neighbor' classifies a song. Since each individual Radiohead song was likely to
have principal modes higher than those of the Giraffage training set, the algorithm
performed poorly by classifying them as Giraffage songs. This poor performance on the middle
energy range later proves to be a weakness of this algorithm.
Test 2: The Case for Seattle
Figure 5. Principal modes of each artist in Test 2. Giraffage again occupies the high range of energy in
each mode, together with Skream, while Com Truise has the middle range of energy. Since they are all
Electronics artists, it is more difficult for the algorithm to classify the songs correctly.
The methodology of Test 2 was identical to that of Test 1, but this time three different
bands from the same genre were tested. In this paper, three great artists of the current electronics
scene were chosen: Giraffage, Com Truise and Skream. Giraffage is the same artist
chosen in Test 1. Com Truise is an electronics artist who seeks a modern
interpretation of the 80's electronics sound. Skream is a pioneer of the Dubstep genre from the UK
who uses a lot of bass in his music. Figure 5 shows the principal modes of each artist. The
results for Test 2 are as follows:
Overall accuracy: 37.78%
Standard deviation: 8.60%
Mean accuracy for Giraffage: 66.67%
Mean accuracy for Com Truise: 21.67%
Mean accuracy for Skream: 25.00%
Notice that the algorithm performs more poorly than in Test 1 because all three
artists share similar features. The algorithm again performs best on Giraffage, but the
accuracy rate is lower than in Test 1, since Skream shares very similar principal modes
with Giraffage. Overall, the distinction between the artists was less clear than in Test 1,
resulting in poorer performance of the algorithm.
Test 3: Genre Classification
Figure 6. Principal modes of each musical genre in Test 3. This time, Rock takes the high range of
energy in each mode, while Electronics and Jazz occupy the middle and low ranges, respectively.
Since all three genres were fairly distinct from each other, the algorithm had a higher chance of
classifying the songs correctly.
Test 3 aimed to test the algorithm's ability to differentiate general genres rather
than classify specific artists. In this paper, Electronics, Heavy Rock and Jazz were chosen
as the three musical genres. The training set for each genre was constructed from three different
albums in that genre. See Appendix C for the artists included in each genre.
Figure 6 shows the principal modes of each genre. The results for Test 3 are as follows:
Overall accuracy: 53.89%
Standard deviation: 8.00%
Mean accuracy for Electronics: 53.33%
Mean accuracy for Rock: 50.00%
Mean accuracy for Jazz: 58.33%
Notice that because each genre had fairly distinct musical features, the algorithm
performed well on all three categories. It is interesting to note that the overall accuracy is the
highest among the three tests and that the mean accuracies for each genre are fairly evenly distributed,
unlike those in Tests 1 and 2. Because each genre's principal modes occupied more distinct
spots than those of Tests 1 and 2, the 'k nearest neighbor' algorithm was given a better
testing ground.
Summary/Conclusion
In this paper, the development of a simple music genre classification algorithm
utilizing the windowed Fourier transform and SVD/PCA techniques was introduced, along with
its theoretical background and its implementation in MATLAB. The paper also
tested the algorithm's performance in three different tests: band classification, the case for
Seattle and genre classification. In Test 1, the algorithm had an accuracy of 48.33% with a
standard deviation of 5.48%. In Test 2, the algorithm had an accuracy of 37.78% with a
standard deviation of 8.60%. Finally, in Test 3, the algorithm had an accuracy of 53.89%
with a standard deviation of 8.00%. For each test, the characteristics of all three
categories under the SVD/PCA techniques were discussed, and the algorithm's performance
on each category was briefly analyzed.
Appendix A
In this section, the MATLAB functions that were used in the algorithm
development are introduced with brief explanations of their use.
linspace: This function is used to construct the linear domains for time and frequency.
audioread: This function is used to load the audio files into MATLAB.
fft: This function is used to perform the windowed Fourier transform on the 5-second portion of
the song.
fftshift: This function is used to shift the frequency domain after the FFT.
abs: This function is used to take the absolute value of the frequency domain after the FFT.
resample: This function is used to resample the frequency rows that make up the
spectrogram in an effort to reduce the size of the matrix.
subplot: This function is used to plot the principal modes of a song or a musical genre.
reshape: This function is used to reshape the spectrogram from 2D to 1D.
size: This function is used to compute the size of the metadata matrix X.
mean: This function is used to compute the mean of X for the normalization process.
repmat: This function is used to subtract the mean from X for the normalization process.
svd: This function is used to perform SVD/PCA on X.
diag: This function is used to extract the diagonal components (singular values) from the
covariance matrix Cy.
length: This function is used to compute the length of the principal modes vector for
plotting purposes.
scatter: This function is used to plot the principal modes of a song or a genre.
zeros: This function is used to construct an empty matrix that is filled with a reshaped
spectrogram in each row.
knnclassify: This function is used to classify the sample song's principal modes against those of
the training sets.
disp: This function is used to display the result of the classification.
Appendix B
In this section, the actual MATLAB code for the algorithm is presented.
Feature Extraction
clear all; close all; clc;

L=5; n=220501;
t2=linspace(0,L,n+1); t=t2(1:n);
k=(2*pi/L)*[0:n/2-1 -n/2:-1]; ks=fftshift(k);

y = audioread('R(1).wav');
vvv1 = y(:,1); vvv2 = y(:,2);
y = (vvv1 + vvv2)/2;               % average the two channels
Fs = 44100;
vvv = y'/2;

start=length(y)/4;                 % start of the 5-second portion
finish=(length(y)/4)+(5*(Fs));
vv = vvv(1,start:finish);

Sgt_spec=[]; tslide=0:0.1:5;
for j=1:length(tslide)
    g=exp(-20*(t-tslide(j)).^2);   % Gaussian kernel
    Sg=g.*vv;
    Sgt=fft(Sg);
    Sgt=Sgt(1,n/2:n);              % keep the positive half of the spectrum
    Sgt_spec=[Sgt_spec; resample(abs(fftshift(Sgt)),1,10)];
    %subplot(3,1,1), plot(t,vv,'k',t,g,'r')
    %subplot(3,1,2), plot(t,Sg,'k')
    %subplot(3,1,3), plot(ks(1,n/2:n),abs(fftshift(Sgt))/max(abs(Sgt)))
    %drawnow
end

X=Sgt_spec;
%RR30=reshape(X,[1,562326]); %save('R(30)Spec','RR30');

[m,n]=size(X);                     % compute data size
mn=mean(X,2);                      % compute mean for each row
X=X-repmat(mn,1,n);                % subtract mean
[u,s,v]=svd(X/sqrt(n-1),'econ');   % perform the SVD
lambda=diag(s).^2;                 % produce diagonal variances
Y=u'*X;                            % produce the principal components projection
Cy=(1/(n-1))*(Y)*(Y.');
PCs = diag(Cy);
%Sum = sum(PCs); %PCnorm = PCs/Sum;

t2 = 1:length(PCs);
scatter(t2,PCs); title('Principal Components'); xlabel('component #'); ylabel('singular value');
Test 1 Training Sets
clear all; close all; clc;
load('G(1)Spec.mat'); load('G(10)Spec.mat'); load('G(11)Spec.mat');
load('G(12)Spec.mat'); load('G(13)Spec.mat'); load('G(14)Spec.mat'); load('G(15)Spec.mat'); load('G(16)Spec.mat');
load('G(17)Spec.mat'); load('G(18)Spec.mat'); load('G(19)Spec.mat'); load('G(2)Spec.mat'); load('G(20)Spec.mat');
load('G(21)Spec.mat'); load('G(22)Spec.mat'); load('G(23)Spec.mat'); load('G(24)Spec.mat'); load('G(25)Spec.mat');
load('G(26)Spec.mat'); load('G(27)Spec.mat'); load('G(28)Spec.mat'); load('G(29)Spec.mat'); load('G(3)Spec.mat');
load('G(30)Spec.mat'); load('G(4)Spec.mat'); load('G(5)Spec.mat'); load('G(6)Spec.mat'); load('G(7)Spec.mat');
load('G(8)Spec.mat'); load('G(9)Spec.mat')
XG=zeros(30,562326);
XG(1,:)=XX1; XG(2,:)=XX2; XG(3,:)=XX3; XG(4,:)=XX4; XG(5,:)=XX5;
XG(6,:)=XX6; XG(7,:)=XX7; XG(8,:)=XX8; XG(9,:)=XX9; XG(10,:)=XX10; XG(11,:)=XX11; XG(12,:)=XX12; XG(13,:)=XX13; XG(14,:)=XX14; XG(15,:)=XX15;
XG(16,:)=XX16; XG(17,:)=XX17; XG(18,:)=XX18; XG(19,:)=XX19; XG(20,:)=XX20; XG(21,:)=XX21; XG(22,:)=XX22; XG(23,:)=XX23; XG(24,:)=XX24; XG(25,:)=XX25;
XG(26,:)=XX26; XG(27,:)=XX27; XG(28,:)=XX28; XG(29,:)=XX29; XG(30,:)=XX30;
[m,n]=size(XG);                     % compute data size
mn=mean(XG,2);                      % compute mean for each row
XG=XG-repmat(mn,1,n);               % subtract mean
[u,s,v]=svd(XG/sqrt(n-1),'econ');   % perform the SVD
lambda=diag(s).^2;                  % produce diagonal variances
YG=u'*XG;                           % produce the principal components projection
CyG=(1/(n-1))*(YG)*(YG.');
PCsG = diag(CyG);
%SumG = sum(PCsG); %PCnormG = PCsG/SumG;
save('GiraffageTraining','PCsG');
t2 = 1:length(PCsG);
%subplot(3,1,1); %scatter(t2,PCsG); title('Principal Components (Giraffage)');
xlabel('component #'); %ylabel('singular value');
load('R(1)Spec.mat'); load('R(2)Spec.mat'); load('R(3)Spec.mat');
load('R(4)Spec.mat'); load('R(5)Spec.mat'); load('R(6)Spec.mat'); load('R(7)Spec.mat'); load('R(8)Spec.mat');
load('R(9)Spec.mat'); load('R(10)Spec.mat'); load('R(11)Spec.mat'); load('R(12)Spec.mat'); load('R(13)Spec.mat');
load('R(14)Spec.mat'); load('R(15)Spec.mat'); load('R(16)Spec.mat'); load('R(17)Spec.mat'); load('R(18)Spec.mat');
load('R(19)Spec.mat'); load('R(20)Spec.mat'); load('R(21)Spec.mat'); load('R(22)Spec.mat'); load('R(23)Spec.mat');
load('R(24)Spec.mat'); load('R(25)Spec.mat'); load('R(26)Spec.mat'); load('R(27)Spec.mat'); load('R(28)Spec.mat');
load('R(29)Spec.mat'); load('R(30)Spec.mat')
XR=zeros(30,562326);
XR(1,:)=RR1; XR(2,:)=RR2; XR(3,:)=RR3; XR(4,:)=RR4; XR(5,:)=RR5;
XR(6,:)=RR6; XR(7,:)=RR7; XR(8,:)=RR8; XR(9,:)=RR9; XR(10,:)=RR10; XR(11,:)=RR11; XR(12,:)=RR12; XR(13,:)=RR13; XR(14,:)=RR14; XR(15,:)=RR15;
XR(16,:)=RR16; XR(17,:)=RR17; XR(18,:)=RR18; XR(19,:)=RR19; XR(20,:)=RR20; XR(21,:)=RR21; XR(22,:)=RR22; XR(23,:)=RR23; XR(24,:)=RR24; XR(25,:)=RR25;
XR(26,:)=RR26; XR(27,:)=RR27; XR(28,:)=RR28; XR(29,:)=RR29; XR(30,:)=RR30;
[m,n]=size(XR);                     % compute data size
mn=mean(XR,2);                      % compute mean for each row
XR=XR-repmat(mn,1,n);               % subtract mean
[u,s,v]=svd(XR/sqrt(n-1),'econ');   % perform the SVD
lambda=diag(s).^2;                  % produce diagonal variances
YR=u'*XR;                           % produce the principal components projection
CyR=(1/(n-1))*(YR)*(YR.');
PCsR = diag(CyR);
%SumR = sum(PCsR); %PCnormR = PCsR/SumR;
save('RadioheadTraining','PCsR');
t2 = 1:length(PCsR);
%subplot(3,1,2); scatter(t2,PCsR); title('Principal Components (Radiohead)');
xlabel('component #'); ylabel('singular value');
load('M(1)Spec.mat'); load('M(2)Spec.mat'); load('M(3)Spec.mat');
load('M(4)Spec.mat'); load('M(5)Spec.mat'); load('M(6)Spec.mat'); load('M(7)Spec.mat'); load('M(8)Spec.mat');
load('M(9)Spec.mat'); load('M(10)Spec.mat'); load('M(11)Spec.mat'); load('M(12)Spec.mat'); load('M(13)Spec.mat');
load('M(14)Spec.mat'); load('M(15)Spec.mat'); load('M(16)Spec.mat'); load('M(17)Spec.mat'); load('M(18)Spec.mat');
load('M(19)Spec.mat'); load('M(20)Spec.mat'); load('M(21)Spec.mat'); load('M(22)Spec.mat'); load('M(23)Spec.mat');
load('M(24)Spec.mat'); load('M(25)Spec.mat'); load('M(26)Spec.mat'); load('M(27)Spec.mat'); load('M(28)Spec.mat');
load('M(29)Spec.mat'); load('M(30)Spec.mat')
XM=zeros(30,562326);
XM(1,:)=MM1; XM(2,:)=MM2; XM(3,:)=MM3; XM(4,:)=MM4; XM(5,:)=MM5;
XM(6,:)=MM6; XM(7,:)=MM7; XM(8,:)=MM8; XM(9,:)=MM9; XM(10,:)=MM10; XM(11,:)=MM11; XM(12,:)=MM12; XM(13,:)=MM13; XM(14,:)=MM14; XM(15,:)=MM15;
XM(16,:)=MM16; XM(17,:)=MM17; XM(18,:)=MM18; XM(19,:)=MM19; XM(20,:)=MM20; XM(21,:)=MM21; XM(22,:)=MM22; XM(23,:)=MM23; XM(24,:)=MM24; XM(25,:)=MM25;
XM(26,:)=MM26; XM(27,:)=MM27; XM(28,:)=MM28; XM(29,:)=MM29; XM(30,:)=MM30;
[m,n]=size(XM);                     % compute data size
mn=mean(XM,2);                      % compute mean for each row
XM=XM-repmat(mn,1,n);               % subtract mean
[u,s,v]=svd(XM/sqrt(n-1),'econ');   % perform the SVD
lambda=diag(s).^2;                  % produce diagonal variances
YM=u'*XM;                           % produce the principal components projection
CyM=(1/(n-1))*(YM)*(YM.');
PCsM = diag(CyM);
%SumM = sum(PCsM); %PCnormM = PCsM/SumM;
save('MilesDavisTraining','PCsM');
t2 = 1:length(PCsM);
%subplot(3,1,3); %scatter(t2,PCsM); title('Principal Components (Miles Davis)');
xlabel('component #'); %ylabel('singular value');
Test 2 Training Sets
clear all; close all; clc;
load('G(1)Spec.mat'); load('G(10)Spec.mat'); load('G(11)Spec.mat');
load('G(12)Spec.mat'); load('G(13)Spec.mat'); load('G(14)Spec.mat'); load('G(15)Spec.mat'); load('G(16)Spec.mat');
load('G(17)Spec.mat'); load('G(18)Spec.mat'); load('G(19)Spec.mat'); load('G(2)Spec.mat'); load('G(20)Spec.mat');
load('G(21)Spec.mat'); load('G(22)Spec.mat'); load('G(23)Spec.mat'); load('G(24)Spec.mat'); load('G(25)Spec.mat');
load('G(26)Spec.mat'); load('G(27)Spec.mat'); load('G(28)Spec.mat'); load('G(29)Spec.mat'); load('G(3)Spec.mat');
load('G(30)Spec.mat'); load('G(4)Spec.mat'); load('G(5)Spec.mat'); load('G(6)Spec.mat'); load('G(7)Spec.mat');
load('G(8)Spec.mat'); load('G(9)Spec.mat')
XG=zeros(30,562326);
XG(1,:)=XX1; XG(2,:)=XX2; XG(3,:)=XX3; XG(4,:)=XX4; XG(5,:)=XX5;
XG(6,:)=XX6; XG(7,:)=XX7; XG(8,:)=XX8; XG(9,:)=XX9; XG(10,:)=XX10; XG(11,:)=XX11; XG(12,:)=XX12; XG(13,:)=XX13; XG(14,:)=XX14; XG(15,:)=XX15;
XG(16,:)=XX16; XG(17,:)=XX17; XG(18,:)=XX18; XG(19,:)=XX19; XG(20,:)=XX20; XG(21,:)=XX21; XG(22,:)=XX22; XG(23,:)=XX23; XG(24,:)=XX24; XG(25,:)=XX25;
XG(26,:)=XX26; XG(27,:)=XX27; XG(28,:)=XX28; XG(29,:)=XX29; XG(30,:)=XX30;
[m,n]=size(XG);                     % compute data size
mn=mean(XG,2);                      % compute mean for each row
XG=XG-repmat(mn,1,n);               % subtract mean
[u,s,v]=svd(XG/sqrt(n-1),'econ');   % perform the SVD
lambda=diag(s).^2;                  % produce diagonal variances
YG=u'*XG;                           % produce the principal components projection
CyG=(1/(n-1))*(YG)*(YG.');
PCsG = diag(CyG);
%SumG = sum(PCsG); %PCnormG = PCsG/SumG;
save('GiraffageTraining','PCsG');
t2 = 1:length(PCsG);
%subplot(3,1,1);
%scatter(t2,PCsG); title('Principal Components (Giraffage)');
xlabel('component #'); %ylabel('singular value');
load('C(1)Spec.mat'); load('C(2)Spec.mat'); load('C(3)Spec.mat');
load('C(4)Spec.mat'); load('C(5)Spec.mat'); load('C(6)Spec.mat'); load('C(7)Spec.mat'); load('C(8)Spec.mat');
load('C(9)Spec.mat'); load('C(10)Spec.mat'); load('C(11)Spec.mat'); load('C(12)Spec.mat'); load('C(13)Spec.mat');
load('C(14)Spec.mat'); load('C(15)Spec.mat'); load('C(16)Spec.mat'); load('C(17)Spec.mat'); load('C(18)Spec.mat');
load('C(19)Spec.mat'); load('C(20)Spec.mat'); load('C(21)Spec.mat'); load('C(22)Spec.mat'); load('C(23)Spec.mat');
load('C(24)Spec.mat'); load('C(25)Spec.mat'); load('C(26)Spec.mat'); load('C(27)Spec.mat'); load('C(28)Spec.mat');
load('C(29)Spec.mat'); load('C(30)Spec.mat')
XC=zeros(30,562326);
XC(1,:)=CC1; XC(2,:)=CC2; XC(3,:)=CC3; XC(4,:)=CC4; XC(5,:)=CC5;
XC(6,:)=CC6; XC(7,:)=CC7; XC(8,:)=CC8; XC(9,:)=CC9; XC(10,:)=CC10; XC(11,:)=CC11; XC(12,:)=CC12; XC(13,:)=CC13; XC(14,:)=CC14; XC(15,:)=CC15;
XC(16,:)=CC16; XC(17,:)=CC17; XC(18,:)=CC18; XC(19,:)=CC19; XC(20,:)=CC20; XC(21,:)=CC21; XC(22,:)=CC22; XC(23,:)=CC23; XC(24,:)=CC24; XC(25,:)=CC25;
XC(26,:)=CC26; XC(27,:)=CC27; XC(28,:)=CC28; XC(29,:)=CC29; XC(30,:)=CC30;
[m,n]=size(XC); % compute data size
mn=mean(XC,2); % compute mean for each row
XC=XC-repmat(mn,1,n); % subtract mean
[u,s,v]=svd(XC/sqrt(n-1),'econ'); % perform the SVD
lambda=diag(s).^2; % produce diagonal variances
YC=u'*XC; % produce the principal components projection
CyC=(1/(n-1))*(YC)*(YC.');
PCsC = diag(CyC);
%SumC = sum(PCsC); %PCnormC = PCsC/SumC;
save('ComTruiseTraining','PCsC');
t2 = 1:length(PCsC);
%subplot(3,1,2); scatter(t2,PCsC); title('Principal Components (ComTruise)');
%xlabel('component #'); ylabel('singular value');
load('S(1)Spec.mat'); load('S(2)Spec.mat'); load('S(3)Spec.mat');
load('S(4)Spec.mat'); load('S(5)Spec.mat'); load('S(6)Spec.mat'); load('S(7)Spec.mat'); load('S(8)Spec.mat');
load('S(9)Spec.mat'); load('S(10)Spec.mat'); load('S(11)Spec.mat'); load('S(12)Spec.mat'); load('S(13)Spec.mat');
load('S(14)Spec.mat'); load('S(15)Spec.mat'); load('S(16)Spec.mat'); load('S(17)Spec.mat'); load('S(18)Spec.mat');
load('S(19)Spec.mat'); load('S(20)Spec.mat'); load('S(21)Spec.mat'); load('S(22)Spec.mat'); load('S(23)Spec.mat');
load('S(24)Spec.mat'); load('S(25)Spec.mat'); load('S(26)Spec.mat'); load('S(27)Spec.mat'); load('S(28)Spec.mat');
load('S(29)Spec.mat'); load('S(30)Spec.mat')
XS=zeros(30,562326);
XS(1,:)=SS1; XS(2,:)=SS2; XS(3,:)=SS3; XS(4,:)=SS4; XS(5,:)=SS5;
XS(6,:)=SS6; XS(7,:)=SS7; XS(8,:)=SS8; XS(9,:)=SS9; XS(10,:)=SS10; XS(11,:)=SS11; XS(12,:)=SS12; XS(13,:)=SS13; XS(14,:)=SS14; XS(15,:)=SS15;
XS(16,:)=SS16; XS(17,:)=SS17; XS(18,:)=SS18; XS(19,:)=SS19; XS(20,:)=SS20; XS(21,:)=SS21; XS(22,:)=SS22; XS(23,:)=SS23; XS(24,:)=SS24; XS(25,:)=SS25;
XS(26,:)=SS26; XS(27,:)=SS27; XS(28,:)=SS28; XS(29,:)=SS29; XS(30,:)=SS30;
[m,n]=size(XS); % compute data size
mn=mean(XS,2); % compute mean for each row
XS=XS-repmat(mn,1,n); % subtract mean
[u,s,v]=svd(XS/sqrt(n-1),'econ'); % perform the SVD
lambda=diag(s).^2; % produce diagonal variances
YS=u'*XS; % produce the principal components projection
CyS=(1/(n-1))*(YS)*(YS.');
PCsS = diag(CyS);
%SumS = sum(PCsS); %PCnormS = PCsS/SumS;
save('SkreamTraining','PCsS');
t2 = 1:length(PCsS);
%subplot(3,1,3); scatter(t2,PCsS); title('Principal Components (Skream)');
%xlabel('component #'); ylabel('singular value');
Test 3 Training Sets
clear all; close all; clc;
load('E(1)Spec.mat'); load('E(10)Spec.mat'); load('E(11)Spec.mat');
load('E(12)Spec.mat'); load('E(13)Spec.mat'); load('E(14)Spec.mat'); load('E(15)Spec.mat'); load('E(16)Spec.mat');
load('E(17)Spec.mat'); load('E(18)Spec.mat'); load('E(19)Spec.mat'); load('E(2)Spec.mat'); load('E(20)Spec.mat');
load('E(21)Spec.mat'); load('E(22)Spec.mat'); load('E(23)Spec.mat'); load('E(24)Spec.mat'); load('E(25)Spec.mat');
load('E(26)Spec.mat'); load('E(27)Spec.mat'); load('E(28)Spec.mat'); load('E(29)Spec.mat'); load('E(3)Spec.mat');
load('E(30)Spec.mat'); load('E(4)Spec.mat'); load('E(5)Spec.mat'); load('E(6)Spec.mat'); load('E(7)Spec.mat');
load('E(8)Spec.mat'); load('E(9)Spec.mat')
XE=zeros(30,562326);
XE(1,:)=EE1; XE(2,:)=EE2; XE(3,:)=EE3; XE(4,:)=EE4; XE(5,:)=EE5;
XE(6,:)=EE6; XE(7,:)=EE7; XE(8,:)=EE8; XE(9,:)=EE9; XE(10,:)=EE10; XE(11,:)=EE11; XE(12,:)=EE12; XE(13,:)=EE13; XE(14,:)=EE14; XE(15,:)=EE15;
XE(16,:)=EE16; XE(17,:)=EE17; XE(18,:)=EE18; XE(19,:)=EE19; XE(20,:)=EE20; XE(21,:)=EE21; XE(22,:)=EE22; XE(23,:)=EE23; XE(24,:)=EE24; XE(25,:)=EE25;
XE(26,:)=EE26; XE(27,:)=EE27; XE(28,:)=EE28; XE(29,:)=EE29; XE(30,:)=EE30;
[m,n]=size(XE); % compute data size
mn=mean(XE,2); % compute mean for each row
XE=XE-repmat(mn,1,n); % subtract mean
[u,s,v]=svd(XE/sqrt(n-1),'econ'); % perform the SVD
lambda=diag(s).^2; % produce diagonal variances
YE=u'*XE; % produce the principal components projection
CyE=(1/(n-1))*(YE)*(YE.');
PCsE = diag(CyE);
%SumE = sum(PCsE); %PCnormE = PCsE/SumE;
save('ElectronicsTraining','PCsE');
t2 = 1:length(PCsE);
%subplot(3,1,1); scatter(t2,PCsE); title('Principal Components (Electronics)');
%xlabel('component #'); ylabel('singular value');
load('T(1)Spec.mat'); load('T(2)Spec.mat'); load('T(3)Spec.mat');
load('T(4)Spec.mat'); load('T(5)Spec.mat'); load('T(6)Spec.mat'); load('T(7)Spec.mat'); load('T(8)Spec.mat');
load('T(9)Spec.mat'); load('T(10)Spec.mat'); load('T(11)Spec.mat'); load('T(12)Spec.mat'); load('T(13)Spec.mat');
load('T(14)Spec.mat'); load('T(15)Spec.mat');
load('T(16)Spec.mat'); load('T(17)Spec.mat'); load('T(18)Spec.mat');
load('T(19)Spec.mat'); load('T(20)Spec.mat'); load('T(21)Spec.mat'); load('T(22)Spec.mat'); load('T(23)Spec.mat');
load('T(24)Spec.mat'); load('T(25)Spec.mat'); load('T(26)Spec.mat'); load('T(27)Spec.mat'); load('T(28)Spec.mat');
load('T(29)Spec.mat'); load('T(30)Spec.mat')
XT=zeros(30,562326);
XT(1,:)=TT1; XT(2,:)=TT2; XT(3,:)=TT3; XT(4,:)=TT4; XT(5,:)=TT5;
XT(6,:)=TT6; XT(7,:)=TT7; XT(8,:)=TT8; XT(9,:)=TT9; XT(10,:)=TT10; XT(11,:)=TT11; XT(12,:)=TT12; XT(13,:)=TT13; XT(14,:)=TT14; XT(15,:)=TT15;
XT(16,:)=TT16; XT(17,:)=TT17; XT(18,:)=TT18; XT(19,:)=TT19; XT(20,:)=TT20; XT(21,:)=TT21; XT(22,:)=TT22; XT(23,:)=TT23; XT(24,:)=TT24; XT(25,:)=TT25;
XT(26,:)=TT26; XT(27,:)=TT27; XT(28,:)=TT28; XT(29,:)=TT29; XT(30,:)=TT30;
[m,n]=size(XT); % compute data size
mn=mean(XT,2); % compute mean for each row
XT=XT-repmat(mn,1,n); % subtract mean
[u,s,v]=svd(XT/sqrt(n-1),'econ'); % perform the SVD
lambda=diag(s).^2; % produce diagonal variances
YT=u'*XT; % produce the principal components projection
CyT=(1/(n-1))*(YT)*(YT.');
PCsT = diag(CyT);
%SumT = sum(PCsT); %PCnormT = PCsT/SumT;
save('RockTraining','PCsT');
t2 = 1:length(PCsT);
%subplot(3,1,2); scatter(t2,PCsT); title('Principal Components (Rock)');
%xlabel('component #'); ylabel('singular value');
load('J(1)Spec.mat'); load('J(2)Spec.mat'); load('J(3)Spec.mat');
load('J(4)Spec.mat'); load('J(5)Spec.mat'); load('J(6)Spec.mat'); load('J(7)Spec.mat'); load('J(8)Spec.mat');
load('J(9)Spec.mat'); load('J(10)Spec.mat'); load('J(11)Spec.mat'); load('J(12)Spec.mat'); load('J(13)Spec.mat');
load('J(14)Spec.mat'); load('J(15)Spec.mat'); load('J(16)Spec.mat'); load('J(17)Spec.mat'); load('J(18)Spec.mat');
load('J(19)Spec.mat'); load('J(20)Spec.mat'); load('J(21)Spec.mat'); load('J(22)Spec.mat'); load('J(23)Spec.mat');
load('J(24)Spec.mat'); load('J(25)Spec.mat'); load('J(26)Spec.mat'); load('J(27)Spec.mat'); load('J(28)Spec.mat');
load('J(29)Spec.mat'); load('J(30)Spec.mat')
XJ=zeros(30,562326);
XJ(1,:)=JJ1; XJ(2,:)=JJ2; XJ(3,:)=JJ3; XJ(4,:)=JJ4; XJ(5,:)=JJ5;
XJ(6,:)=JJ6; XJ(7,:)=JJ7; XJ(8,:)=JJ8; XJ(9,:)=JJ9; XJ(10,:)=JJ10; XJ(11,:)=JJ11; XJ(12,:)=JJ12; XJ(13,:)=JJ13; XJ(14,:)=JJ14; XJ(15,:)=JJ15;
XJ(16,:)=JJ16; XJ(17,:)=JJ17; XJ(18,:)=JJ18; XJ(19,:)=JJ19; XJ(20,:)=JJ20; XJ(21,:)=JJ21; XJ(22,:)=JJ22; XJ(23,:)=JJ23; XJ(24,:)=JJ24; XJ(25,:)=JJ25;
XJ(26,:)=JJ26; XJ(27,:)=JJ27; XJ(28,:)=JJ28; XJ(29,:)=JJ29; XJ(30,:)=JJ30;
[m,n]=size(XJ); % compute data size
mn=mean(XJ,2); % compute mean for each row
XJ=XJ-repmat(mn,1,n); % subtract mean
[u,s,v]=svd(XJ/sqrt(n-1),'econ'); % perform the SVD
lambda=diag(s).^2; % produce diagonal variances
YJ=u'*XJ; % produce the principal components projection
CyJ=(1/(n-1))*(YJ)*(YJ.');
PCsJ = diag(CyJ);
%SumJ = sum(PCsJ); %PCnormJ = PCsJ/SumJ;
save('JazzTraining','PCsJ');
t2 = 1:length(PCsJ);
%subplot(3,1,3); scatter(t2,PCsJ); title('Principal Components (Jazz)');
%xlabel('component #'); ylabel('singular value');
Genre Recognition for Test 1
clear all; close all; clc;
% Load training sets
load('GiraffageTraining.mat'); load('MilesDavisTraining.mat'); load('RadioheadTraining.mat');
Giraffage = PCsG(2:20,:).'; Radiohead = PCsR(2:20,:).'; MilesDavis = PCsM(2:20,:).';
% Load music to be identified with PCA algorithm
L=5; n=220501; t2=linspace(0,L,n+1); t=t2(1:n); k=(2*pi/L)*[0:n/2-1 -n/2:-1]; ks=fftshift(k);
[yy, Fs] = audioread('M(30).wav'); % Fs = 44100 Hz for these samples
vvv1 = yy(:,1); vvv2 = yy(:,2); y = (vvv1 + vvv2)/2; % average the stereo channels
vvv = y'/2;
start = round(length(y)/2); finish = start + 5*Fs; % 5-second clip from the middle of the track
vv = vvv(1,start:finish);
Sgt_spec = []; tslide = 0:0.1:5;
for j=1:length(tslide)
    g=exp(-20*(t-tslide(j)).^2); % Gaussian window
    Sg=g.*vv;
    Sgt=fft(Sg);
    Sgt=Sgt(1,floor(n/2):n); % keep one half of the spectrum (floor since n is odd)
    Sgt_spec=[Sgt_spec; resample(abs(fftshift(Sgt)),1,10)];
    %subplot(3,1,1), plot(t,vv,'k',t,g,'r')
    %subplot(3,1,2), plot(t,Sg,'k')
    %subplot(3,1,3), plot(ks(1,floor(n/2):n),abs(fftshift(Sgt))/max(abs(Sgt)))
    %drawnow
end
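This loop is the same windowed-Fourier step used to build the training spectrograms: a Gaussian window slides across the 5-second clip in 0.1 s steps, and half of each magnitude spectrum, downsampled by a factor of 10, becomes one row of the spectrogram. A NumPy sketch with a synthetic sine wave in place of the audio (simple stride-10 decimation stands in for MATLAB's `resample`):

```python
import numpy as np

Fs = 44100
L, n = 5.0, 220501                       # 5-second clip, 5*Fs + 1 samples
t = np.linspace(0, L, n + 1)[:n]
vv = np.sin(2 * np.pi * 440 * t)         # synthetic stand-in for the audio clip

rows = []
for tau in np.linspace(0, 5, 51):        # tslide = 0:0.1:5
    g = np.exp(-20 * (t - tau) ** 2)     # Gaussian window centered at tau
    Sgt = np.fft.fft(g * vv)
    half = np.abs(np.fft.fftshift(Sgt))[n // 2:]  # keep one half of the spectrum
    rows.append(half[::10])              # crude 10x decimation (resample stand-in)
Sgt_spec = np.vstack(rows)               # one row per window position
```

Flattened, each such spectrogram has 51 × 11026 = 562326 entries, which is exactly the row length of the training matrices above.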
X=Sgt_spec;
[m,n]=size(X); % compute data size
mn=mean(X,2); % compute mean for each row
X=X-repmat(mn,1,n); % subtract mean
[u,s,v]=svd(X/sqrt(n-1),'econ'); % perform the SVD
lambda=diag(s).^2; % produce diagonal variances
Y=u'*X; % produce the principal components projection
Cy=(1/(n-1))*(Y)*(Y.');
PCs = diag(Cy);
%Sum = sum(PCs); %PCnorm = PCs/Sum;
% Sample
Testmusic = PCs(2:20,:).';
% Training
Classifier = zeros(3,19);
Classifier(1,:) = Giraffage; Classifier(2,:) = Radiohead; Classifier(3,:) = MilesDavis;
% Group
Group = {'Giraffage'; 'Radiohead'; 'MilesDavis'};
% Function
Genre = knnclassify(Testmusic, Classifier, Group);
% Display result
disp('result:'); disp(Genre);
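Note that `knnclassify` belongs to the older Bioinformatics Toolbox and has been removed from recent MATLAB releases (`fitcknn`/`predict` are the modern replacements). With a single training row per artist, the classification reduces to nearest-neighbor matching, which can be sketched in NumPy as follows (the feature vectors here are random toys, not the paper's principal components):

```python
import numpy as np

def nearest_class(test_vec, classifier, groups):
    """Label of the training row closest to test_vec (Euclidean distance, k = 1)."""
    d = np.linalg.norm(classifier - test_vec, axis=1)
    return groups[int(np.argmin(d))]

# Toy 3 x 19 classifier: one row of component variances per artist.
rng = np.random.default_rng(1)
classifier = rng.standard_normal((3, 19))
groups = ['Giraffage', 'Radiohead', 'MilesDavis']

# A test vector lying close to the second training row.
test_music = classifier[1] + 0.01 * rng.standard_normal(19)
genre = nearest_class(test_music, classifier, groups)   # 'Radiohead'
```

With only one training example per class, k = 1 is forced; a larger training set per class would let a true k-nearest-neighbor vote smooth out outliers.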
Genre Recognition for Test 2
clear all; close all; clc;
% Load training sets
load('GiraffageTraining.mat'); load('ComTruiseTraining.mat'); load('SkreamTraining.mat');
Giraffage = PCsG(2:20,:).'; ComTruise = PCsC(2:20,:).'; Skream = PCsS(2:20,:).';
% Load music to be identified with PCA algorithm
L=5; n=220501; t2=linspace(0,L,n+1); t=t2(1:n); k=(2*pi/L)*[0:n/2-1 -n/2:-1]; ks=fftshift(k);
[yy, Fs] = audioread('S(30).wav'); % Fs = 44100 Hz for these samples
vvv1 = yy(:,1); vvv2 = yy(:,2); y = (vvv1 + vvv2)/2; % average the stereo channels
vvv = y'/2;
start = round(length(y)/2); finish = start + 5*Fs; % 5-second clip from the middle of the track
vv = vvv(1,start:finish);
Sgt_spec = []; tslide = 0:0.1:5;
for j=1:length(tslide)
    g=exp(-20*(t-tslide(j)).^2); % Gaussian window
    Sg=g.*vv;
    Sgt=fft(Sg);
    Sgt=Sgt(1,floor(n/2):n); % keep one half of the spectrum (floor since n is odd)
    Sgt_spec=[Sgt_spec; resample(abs(fftshift(Sgt)),1,10)];
    %subplot(3,1,1), plot(t,vv,'k',t,g,'r')
    %subplot(3,1,2), plot(t,Sg,'k')
    %subplot(3,1,3), plot(ks(1,floor(n/2):n),abs(fftshift(Sgt))/max(abs(Sgt)))
    %drawnow
end
X=Sgt_spec;
[m,n]=size(X); % compute data size
mn=mean(X,2); % compute mean for each row
X=X-repmat(mn,1,n); % subtract mean
[u,s,v]=svd(X/sqrt(n-1),'econ'); % perform the SVD
lambda=diag(s).^2; % produce diagonal variances
Y=u'*X; % produce the principal components projection
Cy=(1/(n-1))*(Y)*(Y.');
PCs = diag(Cy);
%Sum = sum(PCs); %PCnorm = PCs/Sum;
% Sample
Testmusic = PCs(2:20,:).';
% Training
Classifier = zeros(3,19);
Classifier(1,:) = Giraffage; Classifier(2,:) = ComTruise; Classifier(3,:) = Skream;
% Group
Group = {'Giraffage'; 'ComTruise'; 'Skream'};
% Function
Genre = knnclassify(Testmusic, Classifier, Group);
% Display result
disp('result:'); disp(Genre);
Genre Recognition for Test 3
clear all; close all; clc;
% Load training sets
load('ElectronicsTraining.mat'); load('RockTraining.mat'); load('JazzTraining.mat');
Electronics = PCsE(2:20,:).'; Rock = PCsT(2:20,:).'; Jazz = PCsJ(2:20,:).';
% Load music to be identified with PCA algorithm
L=5; n=220501; t2=linspace(0,L,n+1); t=t2(1:n); k=(2*pi/L)*[0:n/2-1 -n/2:-1]; ks=fftshift(k);
[yy, Fs] = audioread('J(30).wav'); % Fs = 44100 Hz for these samples
vvv1 = yy(:,1); vvv2 = yy(:,2); y = (vvv1 + vvv2)/2; % average the stereo channels
vvv = y'/2;
start = round(length(y)/2); finish = start + 5*Fs; % 5-second clip from the middle of the track
vv = vvv(1,start:finish);
Sgt_spec = []; tslide = 0:0.1:5;
for j=1:length(tslide)
    g=exp(-20*(t-tslide(j)).^2); % Gaussian window
    Sg=g.*vv;
    Sgt=fft(Sg);
    Sgt=Sgt(1,floor(n/2):n); % keep one half of the spectrum (floor since n is odd)
    Sgt_spec=[Sgt_spec; resample(abs(fftshift(Sgt)),1,10)];
    %subplot(3,1,1), plot(t,vv,'k',t,g,'r')
    %subplot(3,1,2), plot(t,Sg,'k')
    %subplot(3,1,3), plot(ks(1,floor(n/2):n),abs(fftshift(Sgt))/max(abs(Sgt)))
    %drawnow
end
X=Sgt_spec;
[m,n]=size(X); % compute data size
mn=mean(X,2); % compute mean for each row
X=X-repmat(mn,1,n); % subtract mean
[u,s,v]=svd(X/sqrt(n-1),'econ'); % perform the SVD
lambda=diag(s).^2; % produce diagonal variances
Y=u'*X; % produce the principal components projection
Cy=(1/(n-1))*(Y)*(Y.');
PCs = diag(Cy);
%Sum = sum(PCs); %PCnorm = PCs/Sum;
% Sample
Testmusic = PCs(2:20,:).';
% Training
Classifier = zeros(3,19);
Classifier(1,:) = Electronics; Classifier(2,:) = Rock; Classifier(3,:) = Jazz;
% Group
Group = {'Electronics'; 'Rock'; 'Jazz'};
% Function
Genre = knnclassify(Testmusic, Classifier, Group);
% Display result
disp('result:'); disp(Genre);
Appendix C
This section lists the artists and albums used for algorithm development. They are all awesome artists; you should definitely check them out.
Test 1
Giraffage
- Needs
- Comfort
- No Reason
- Janet Jackson: Someone to Call My Lover (Giraffage Remix)
- Miley Cyrus: Party in the U.S.A. (Giraffage Remix)
- R. Kelly: Ignition (Giraffage Remix)
- Stardust: Music Sounds Better with You (Giraffage Remix)
- The-Dream: Shawty Is da Shit (Giraffage Remix)
- And 20+ more remixes from Giraffage's SoundCloud
Radiohead
- The Bends
- OK Computer
- Kid A
- Amnesiac
- Hail to the Thief
Miles Davis
- Bitches Brew Legacy Edition
- Birth of the Cool
- Kind of Blue
- In a Silent Way
- Quiet Nights
Test 2
Giraffage
- Same as Test 1
Com Truise
- In Decay
- Galactic Melt
- Wave 1
- Cyanide Sister EP
- Fairlight
Skream
- Skream!
- Skreamizm Vol 1
- Skreamizm Vol 2
- Skreamizm Vol 3
- Skreamizm Vol 4
- Skreamizm Vol 5
- Skreamizm Vol 6
Test 3
Giraffage
- Same as Test 1
Com Truise
- Same as Test 2
Skream
- Same as Test 2
Meshuggah
- obZen
- Koloss
- Nothing
Animals as Leaders
- Animals as Leaders
- Weightless
- The Joy of Motion
Miles Davis
- Same as Test 1
John Coltrane
- Duke Ellington & John Coltrane
- A Love Supreme
- Blue Train