Project 5: Music Genre Recognition Using PCA
Development of Music Genre Recognition Algorithm using
Windowed Fourier Transform and SVD/PCA techniques
Author
Jimin Kim
Abstract
When we listen to a song, its genre is immediately recognizable to us.
This implies that our brain is capable of processing auditory information and
categorizing it into the genres we are familiar with. This paper attempts to develop a simple
algorithm that mimics the brain's processing and categorization of a sound signal by
applying the SVD/PCA technique. By constructing a training set for each musical category using
spectrograms and the PCA technique, one can build a statistical testing algorithm that is
capable of identifying the genre of a given music sample. The objective of this paper is to
introduce the development of such an algorithm, implement it in MATLAB, and
test its classification performance on realistic music samples from various genres.
Introduction/Overview
The paper first introduces the theoretical development of the algorithm using
spectrograms and the PCA technique. Following the theoretical background, the
implementation of the algorithm in MATLAB is described. Once the algorithm is fully
implemented in MATLAB, it is applied to three different trials. The first trial tests the
algorithm's ability to classify songs by three bands from three different genres. The
second trial uses the same methodology as the first, except that the three bands are
from the same genre. Finally, in the third trial, the algorithm is tested on general
classification of musical genres such as Jazz, Rock, and Electronics.
Theoretical background
The theoretical framework behind the algorithm consists of two main concepts:
Windowed Fourier transform and Principal Component Analysis. When it comes to the
development of music genre recognition algorithm, the most important process is to pick up
the unique features that define the songs such as timbre, tempo and beats. One of the most
effective ways of capturing these features is by using the Windowed Fourier transform.
Mathematically, the Windowed Fourier transform is the Fourier transform with a slight
modification. Recall that the Fourier transform states

    F(k) = (1/sqrt(2π)) ∫ f(x) e^{-ikx} dx        (1)

where k is the frequency domain and x is the position (or time) domain. The
Windowed Fourier transform adds a time-translation kernel, for example the Gaussian

    g(τ - t) = e^{-a(τ - t)^2}        (2)

into the Fourier transform, which then becomes

    G[f](t, k) = ∫ f(τ) g(τ - t) e^{-ikτ} dτ        (3)

Here, the term g(τ - t) induces the time localization of the Fourier integral around
τ = t. Therefore, as t varies over the given time interval, g sweeps through the signal and
picks up the frequency information at each point in time, just as shown in the top picture of
Figure 1. By introducing the translational kernel, the windowed Fourier transform enables
one to investigate both time and frequency information of a given signal with some trade-off
from both domains. Therefore, when this technique is applied to a sound signal such as a
song, one can construct a spectrogram that holds both time and frequency information. This is
precisely what one should look for when developing a music genre recognition
algorithm. Because the spectrogram not only provides information about the song's overall
frequency range but also holds information about time-dependent features such as tempo and
rhythm, it serves as an excellent quantitative representation of the song of interest.
Figure 2 shows an example of such a spectrogram.
Figure 1. This figure describes how the Windowed Fourier transform is performed. The top picture shows
the overlap between the translational kernel function (red) and the signal. The middle picture shows
the filtered signal at the given timestamp. The bottom picture shows the FFT of the filtered
signal.
However, there is one problem to overcome. Spectrograms do offer a good amount of
the information one needs for defining and classifying a song's genre, but one still needs a way to
identify the song's genre by extracting the key features from the spectrogram and comparing
them with the defined features of the various genres. This is where PCA/SVD techniques come in very
handy. Recall that SVD decomposes any matrix into three matrices that hold the principal
component axes, the corresponding singular values, and the matrix's original coordinate vectors.
Knowing that the spectrogram is an m*n matrix, where m is the frequency range of
the song at a given time point and n is the number of samplings, we can treat the song's
spectrogram as a matrix X and perform PCA to obtain its principal modes and corresponding
singular values. Recall that SVD can diagonalize any matrix by introducing an appropriate
pair of bases U and V:

    X = U Σ V^T        (4)

One can then transform the spectrogram matrix X into its principal component basis form

    Y = U^T X        (5)

where U is the unitary matrix associated with the SVD. Then one can compute the
covariance matrix of Y

    C_Y = (1/(n-1)) Y Y^T = (1/(n-1)) Σ^2        (6)

where Σ is the diagonal matrix associated with the SVD. In this new basis, the
principal components of the song are the columns of U and the corresponding singular values
are the diagonal elements of Σ. Therefore, by applying PCA to a song, one can perform
dimensional reduction of the song's spectrogram matrix, thereby successfully
representing the song as a 1D matrix with fewer than 20 features. This representation of a song
can then be compared to the principal components of a specific musical genre.

Figure 2. An example of a song's spectrogram (Handel's Messiah) using the Gaussian kernel. The
spectrogram holds both time and frequency information of the sound signal.
Obtaining the principal components of a specific musical genre follows
the identical methodology, but uses multiple songs instead of a single song. Assuming
that songs within the same genre share similar musical features, one can construct a
representation matrix of a genre using multiple songs from it. This can be done
by combining the spectrograms of the songs into a single m*n metadata matrix X, where m is
now the number of songs that define the genre and each row of length n is a song's entire
spectrogram reshaped into a row vector. Using equations 5 and 6, one can then obtain the principal components
that represent the specific genre defined by the multiple songs. This process is called constructing
the training set for a certain genre. Figure 3 shows a simple diagram of the
metadata matrix X for a genre and for a song.
Once one obtains the principal modes and singular values of both the musical genres and the
song of interest, one can finally compare the song's singular values to those of the different
genres to classify the song's genre.
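The whole pipeline described above can be sketched in miniature. The following Python snippet (illustrative only; the paper's implementation is in MATLAB, and the sizes, names, and random "spectrograms" here are made up) builds a small training matrix per genre, extracts its leading singular values, and classifies a sample by the nearest row of singular values:

```python
import numpy as np

rng = np.random.default_rng(1)

def genre_modes(X, n_modes=3):
    """Leading squared singular values of a mean-subtracted data matrix (rows = songs)."""
    X = X - X.mean(axis=1, keepdims=True)
    s = np.linalg.svd(X / np.sqrt(X.shape[1] - 1), compute_uv=False)
    return s[:n_modes] ** 2

# Two toy "genres", five songs each, 40 spectrogram elements per song.
genre_a = rng.standard_normal((5, 40))
genre_b = 3.0 * rng.standard_normal((5, 40))   # a higher-energy genre

classifier = np.vstack([genre_modes(genre_a), genre_modes(genre_b)])
names = ['A', 'B']

# A "song" drawn from the higher-energy genre, represented the same way.
song = 3.0 * rng.standard_normal((5, 40))
sample = genre_modes(song)

# Nearest row of the classifier gives the predicted genre.
pred = names[int(np.argmin(np.linalg.norm(classifier - sample, axis=1)))]
```

The key idea, as in the paper, is that the singular-value spectrum acts as a compact energy signature of a genre.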
Algorithm implementation
The implementation of the musical genre classification algorithm in MATLAB is
discussed in this section. The section is divided into two parts: constructing the training
sets and building the classification algorithm.
Construction of the training sets
1. Select the songs of certain genre and convert them into WAV format
Figure 3. Simplified representation of the metadata X for a specific genre (left) and a song (right).
Notice that for a genre each row is a reshaped spectrogram, whereas for a song X is its spectrogram
itself, where each row represents the frequency range at a given time point.
In order to define a specific genre, one needs songs that are a subset of that genre. In
this paper, 30 songs were used to build the training set for each genre, but the larger the
number of songs, the better one is likely to represent the genre. Once one
obtains all the songs for the genres of interest (in this case, three), one should convert
them into WAV format, since it is the format supported by all platforms in
MATLAB. However, if one is using Windows 7 or newer, MATLAB can also read
MP3 files.
2. Construct the time and frequency domains
Before loading the songs into MATLAB, one should construct appropriate time
and frequency domains for the songs. In this paper, a 5-second portion of each song
was used, so one can set the length L = 5; since typical digital music has
44100 frames per second, one should set n = 5*44100 = 220500. Using L and n,
construct a linear space t for the time domain and k for the frequency domain. Do not
forget to scale k by 2π/L, since MATLAB's FFT assumes a periodic domain, and apply
'fftshift' to k so that the plots come out in the natural order.
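As an illustration, the domain construction above can be sketched in Python (the paper uses MATLAB; the variable names mirror its conventions but are otherwise illustrative):

```python
import numpy as np

# 5-second sample at 44100 frames per second.
L = 5
Fs = 44100
n = L * Fs  # 220500 samples

# Time domain: n points on [0, L), dropping the periodic endpoint.
t = np.linspace(0, L, n + 1)[:-1]

# Frequency domain: scaled by 2*pi/L for the periodic interval,
# listed in FFT order, then shifted for plotting.
k = (2 * np.pi / L) * np.concatenate((np.arange(0, n // 2), np.arange(-n // 2, 0)))
ks = np.fft.fftshift(k)
```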
3. Load the songs into MATLAB
Once all the songs are prepared in WAV format and the domains are defined, one can
load them into MATLAB so that each song is represented as a matrix. One will notice
that when MATLAB reads a song, it returns an n*2 matrix, where n is the total number of
frames of the song and 2 is the number of channels. As mentioned
in step 2, a digital song has 44100 frames per second, resulting in 220500 frames for a 5-second
sample and therefore a 220500*2 matrix for each sample. This is a large
amount of data for a single song. In order to decrease the size of the matrix, one can
average the two channels into one. This effectively halves the size of the
matrix while retaining the essential information of the song.
4. Select the 5-second portion of the song
Once the song is converted into a matrix and its two channels are averaged into one,
one should select a 5-second portion that represents the song. Any part of the song
can be chosen, but in this paper a 5-second portion from the middle of the song was
used. Simply find the sample index that divides the song in half, and add 5 seconds
(220500 frames) to find the end point.
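Steps 3 and 4 can be sketched as follows in Python; since this is only an illustration, a synthetic stereo array stands in for a decoded WAV file (the paper uses MATLAB's audioread):

```python
import numpy as np

Fs = 44100
rng = np.random.default_rng(0)
stereo = rng.standard_normal((60 * Fs, 2))  # a 60-second, 2-channel "song"

# Average the two channels into one to halve the data.
mono = stereo.mean(axis=1)

# Select a 5-second portion starting at the middle of the song.
start = len(mono) // 2
clip = mono[start:start + 5 * Fs]
```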
5. Define a dummy spectrogram matrix and the sampling rate
Now that the 5-second portion of the song is ready, one should define a few
components needed for the windowed Fourier transform. First, define an empty
matrix for the spectrogram that will be filled with frequency information
during the windowed Fourier transform loop. Next, define the sampling interval for the 5-second
portion of the song. This is a crucial part of the development of the
algorithm, since too fine a sampling interval can make the spectrogram matrix too big to
be processed, while too coarse an interval will not capture the amount of
information needed by the algorithm. Therefore, it is important to find an optimal
interval that gives both a manageable matrix size and a sufficient
amount of information about the song. In this paper, a sampling interval of 0.1 seconds was
used, giving 51 rows in the generated spectrogram.
6. Perform windowed Fourier transform
Now that the sampling interval has been defined and the empty spectrogram matrix
is prepared, one should perform the windowed Fourier transform on the 5-second
sample of the song. Simply construct a 'for' loop that sweeps from the beginning of
the sample to the end. Define a translational kernel that is multiplied with the
signal at each time point. In this paper, a Gaussian kernel with width parameter a = 20 was
used. Once the kernel function is defined, multiply the signal by the kernel
and perform the Fourier transform to convert the filtered signal into a range of frequencies.
7. Resample the signal
We are still inside the 'for' loop. Once the signal is converted into frequencies, one
should resample the frequencies so that the spectrogram has a manageable size.
Because the Fourier transform of a real signal is symmetric, half of it is redundant;
one can therefore keep only the positive half of the frequency signal by selecting the
columns from the center to the end of the matrix. This halves the size of the
spectrogram matrix, and further truncation can be done by resampling the signal. The
resampling rate can be any number, but in this paper 1/10 was used. This way, one can
further reduce the size of the spectrogram matrix by a factor of 10 without losing much
information.
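Steps 5 through 7 can be sketched as the following loop, shown in Python for illustration (the paper uses a MATLAB 'for' loop with the Gaussian kernel and 'resample'; the random clip and the exact index bookkeeping here are assumptions):

```python
import numpy as np
from scipy.signal import resample_poly

Fs = 44100
L_sec = 5
n = L_sec * Fs
t = np.linspace(0, L_sec, n + 1)[:-1]
rng = np.random.default_rng(0)
clip = rng.standard_normal(n)  # stands in for the 5-second song sample

tslide = np.linspace(0, L_sec, 51)  # 0.1-second sampling interval -> 51 rows
rows = []
for tau in tslide:
    g = np.exp(-20.0 * (t - tau) ** 2)             # Gaussian window centered at tau
    Sgt = np.fft.fft(g * clip)                      # FFT of the windowed signal
    half = np.abs(np.fft.fftshift(Sgt))[n // 2 - 1:]  # keep the positive half
    rows.append(resample_poly(half, 1, 10))        # downsample by a factor of 10
spectrogram = np.array(rows)                       # 51 x 11026, as in the paper
```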
8. Reshape the spectrogram matrix into 1D
Now that the 'for' loop has generated a spectrogram of manageable size, one
should reshape the spectrogram matrix into 1D so that it can be incorporated into the
metadata matrix X mentioned in the theory section. Simply use the 'reshape' function to
transform the spectrogram into a row vector.
9. Merge the spectrograms into a single matrix X
Once steps 1 through 8 have been repeated for all songs within the genre (30 in this case),
one should merge all their spectrograms into a single matrix X. Simply define a
zero matrix of size m*n, where m is the number of songs in the genre and n is
the total number of elements in each spectrogram matrix (51 * 11026 = 562326 in this
case), and fill each row with the reshaped spectrogram of one song. Make sure
that the matrix X has a manageable size to be processed.
10. Perform PCA on X
Now that the matrix X holds all the information of the songs within a genre, one
should perform PCA to obtain the principal modes that represent the genre. First,
compute the size of X and the mean of each row. Then, subtract the mean to
normalize the matrix. Perform the 'economy' SVD in order to get the unitary matrix U of X;
make sure to use the 'economy' version of SVD to save processing power.
Once the SVD is successfully carried out, define the matrix Y = U'X, the
principal component projection of X, and compute the covariance matrix
Cy = (1/(n-1))YY'. Finally, obtain the principal modes of X by extracting the
diagonal values of Cy.
11. Create a row vector of principal modes
The diagonal values of Cy hold the principal modes of the genre, ordered from the
largest singular value to the smallest. The final task is to take the transpose of this
vector so that it can be incorporated into the classification algorithm.
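Steps 9 through 11 amount to the following computation, sketched here in Python (the paper uses MATLAB's svd(..., 'econ'); a small random matrix stands in for the 30 reshaped spectrograms, whose real size would be 30 x 562326):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 30, 5000
X = rng.standard_normal((m, n))  # rows = reshaped spectrograms (illustrative)

# Subtract the row means to normalize the data.
X = X - X.mean(axis=1, keepdims=True)

# Economy SVD of the scaled data matrix.
U, s, Vt = np.linalg.svd(X / np.sqrt(n - 1), full_matrices=False)

# Principal component projection and its covariance matrix.
Y = U.T @ X
Cy = (Y @ Y.T) / (n - 1)

# The diagonal of Cy holds the variance along each principal mode,
# sorted from largest to smallest.
modes = np.diag(Cy)
```

Note that in this basis Cy is diagonal, so its diagonal entries equal the squared singular values of the scaled data matrix.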
Development of the classification algorithm
1. Load the principal modes row vector for each genre
Once the principal modes for all three genres are obtained, load these row vectors
into a new script that implements the classification algorithm.
2. Select the number of modes that represent the genres
Since PCA produces as many modes as there are rows in the matrix X, one can extract
only the modes that hold most of the information. Usually,
20 modes are enough to capture more than 90% of the genre's information.
3. Create a classifier matrix
Once the number of modes used to evaluate a song has been chosen, one can construct a
classifier matrix of size m*n, where m is the number of genres and n is the number of
feature modes. In this paper, the classifier matrix had size 3*20.
4. Obtain the spectrogram of a sample song
Now that the classifier matrix is ready, the only task left is to obtain the spectrogram of
the song to be evaluated and compute its principal feature modes. This can be done by
simply repeating steps 2 through 7 of the previous section. Make sure NOT to reshape
the spectrogram this time, since the PCA will be performed on the 2D spectrogram
itself.
5. Perform PCA and obtain principal feature modes
Once the spectrogram matrix is generated, simply repeat step 10 of the previous
section to obtain the principal feature modes. Do not forget to match the number of
modes to the number chosen for the classifier matrix. Once the principal
modes are obtained, take the transpose to form a row vector and call it the
sample song matrix.
6. Define the groups
Now that both the classifier matrix and the sample song matrix are prepared, one
should define the genre name for each row of the classifier matrix. For example, if
the genres were 'Rock', 'Electronics' and 'Jazz', define the first row of the classifier
matrix as 'Rock', the second row as 'Electronics' and the third row as 'Jazz'.
7. Use k nearest neighbor algorithm to classify the song
Once the sample song matrix, the classifier matrix and the group names are defined, one
can use the 'k nearest neighbor' algorithm to determine the song's genre. Simply use
MATLAB's built-in function 'knnclassify' with three parameters representing the
sample, the classifier, and the group names. The function then compares the
sample song matrix with each row of the classifier matrix and declares the genre of the
row closest to the sample song matrix as the song's genre. Test with
multiple songs in order to estimate the accuracy of the algorithm. Also, do not
forget to cross-validate the algorithm by composing the training set and the sample set
from different songs. In this paper, 6 cross-validations were performed and the mean
accuracy was used as the final accuracy rate.
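The nearest-neighbor step can be sketched as follows, in Python for illustration (the paper uses MATLAB's knnclassify; the numbers here are invented to show the mechanics only):

```python
import numpy as np

classifier = np.array([
    [9.0, 4.0, 2.0],   # row 0: 'Rock' (top 3 feature modes, illustrative)
    [6.0, 3.0, 1.0],   # row 1: 'Electronics'
    [2.0, 1.0, 0.5],   # row 2: 'Jazz'
])
groups = ['Rock', 'Electronics', 'Jazz']

sample = np.array([5.5, 2.8, 1.1])  # principal modes of the song under test

# Declare the genre of the training row closest to the sample (1-NN).
dists = np.linalg.norm(classifier - sample, axis=1)
genre = groups[int(np.argmin(dists))]
```

With these numbers the sample sits closest to the 'Electronics' row, so that label is declared.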
Results/analysis
Test 1: Band Classification
Figure 4. Principal modes of each artist in Test 1. Notice that, overall, Giraffage has the highest range of
energy in each mode, while Radiohead and Miles Davis have the middle and low ranges of energy,
respectively.
The goal of Test 1 was to classify three different bands from three different genres. In
this paper, the artists Giraffage, Radiohead and Miles Davis were used to represent
Future Beats Electronics, Alternative Rock and Jazz, respectively. For constructing a
training set, 30 songs from each artist were used. For each evaluation, 10 new songs from
each artist (30 songs in total) were tested to calculate the accuracy rate. The
algorithm was tested with 6 cross-validations by changing the training set and the samples for
each validation; the same method was used for Tests 2 and 3. Figure 4 shows the
principal modes of each artist and their characteristics. The results for Test 1 are as follows:
Overall accuracy: 48.33%
Standard deviation: 5.48%
Mean accuracy for Giraffage: 83.33%
Mean accuracy for Radiohead: 11.67%
Mean accuracy for Miles Davis: 51.67%
Notice that the algorithm performed very well on the artists who occupy the high and
low energy ranges of the principal modes, but performed rather poorly on the artist
who occupies the middle energy range. This is primarily a consequence of the way 'k nearest
neighbor' classifies a song. Since each individual Radiohead song was likely to
have principal modes higher than those of the Giraffage training set, the algorithm
performed poorly by classifying them as Giraffage songs. This poor performance on the middle
energy range later proves to be a weakness of this algorithm.
Test 2: The Case for Seattle
Figure 5. Principal modes of each artist in Test 2. Giraffage again occupies the high range of energy in
each mode, together with Skream, while Com Truise has the middle range of energy. Since they are all
Electronics artists, it is more difficult for the algorithm to classify the songs correctly.
The methodology of Test 2 was identical to that of Test 1, but this time three different
bands from the same genre were tested. In this paper, three great artists of the current electronics
scene were chosen: Giraffage, Com Truise and Skream. Giraffage is the same artist
chosen in Test 1. Com Truise is an electronics artist who seeks a modern
interpretation of the 80's electronics sound. Skream is a pioneer of the Dubstep genre from the UK
who uses a lot of bass in his music. Figure 5 shows the principal modes of each artist. The
results for Test 2 are as follows:
Overall accuracy: 37.78%
Standard deviation: 8.60%
Mean accuracy for Giraffage: 66.67%
Mean accuracy for Com Truise: 21.67%
Mean accuracy for Skream: 25.00%
Notice that the algorithm performs more poorly than in Test 1 because all three
artists share similar features. The algorithm again performs best on Giraffage, but the
accuracy rate is lower than in Test 1, since Skream shares very similar principal modes
with Giraffage. Overall, the distinction between the artists was less clear than in Test 1,
resulting in poorer performance of the algorithm.
Test 3: Genre Classification
Figure 6. Principal modes of each musical genre in Test 3. This time, Rock takes the high range of
energy in each mode, while Electronics and Jazz occupy the middle and low ranges, respectively.
Since all three genres were fairly distinct from each other, the algorithm had a higher chance of
classifying the songs correctly.
Test 3 aimed to test the algorithm's ability to differentiate general genres rather
than classify specific artists. In this paper, Electronics, Heavy Rock and Jazz were chosen
as the three musical genres. The training set for each genre was constructed from three different
albums in that genre. See Appendix C for the artists included in each genre.
Figure 6 shows the principal modes of each genre. The results for Test 3 are as follows:
Overall accuracy: 53.89%
Standard deviation: 8.00%
Mean accuracy for Electronics: 53.33%
Mean accuracy for Rock: 50.00%
Mean accuracy for Jazz: 58.33%
Notice that because each genre had fairly distinct musical features, the algorithm
performed well on all three categories. It is interesting to note that the overall accuracy is the
highest among the three tests and that the mean accuracies for each genre are fairly evenly distributed,
unlike those in Tests 1 and 2. Because each genre's principal modes occupied more distinct
spots than those of Tests 1 and 2, the 'k nearest neighbor' algorithm was given a better
testing ground.
Summary/Conclusion
In this paper, the development of a simple music genre classification algorithm
utilizing the windowed Fourier transform and SVD/PCA techniques was introduced, along with
its theoretical background and its implementation in MATLAB. The paper also
tested the algorithm's performance in three different tests: band classification, the case for
Seattle and genre classification. In Test 1, the algorithm had an accuracy of 48.33% with a
standard deviation of 5.48%. In Test 2, the algorithm had an accuracy of 37.78% with a
standard deviation of 8.60%. Finally, in Test 3, the algorithm had an accuracy of 53.89%
with a standard deviation of 8.00%. For each test, the characteristics of all three
categories under the SVD/PCA techniques were discussed, and the algorithm's performance
on each category was briefly analyzed.
Appendix A
In this section, the MATLAB functions that were used in the algorithm
development are introduced with brief explanations of their use.
linspace: This function is used to construct the linear domains for time and frequency.
audioread: This function is used to load the audio files into MATLAB.
fft: This function is used to perform the windowed Fourier transform on the 5-second portion of
the song.
fftshift: This function is used to shift the frequency domain after the FFT.
abs: This function is used to take the absolute value of the frequency domain after the FFT.
resample: This function is used to resample the frequency rows that make up the
spectrogram in an effort to reduce the size of the matrix.
subplot: This function is used to plot the principal modes of a song or a musical genre.
reshape: This function is used to reshape the spectrogram from 2D to 1D.
size: This function is used to compute the size of the metadata matrix X.
mean: This function is used to compute the mean of X for the normalization process.
repmat: This function is used to subtract the mean from X for the normalization process.
svd: This function is used to perform SVD/PCA on X.
diag: This function is used to extract the diagonal components (singular values) from the
covariance matrix Cy.
length: This function is used to compute the length of the principal modes vector for
plotting purposes.
scatter: This function is used to plot the principal modes of a song or a genre.
zeros: This function is used to construct an empty matrix that is filled with a reshaped
spectrogram in each row.
knnclassify: This function is used to classify the sample song's principal modes against those of
the training sets.
disp: This function is used to display the result of the classification.
Appendix B
In this section, the actual MATLAB code for the algorithm is presented.
Feature Extraction
clear all; close all; clc;

L=5; n=220501;
t2=linspace(0,L,n+1); t=t2(1:n);
k=(2*pi/L)*[0:n/2-1 -n/2:-1]; ks=fftshift(k);

y = audioread('R(1).wav');
vvv1 = y(:,1); vvv2 = y(:,2);
y = (vvv1 + vvv2)/2;               % average the two channels
Fs = 44100;
vvv = y'/2;

start=length(y)/4;                 % start of the 5-second portion
finish=(length(y)/4)+(5*(Fs));
vv = vvv(1,start:finish);

Sgt_spec=[]; tslide=0:0.1:5;
for j=1:length(tslide)
    g=exp(-20*(t-tslide(j)).^2);   % Gaussian kernel
    Sg=g.*vv;
    Sgt=fft(Sg);
    Sgt=Sgt(1,n/2:n);              % keep the positive half of the spectrum
    Sgt_spec=[Sgt_spec; resample(abs(fftshift(Sgt)),1,10)];
    %subplot(3,1,1), plot(t,vv,'k',t,g,'r')
    %subplot(3,1,2), plot(t,Sg,'k')
    %subplot(3,1,3), plot(ks(1,n/2:n),abs(fftshift(Sgt))/max(abs(Sgt)))
    %drawnow
end

X=Sgt_spec;
%RR30=reshape(X,[1,562326]); %save('R(30)Spec','RR30');

[m,n]=size(X);                     % compute data size
mn=mean(X,2);                      % compute mean for each row
X=X-repmat(mn,1,n);                % subtract mean
[u,s,v]=svd(X/sqrt(n-1),'econ');   % perform the SVD
lambda=diag(s).^2;                 % produce diagonal variances
Y=u'*X;                            % produce the principal components projection
Cy=(1/(n-1))*(Y)*(Y.');
PCs = diag(Cy);
%Sum = sum(PCs); %PCnorm = PCs/Sum;

t2 = 1:length(PCs);
scatter(t2,PCs); title('Principal Components'); xlabel('component #'); ylabel('singular value');
Test 1 Training Sets
clear all; close all; clc;
load('G(1)Spec.mat'); load('G(10)Spec.mat'); load('G(11)Spec.mat');
load('G(12)Spec.mat'); load('G(13)Spec.mat'); load('G(14)Spec.mat'); load('G(15)Spec.mat'); load('G(16)Spec.mat');
load('G(17)Spec.mat'); load('G(18)Spec.mat'); load('G(19)Spec.mat'); load('G(2)Spec.mat'); load('G(20)Spec.mat');
load('G(21)Spec.mat'); load('G(22)Spec.mat'); load('G(23)Spec.mat'); load('G(24)Spec.mat'); load('G(25)Spec.mat');
load('G(26)Spec.mat'); load('G(27)Spec.mat'); load('G(28)Spec.mat'); load('G(29)Spec.mat'); load('G(3)Spec.mat');
load('G(30)Spec.mat'); load('G(4)Spec.mat'); load('G(5)Spec.mat'); load('G(6)Spec.mat'); load('G(7)Spec.mat');
load('G(8)Spec.mat'); load('G(9)Spec.mat')
XG=zeros(30,562326);
XG(1,:)=XX1; XG(2,:)=XX2; XG(3,:)=XX3; XG(4,:)=XX4; XG(5,:)=XX5;
XG(6,:)=XX6; XG(7,:)=XX7; XG(8,:)=XX8; XG(9,:)=XX9; XG(10,:)=XX10; XG(11,:)=XX11; XG(12,:)=XX12; XG(13,:)=XX13; XG(14,:)=XX14; XG(15,:)=XX15;
XG(16,:)=XX16; XG(17,:)=XX17; XG(18,:)=XX18; XG(19,:)=XX19; XG(20,:)=XX20; XG(21,:)=XX21; XG(22,:)=XX22; XG(23,:)=XX23; XG(24,:)=XX24; XG(25,:)=XX25;
XG(26,:)=XX26; XG(27,:)=XX27; XG(28,:)=XX28; XG(29,:)=XX29; XG(30,:)=XX30;
[m,n]=size(XG);                     % compute data size
mn=mean(XG,2);                      % compute mean for each row
XG=XG-repmat(mn,1,n);               % subtract mean
[u,s,v]=svd(XG/sqrt(n-1),'econ');   % perform the SVD
lambda=diag(s).^2;                  % produce diagonal variances
YG=u'*XG;                           % produce the principal components projection
CyG=(1/(n-1))*(YG)*(YG.');
PCsG = diag(CyG);
%SumG = sum(PCsG); %PCnormG = PCsG/SumG;
save('GiraffageTraining','PCsG');
t2 = 1:length(PCsG);
%subplot(3,1,1); %scatter(t2,PCsG); title('Principal Components (Giraffage)');
xlabel('component #'); %ylabel('singular value');
load('R(1)Spec.mat'); load('R(2)Spec.mat'); load('R(3)Spec.mat');
load('R(4)Spec.mat'); load('R(5)Spec.mat'); load('R(6)Spec.mat'); load('R(7)Spec.mat'); load('R(8)Spec.mat');
load('R(9)Spec.mat'); load('R(10)Spec.mat'); load('R(11)Spec.mat'); load('R(12)Spec.mat'); load('R(13)Spec.mat');
load('R(14)Spec.mat'); load('R(15)Spec.mat'); load('R(16)Spec.mat'); load('R(17)Spec.mat'); load('R(18)Spec.mat');
load('R(19)Spec.mat'); load('R(20)Spec.mat'); load('R(21)Spec.mat'); load('R(22)Spec.mat'); load('R(23)Spec.mat');
load('R(24)Spec.mat'); load('R(25)Spec.mat'); load('R(26)Spec.mat'); load('R(27)Spec.mat'); load('R(28)Spec.mat');
load('R(29)Spec.mat'); load('R(30)Spec.mat')
XR=zeros(30,562326);
XR(1,:)=RR1; XR(2,:)=RR2; XR(3,:)=RR3; XR(4,:)=RR4; XR(5,:)=RR5;
XR(6,:)=RR6; XR(7,:)=RR7; XR(8,:)=RR8; XR(9,:)=RR9; XR(10,:)=RR10; XR(11,:)=RR11; XR(12,:)=RR12; XR(13,:)=RR13; XR(14,:)=RR14; XR(15,:)=RR15;
XR(16,:)=RR16; XR(17,:)=RR17; XR(18,:)=RR18; XR(19,:)=RR19; XR(20,:)=RR20; XR(21,:)=RR21; XR(22,:)=RR22; XR(23,:)=RR23; XR(24,:)=RR24; XR(25,:)=RR25;
XR(26,:)=RR26; XR(27,:)=RR27; XR(28,:)=RR28; XR(29,:)=RR29; XR(30,:)=RR30;
[m,n]=size(XR);                     % compute data size
mn=mean(XR,2);                      % compute mean for each row
XR=XR-repmat(mn,1,n);               % subtract mean
[u,s,v]=svd(XR/sqrt(n-1),'econ');   % perform the SVD
lambda=diag(s).^2;                  % produce diagonal variances
YR=u'*XR;                           % produce the principal components projection
CyR=(1/(n-1))*(YR)*(YR.');
PCsR = diag(CyR);
%SumR = sum(PCsR); %PCnormR = PCsR/SumR;
save('RadioheadTraining','PCsR');
t2 = 1:length(PCsR);
%subplot(3,1,2); scatter(t2,PCsR); title('Principal Components (Radiohead)');
xlabel('component #'); ylabel('singular value');
load('M(1)Spec.mat'); load('M(2)Spec.mat'); load('M(3)Spec.mat');
load('M(4)Spec.mat'); load('M(5)Spec.mat'); load('M(6)Spec.mat'); load('M(7)Spec.mat'); load('M(8)Spec.mat');
load('M(9)Spec.mat'); load('M(10)Spec.mat'); load('M(11)Spec.mat'); load('M(12)Spec.mat'); load('M(13)Spec.mat');
load('M(14)Spec.mat'); load('M(15)Spec.mat'); load('M(16)Spec.mat'); load('M(17)Spec.mat'); load('M(18)Spec.mat');
load('M(19)Spec.mat'); load('M(20)Spec.mat'); load('M(21)Spec.mat'); load('M(22)Spec.mat'); load('M(23)Spec.mat');
load('M(24)Spec.mat'); load('M(25)Spec.mat'); load('M(26)Spec.mat'); load('M(27)Spec.mat'); load('M(28)Spec.mat');
load('M(29)Spec.mat'); load('M(30)Spec.mat')
XM=zeros(30,562326);
XM(1,:)=MM1; XM(2,:)=MM2; XM(3,:)=MM3; XM(4,:)=MM4; XM(5,:)=MM5;
XM(6,:)=MM6; XM(7,:)=MM7; XM(8,:)=MM8; XM(9,:)=MM9; XM(10,:)=MM10; XM(11,:)=MM11; XM(12,:)=MM12; XM(13,:)=MM13; XM(14,:)=MM14; XM(15,:)=MM15;
XM(16,:)=MM16; XM(17,:)=MM17; XM(18,:)=MM18; XM(19,:)=MM19; XM(20,:)=MM20; XM(21,:)=MM21; XM(22,:)=MM22; XM(23,:)=MM23; XM(24,:)=MM24; XM(25,:)=MM25;
XM(26,:)=MM26; XM(27,:)=MM27; XM(28,:)=MM28; XM(29,:)=MM29; XM(30,:)=MM30;
[m,n]=size(XM);                     % compute data size
mn=mean(XM,2);                      % compute mean for each row
XM=XM-repmat(mn,1,n);               % subtract mean
[u,s,v]=svd(XM/sqrt(n-1),'econ');   % perform the SVD
lambda=diag(s).^2;                  % produce diagonal variances
YM=u'*XM;                           % produce the principal components projection
CyM=(1/(n-1))*(YM)*(YM.');
PCsM = diag(CyM);
%SumM = sum(PCsM); %PCnormM = PCsM/SumM;
save('MilesDavisTraining','PCsM');
t2 = 1:length(PCsM);
%subplot(3,1,3); %scatter(t2,PCsM); title('Principal Components (Miles Davis)');
xlabel('component #'); %ylabel('singular value');
Test 2 Training Sets
clear all; close all; clc;
load('G(1)Spec.mat'); load('G(10)Spec.mat'); load('G(11)Spec.mat');
load('G(12)Spec.mat'); load('G(13)Spec.mat'); load('G(14)Spec.mat'); load('G(15)Spec.mat'); load('G(16)Spec.mat');
load('G(17)Spec.mat'); load('G(18)Spec.mat'); load('G(19)Spec.mat'); load('G(2)Spec.mat'); load('G(20)Spec.mat');
load('G(21)Spec.mat'); load('G(22)Spec.mat'); load('G(23)Spec.mat'); load('G(24)Spec.mat'); load('G(25)Spec.mat');
load('G(26)Spec.mat'); load('G(27)Spec.mat'); load('G(28)Spec.mat'); load('G(29)Spec.mat'); load('G(3)Spec.mat');
load('G(30)Spec.mat'); load('G(4)Spec.mat'); load('G(5)Spec.mat'); load('G(6)Spec.mat'); load('G(7)Spec.mat');
load('G(8)Spec.mat'); load('G(9)Spec.mat')
XG=zeros(30,562326);
XG(1,:)=XX1; XG(2,:)=XX2; XG(3,:)=XX3; XG(4,:)=XX4; XG(5,:)=XX5;
XG(6,:)=XX6; XG(7,:)=XX7; XG(8,:)=XX8; XG(9,:)=XX9; XG(10,:)=XX10; XG(11,:)=XX11; XG(12,:)=XX12; XG(13,:)=XX13; XG(14,:)=XX14; XG(15,:)=XX15;
XG(16,:)=XX16; XG(17,:)=XX17; XG(18,:)=XX18; XG(19,:)=XX19; XG(20,:)=XX20; XG(21,:)=XX21; XG(22,:)=XX22; XG(23,:)=XX23; XG(24,:)=XX24; XG(25,:)=XX25;
XG(26,:)=XX26; XG(27,:)=XX27; XG(28,:)=XX28; XG(29,:)=XX29; XG(30,:)=XX30;
[m,n]=size(XG);                     % compute data size
mn=mean(XG,2);                      % compute mean for each row
XG=XG-repmat(mn,1,n);               % subtract mean
[u,s,v]=svd(XG/sqrt(n-1),'econ');   % perform the SVD
lambda=diag(s).^2;                  % produce diagonal variances
YG=u'*XG;                           % produce the principal components projection
CyG=(1/(n-1))*(YG)*(YG.');
PCsG = diag(CyG);
%SumG = sum(PCsG); %PCnormG = PCsG/SumG;
save('GiraffageTraining','PCsG');
t2 = 1:length(PCsG);
%subplot(3,1,1);
%scatter(t2,PCsG); title('Principal Components (Giraffage)');
xlabel('component #'); %ylabel('singular value');
load('C(1)Spec.mat'); load('C(2)Spec.mat'); load('C(3)Spec.mat');
load('C(4)Spec.mat'); load('C(5)Spec.mat'); load('C(6)Spec.mat'); load('C(7)Spec.mat'); load('C(8)Spec.mat');
load('C(9)Spec.mat'); load('C(10)Spec.mat'); load('C(11)Spec.mat'); load('C(12)Spec.mat'); load('C(13)Spec.mat');
load('C(14)Spec.mat'); load('C(15)Spec.mat'); load('C(16)Spec.mat'); load('C(17)Spec.mat'); load('C(18)Spec.mat');
load('C(19)Spec.mat'); load('C(20)Spec.mat'); load('C(21)Spec.mat'); load('C(22)Spec.mat'); load('C(23)Spec.mat');
load('C(24)Spec.mat'); load('C(25)Spec.mat'); load('C(26)Spec.mat'); load('C(27)Spec.mat'); load('C(28)Spec.mat');
load('C(29)Spec.mat'); load('C(30)Spec.mat')
XC=zeros(30,562326);
XC(1,:)=CC1; XC(2,:)=CC2; XC(3,:)=CC3; XC(4,:)=CC4; XC(5,:)=CC5;
XC(6,:)=CC6; XC(7,:)=CC7; XC(8,:)=CC8; XC(9,:)=CC9; XC(10,:)=CC10; XC(11,:)=CC11; XC(12,:)=CC12; XC(13,:)=CC13; XC(14,:)=CC14; XC(15,:)=CC15;
XC(16,:)=CC16; XC(17,:)=CC17; XC(18,:)=CC18; XC(19,:)=CC19; XC(20,:)=CC20; XC(21,:)=CC21; XC(22,:)=CC22; XC(23,:)=CC23; XC(24,:)=CC24; XC(25,:)=CC25;
XC(26,:)=CC26; XC(27,:)=CC27; XC(28,:)=CC28; XC(29,:)=CC29; XC(30,:)=CC30;
[m,n]=size(XC); % compute data size
mn=mean(XC,2); % compute mean for each row
XC=XC-repmat(mn,1,n); % subtract mean
[u,s,v]=svd(XC/sqrt(n-1),'econ'); % perform the SVD
lambda=diag(s).^2; % produce diagonal variances
YC=u'*XC; % produce the principal components projection
CyC=(1/(n-1))*(YC)*(YC.');
PCsC = diag(CyC);
%SumC = sum(PCsC); %PCnormC = PCsC/SumC;
save('ComTruiseTraining','PCsC');
t2 = 1:length(PCsC);
%subplot(3,1,2); scatter(t2,PCsC); title('Principal Components (ComTruise)');
%xlabel('component #'); ylabel('singular value');
load('S(1)Spec.mat'); load('S(2)Spec.mat'); load('S(3)Spec.mat');
load('S(4)Spec.mat'); load('S(5)Spec.mat'); load('S(6)Spec.mat'); load('S(7)Spec.mat'); load('S(8)Spec.mat');
load('S(9)Spec.mat'); load('S(10)Spec.mat'); load('S(11)Spec.mat'); load('S(12)Spec.mat'); load('S(13)Spec.mat');
load('S(14)Spec.mat'); load('S(15)Spec.mat'); load('S(16)Spec.mat'); load('S(17)Spec.mat'); load('S(18)Spec.mat');
load('S(19)Spec.mat'); load('S(20)Spec.mat'); load('S(21)Spec.mat'); load('S(22)Spec.mat'); load('S(23)Spec.mat');
load('S(24)Spec.mat'); load('S(25)Spec.mat'); load('S(26)Spec.mat'); load('S(27)Spec.mat'); load('S(28)Spec.mat');
load('S(29)Spec.mat'); load('S(30)Spec.mat')
XS=zeros(30,562326);
XS(1,:)=SS1; XS(2,:)=SS2; XS(3,:)=SS3; XS(4,:)=SS4; XS(5,:)=SS5;
XS(6,:)=SS6; XS(7,:)=SS7; XS(8,:)=SS8; XS(9,:)=SS9; XS(10,:)=SS10; XS(11,:)=SS11; XS(12,:)=SS12; XS(13,:)=SS13; XS(14,:)=SS14; XS(15,:)=SS15;
XS(16,:)=SS16; XS(17,:)=SS17; XS(18,:)=SS18; XS(19,:)=SS19; XS(20,:)=SS20; XS(21,:)=SS21; XS(22,:)=SS22; XS(23,:)=SS23; XS(24,:)=SS24; XS(25,:)=SS25;
XS(26,:)=SS26; XS(27,:)=SS27; XS(28,:)=SS28; XS(29,:)=SS29; XS(30,:)=SS30;
[m,n]=size(XS); % compute data size
mn=mean(XS,2); % compute mean for each row
XS=XS-repmat(mn,1,n); % subtract mean
[u,s,v]=svd(XS/sqrt(n-1),'econ'); % perform the SVD
lambda=diag(s).^2; % produce diagonal variances
YS=u'*XS; % produce the principal components projection
CyS=(1/(n-1))*(YS)*(YS.');
PCsS = diag(CyS);
%SumS = sum(PCsS); %PCnormS = PCsS/SumS;
save('SkreamTraining','PCsS');
t2 = 1:length(PCsS);
%subplot(3,1,3); scatter(t2,PCsS); title('Principal Components (Skream)');
%xlabel('component #'); ylabel('singular value');
Test 3 Training Sets
clear all; close all; clc;
load('E(1)Spec.mat'); load('E(10)Spec.mat'); load('E(11)Spec.mat');
load('E(12)Spec.mat'); load('E(13)Spec.mat'); load('E(14)Spec.mat'); load('E(15)Spec.mat'); load('E(16)Spec.mat');
load('E(17)Spec.mat'); load('E(18)Spec.mat'); load('E(19)Spec.mat'); load('E(2)Spec.mat'); load('E(20)Spec.mat');
load('E(21)Spec.mat'); load('E(22)Spec.mat'); load('E(23)Spec.mat'); load('E(24)Spec.mat'); load('E(25)Spec.mat');
load('E(26)Spec.mat'); load('E(27)Spec.mat'); load('E(28)Spec.mat'); load('E(29)Spec.mat'); load('E(3)Spec.mat');
load('E(30)Spec.mat'); load('E(4)Spec.mat'); load('E(5)Spec.mat'); load('E(6)Spec.mat'); load('E(7)Spec.mat');
load('E(8)Spec.mat'); load('E(9)Spec.mat')
XE=zeros(30,562326);
XE(1,:)=EE1; XE(2,:)=EE2; XE(3,:)=EE3; XE(4,:)=EE4; XE(5,:)=EE5;
XE(6,:)=EE6; XE(7,:)=EE7; XE(8,:)=EE8; XE(9,:)=EE9; XE(10,:)=EE10; XE(11,:)=EE11; XE(12,:)=EE12; XE(13,:)=EE13; XE(14,:)=EE14; XE(15,:)=EE15;
XE(16,:)=EE16; XE(17,:)=EE17; XE(18,:)=EE18; XE(19,:)=EE19; XE(20,:)=EE20; XE(21,:)=EE21; XE(22,:)=EE22; XE(23,:)=EE23; XE(24,:)=EE24; XE(25,:)=EE25;
XE(26,:)=EE26; XE(27,:)=EE27; XE(28,:)=EE28; XE(29,:)=EE29; XE(30,:)=EE30;
[m,n]=size(XE); % compute data size
mn=mean(XE,2); % compute mean for each row
XE=XE-repmat(mn,1,n); % subtract mean
[u,s,v]=svd(XE/sqrt(n-1),'econ'); % perform the SVD
lambda=diag(s).^2; % produce diagonal variances
YE=u'*XE; % produce the principal components projection
CyE=(1/(n-1))*(YE)*(YE.');
PCsE = diag(CyE);
%SumE = sum(PCsE); %PCnormE = PCsE/SumE;
save('ElectronicsTraining','PCsE');
t2 = 1:length(PCsE);
%subplot(3,1,1); scatter(t2,PCsE); title('Principal Components (Electronics)');
%xlabel('component #'); ylabel('singular value');
load('T(1)Spec.mat'); load('T(2)Spec.mat'); load('T(3)Spec.mat');
load('T(4)Spec.mat'); load('T(5)Spec.mat'); load('T(6)Spec.mat'); load('T(7)Spec.mat'); load('T(8)Spec.mat');
load('T(9)Spec.mat'); load('T(10)Spec.mat'); load('T(11)Spec.mat'); load('T(12)Spec.mat'); load('T(13)Spec.mat');
load('T(14)Spec.mat'); load('T(15)Spec.mat');
load('T(16)Spec.mat'); load('T(17)Spec.mat'); load('T(18)Spec.mat');
load('T(19)Spec.mat'); load('T(20)Spec.mat'); load('T(21)Spec.mat'); load('T(22)Spec.mat'); load('T(23)Spec.mat');
load('T(24)Spec.mat'); load('T(25)Spec.mat'); load('T(26)Spec.mat'); load('T(27)Spec.mat'); load('T(28)Spec.mat');
load('T(29)Spec.mat'); load('T(30)Spec.mat')
XT=zeros(30,562326);
XT(1,:)=TT1; XT(2,:)=TT2; XT(3,:)=TT3; XT(4,:)=TT4; XT(5,:)=TT5;
XT(6,:)=TT6; XT(7,:)=TT7; XT(8,:)=TT8; XT(9,:)=TT9; XT(10,:)=TT10; XT(11,:)=TT11; XT(12,:)=TT12; XT(13,:)=TT13; XT(14,:)=TT14; XT(15,:)=TT15;
XT(16,:)=TT16; XT(17,:)=TT17; XT(18,:)=TT18; XT(19,:)=TT19; XT(20,:)=TT20; XT(21,:)=TT21; XT(22,:)=TT22; XT(23,:)=TT23; XT(24,:)=TT24; XT(25,:)=TT25;
XT(26,:)=TT26; XT(27,:)=TT27; XT(28,:)=TT28; XT(29,:)=TT29; XT(30,:)=TT30;
[m,n]=size(XT); % compute data size
mn=mean(XT,2); % compute mean for each row
XT=XT-repmat(mn,1,n); % subtract mean
[u,s,v]=svd(XT/sqrt(n-1),'econ'); % perform the SVD
lambda=diag(s).^2; % produce diagonal variances
YT=u'*XT; % produce the principal components projection
CyT=(1/(n-1))*(YT)*(YT.');
PCsT = diag(CyT);
%SumT = sum(PCsT); %PCnormT = PCsT/SumT;
save('RockTraining','PCsT');
t2 = 1:length(PCsT);
%subplot(3,1,2); scatter(t2,PCsT); title('Principal Components (Rock)');
%xlabel('component #'); ylabel('singular value');
load('J(1)Spec.mat'); load('J(2)Spec.mat'); load('J(3)Spec.mat');
load('J(4)Spec.mat'); load('J(5)Spec.mat'); load('J(6)Spec.mat'); load('J(7)Spec.mat'); load('J(8)Spec.mat');
load('J(9)Spec.mat'); load('J(10)Spec.mat'); load('J(11)Spec.mat'); load('J(12)Spec.mat'); load('J(13)Spec.mat');
load('J(14)Spec.mat'); load('J(15)Spec.mat'); load('J(16)Spec.mat'); load('J(17)Spec.mat'); load('J(18)Spec.mat');
load('J(19)Spec.mat'); load('J(20)Spec.mat'); load('J(21)Spec.mat'); load('J(22)Spec.mat'); load('J(23)Spec.mat');
load('J(24)Spec.mat'); load('J(25)Spec.mat'); load('J(26)Spec.mat'); load('J(27)Spec.mat'); load('J(28)Spec.mat');
load('J(29)Spec.mat'); load('J(30)Spec.mat')
XJ=zeros(30,562326);
XJ(1,:)=JJ1; XJ(2,:)=JJ2; XJ(3,:)=JJ3; XJ(4,:)=JJ4; XJ(5,:)=JJ5;
XJ(6,:)=JJ6; XJ(7,:)=JJ7; XJ(8,:)=JJ8; XJ(9,:)=JJ9; XJ(10,:)=JJ10; XJ(11,:)=JJ11; XJ(12,:)=JJ12; XJ(13,:)=JJ13; XJ(14,:)=JJ14; XJ(15,:)=JJ15;
XJ(16,:)=JJ16; XJ(17,:)=JJ17; XJ(18,:)=JJ18; XJ(19,:)=JJ19; XJ(20,:)=JJ20; XJ(21,:)=JJ21; XJ(22,:)=JJ22; XJ(23,:)=JJ23; XJ(24,:)=JJ24; XJ(25,:)=JJ25;
XJ(26,:)=JJ26; XJ(27,:)=JJ27; XJ(28,:)=JJ28; XJ(29,:)=JJ29; XJ(30,:)=JJ30;
[m,n]=size(XJ); % compute data size
mn=mean(XJ,2); % compute mean for each row
XJ=XJ-repmat(mn,1,n); % subtract mean
[u,s,v]=svd(XJ/sqrt(n-1),'econ'); % perform the SVD
lambda=diag(s).^2; % produce diagonal variances
YJ=u'*XJ; % produce the principal components projection
CyJ=(1/(n-1))*(YJ)*(YJ.');
PCsJ = diag(CyJ);
%SumJ = sum(PCsJ); %PCnormJ = PCsJ/SumJ;
save('JazzTraining','PCsJ');
t2 = 1:length(PCsJ);
%subplot(3,1,3); scatter(t2,PCsJ); title('Principal Components (Jazz)');
%xlabel('component #'); ylabel('singular value');
Genre Recognition for Test 1
clear all; close all; clc;
% Load training sets
load('GiraffageTraining.mat'); load('MilesDavisTraining.mat'); load('RadioheadTraining.mat');
Giraffage = PCsG(2:20,:).'; Radiohead = PCsR(2:20,:).'; MilesDavis = PCsM(2:20,:).';
% Load music to be identified with PCA algorithm
L=5; n=220501; t2=linspace(0,L,n+1); t=t2(1:n); k=(2*pi/L)*[0:n/2-1 -n/2:-1]; ks=fftshift(k);
[yy, Fs] = audioread('M(30).wav'); % Fs = 44100 Hz for these samples
vvv1 = yy(:,1); vvv2 = yy(:,2); y = (vvv1 + vvv2)/2; % average the stereo channels
vvv = y'/2;
start = round(length(y)/2); finish = start + 5*Fs; % 5-second clip from the middle of the track
vv = vvv(1,start:finish);
Sgt_spec = []; tslide = 0:0.1:5;
for j=1:length(tslide)
    g=exp(-20*(t-tslide(j)).^2); % Gaussian window
    Sg=g.*vv;
    Sgt=fft(Sg);
    Sgt=Sgt(1,floor(n/2):n); % keep one half of the spectrum (floor since n is odd)
    Sgt_spec=[Sgt_spec; resample(abs(fftshift(Sgt)),1,10)];
    %subplot(3,1,1), plot(t,vv,'k',t,g,'r')
    %subplot(3,1,2), plot(t,Sg,'k')
    %subplot(3,1,3), plot(ks(1,floor(n/2):n),abs(fftshift(Sgt))/max(abs(Sgt)))
    %drawnow
end
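This loop is the same windowed-Fourier step used to build the training spectrograms: a Gaussian window slides across the 5-second clip in 0.1 s steps, and half of each magnitude spectrum, downsampled by a factor of 10, becomes one row of the spectrogram. A NumPy sketch with a synthetic sine wave in place of the audio (simple stride-10 decimation stands in for MATLAB's `resample`):

```python
import numpy as np

Fs = 44100
L, n = 5.0, 220501                       # 5-second clip, 5*Fs + 1 samples
t = np.linspace(0, L, n + 1)[:n]
vv = np.sin(2 * np.pi * 440 * t)         # synthetic stand-in for the audio clip

rows = []
for tau in np.linspace(0, 5, 51):        # tslide = 0:0.1:5
    g = np.exp(-20 * (t - tau) ** 2)     # Gaussian window centered at tau
    Sgt = np.fft.fft(g * vv)
    half = np.abs(np.fft.fftshift(Sgt))[n // 2:]  # keep one half of the spectrum
    rows.append(half[::10])              # crude 10x decimation (resample stand-in)
Sgt_spec = np.vstack(rows)               # one row per window position
```

Flattened, each such spectrogram has 51 × 11026 = 562326 entries, which is exactly the row length of the training matrices above.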
X=Sgt_spec;
[m,n]=size(X); % compute data size
mn=mean(X,2); % compute mean for each row
X=X-repmat(mn,1,n); % subtract mean
[u,s,v]=svd(X/sqrt(n-1),'econ'); % perform the SVD
lambda=diag(s).^2; % produce diagonal variances
Y=u'*X; % produce the principal components projection
Cy=(1/(n-1))*(Y)*(Y.');
PCs = diag(Cy);
%Sum = sum(PCs); %PCnorm = PCs/Sum;
% Sample
Testmusic = PCs(2:20,:).';
% Training
Classifier = zeros(3,19);
Classifier(1,:) = Giraffage; Classifier(2,:) = Radiohead; Classifier(3,:) = MilesDavis;
% Group
Group = {'Giraffage'; 'Radiohead'; 'MilesDavis'};
% Function
Genre = knnclassify(Testmusic, Classifier, Group);
% Display result
disp('result:'); disp(Genre);
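Note that `knnclassify` belongs to the older Bioinformatics Toolbox and has been removed from recent MATLAB releases (`fitcknn`/`predict` are the modern replacements). With a single training row per artist, the classification reduces to nearest-neighbor matching, which can be sketched in NumPy as follows (the feature vectors here are random toys, not the paper's principal components):

```python
import numpy as np

def nearest_class(test_vec, classifier, groups):
    """Label of the training row closest to test_vec (Euclidean distance, k = 1)."""
    d = np.linalg.norm(classifier - test_vec, axis=1)
    return groups[int(np.argmin(d))]

# Toy 3 x 19 classifier: one row of component variances per artist.
rng = np.random.default_rng(1)
classifier = rng.standard_normal((3, 19))
groups = ['Giraffage', 'Radiohead', 'MilesDavis']

# A test vector lying close to the second training row.
test_music = classifier[1] + 0.01 * rng.standard_normal(19)
genre = nearest_class(test_music, classifier, groups)   # 'Radiohead'
```

With only one training example per class, k = 1 is forced; a larger training set per class would let a true k-nearest-neighbor vote smooth out outliers.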
Genre Recognition for Test 2
clear all; close all; clc;
% Load training sets
load('GiraffageTraining.mat'); load('ComTruiseTraining.mat'); load('SkreamTraining.mat');
Giraffage = PCsG(2:20,:).'; ComTruise = PCsC(2:20,:).'; Skream = PCsS(2:20,:).';
% Load music to be identified with PCA algorithm
L=5; n=220501; t2=linspace(0,L,n+1); t=t2(1:n); k=(2*pi/L)*[0:n/2-1 -n/2:-1]; ks=fftshift(k);
[yy, Fs] = audioread('S(30).wav'); % Fs = 44100 Hz for these samples
vvv1 = yy(:,1); vvv2 = yy(:,2); y = (vvv1 + vvv2)/2; % average the stereo channels
vvv = y'/2;
start = round(length(y)/2); finish = start + 5*Fs; % 5-second clip from the middle of the track
vv = vvv(1,start:finish);
Sgt_spec = []; tslide = 0:0.1:5;
for j=1:length(tslide)
    g=exp(-20*(t-tslide(j)).^2); % Gaussian window
    Sg=g.*vv;
    Sgt=fft(Sg);
    Sgt=Sgt(1,floor(n/2):n); % keep one half of the spectrum (floor since n is odd)
    Sgt_spec=[Sgt_spec; resample(abs(fftshift(Sgt)),1,10)];
    %subplot(3,1,1), plot(t,vv,'k',t,g,'r')
    %subplot(3,1,2), plot(t,Sg,'k')
    %subplot(3,1,3), plot(ks(1,floor(n/2):n),abs(fftshift(Sgt))/max(abs(Sgt)))
    %drawnow
end
X=Sgt_spec;
[m,n]=size(X); % compute data size
mn=mean(X,2); % compute mean for each row
X=X-repmat(mn,1,n); % subtract mean
[u,s,v]=svd(X/sqrt(n-1),'econ'); % perform the SVD
lambda=diag(s).^2; % produce diagonal variances
Y=u'*X; % produce the principal components projection
Cy=(1/(n-1))*(Y)*(Y.');
PCs = diag(Cy);
%Sum = sum(PCs); %PCnorm = PCs/Sum;
% Sample
Testmusic = PCs(2:20,:).';
% Training
Classifier = zeros(3,19);
Classifier(1,:) = Giraffage; Classifier(2,:) = ComTruise; Classifier(3,:) = Skream;
% Group
Group = {'Giraffage'; 'ComTruise'; 'Skream'};
% Function
Genre = knnclassify(Testmusic, Classifier, Group);
% Display result
disp('result:'); disp(Genre);
Genre Recognition for Test 3
clear all; close all; clc;
% Load training sets
load('ElectronicsTraining.mat'); load('RockTraining.mat'); load('JazzTraining.mat');
Electronics = PCsE(2:20,:).'; Rock = PCsT(2:20,:).'; Jazz = PCsJ(2:20,:).';
% Load music to be identified with PCA algorithm
L=5; n=220501; t2=linspace(0,L,n+1); t=t2(1:n); k=(2*pi/L)*[0:n/2-1 -n/2:-1]; ks=fftshift(k);
[yy, Fs] = audioread('J(30).wav'); % Fs = 44100 Hz for these samples
vvv1 = yy(:,1); vvv2 = yy(:,2); y = (vvv1 + vvv2)/2; % average the stereo channels
vvv = y'/2;
start = round(length(y)/2); finish = start + 5*Fs; % 5-second clip from the middle of the track
vv = vvv(1,start:finish);
Sgt_spec = []; tslide = 0:0.1:5;
for j=1:length(tslide)
    g=exp(-20*(t-tslide(j)).^2); % Gaussian window
    Sg=g.*vv;
    Sgt=fft(Sg);
    Sgt=Sgt(1,floor(n/2):n); % keep one half of the spectrum (floor since n is odd)
    Sgt_spec=[Sgt_spec; resample(abs(fftshift(Sgt)),1,10)];
    %subplot(3,1,1), plot(t,vv,'k',t,g,'r')
    %subplot(3,1,2), plot(t,Sg,'k')
    %subplot(3,1,3), plot(ks(1,floor(n/2):n),abs(fftshift(Sgt))/max(abs(Sgt)))
    %drawnow
end
X=Sgt_spec;
[m,n]=size(X); % compute data size
mn=mean(X,2); % compute mean for each row
X=X-repmat(mn,1,n); % subtract mean
[u,s,v]=svd(X/sqrt(n-1),'econ'); % perform the SVD
lambda=diag(s).^2; % produce diagonal variances
Y=u'*X; % produce the principal components projection
Cy=(1/(n-1))*(Y)*(Y.');
PCs = diag(Cy);
%Sum = sum(PCs); %PCnorm = PCs/Sum;
% Sample
Testmusic = PCs(2:20,:).';
% Training
Classifier = zeros(3,19);
Classifier(1,:) = Electronics; Classifier(2,:) = Rock; Classifier(3,:) = Jazz;
% Group
Group = {'Electronics'; 'Rock'; 'Jazz'};
% Function
Genre = knnclassify(Testmusic, Classifier, Group);
% Display result
disp('result:'); disp(Genre);
Appendix C
This section lists the artists and albums used for algorithm development. They are all awesome artists; you should definitely check them out.
Test 1
Giraffage
- Needs
- Comfort
- No Reason
- Janet Jackson: Someone to Call My Lover (Giraffage Remix)
- Miley Cyrus: Party in the U.S.A. (Giraffage Remix)
- R. Kelly: Ignition (Giraffage Remix)
- Stardust: Music Sounds Better with You (Giraffage Remix)
- The-Dream: Shawty Is da Shit (Giraffage Remix)
- And 20+ more remixes from Giraffage's SoundCloud
Radiohead
- The Bends
- OK Computer
- Kid A
- Amnesiac
- Hail to the Thief
Miles Davis
- Bitches Brew Legacy Edition
- Birth of the Cool
- Kind of Blue
- In a Silent Way
- Quiet Nights
Test 2
Giraffage
- Same as Test 1
Com Truise
- In Decay
- Galactic Melt
- Wave 1
- Cyanide Sister EP
- Fairlight
Skream
- Skream!
- Skreamizm Vol 1
- Skreamizm Vol 2
- Skreamizm Vol 3
- Skreamizm Vol 4
- Skreamizm Vol 5
- Skreamizm Vol 6
Test 3
Giraffage
- Same as Test 1
Com Truise
- Same as Test 2
Skream
- Same as Test 2
Meshuggah
- obZen
- Koloss
- Nothing
Animals as Leaders
- Animals as Leaders
- Weightless
- The Joy of Motion
Miles Davis
- Same as Test 1
John Coltrane
- Duke Ellington & John Coltrane
- A Love Supreme
- Blue Train