![Page 1: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/1.jpg)
Machine Learning for Music
Faculty of Mathematics and Informatics, SU
Petko Nikolov
April 8, 2015
![Page 2: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/2.jpg)
About Me
Machine Learning
Music Information Retrieval
Machine Learning / Automated Data Science
![Page 3: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/3.jpg)
What’s Music Information Retrieval?
MIR lies at the intersection of:
● Musicology
● Computer Science
● Signal Processing
● Machine Learning
![Page 4: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/4.jpg)
![Page 5: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/5.jpg)
![Page 6: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/6.jpg)
Music Recommendations
![Page 7: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/7.jpg)
Recommending tags
![Page 8: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/8.jpg)
Spotify’s Shuffle Mode
● Not really random
● Certainly some processing
● Probably some MIR behind it
![Page 9: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/9.jpg)
Pandora’s Music Genome Project
● started in 2000
● 800 000 tracks manually annotated by music experts
● 450 attributes used to describe the music
● 25 minutes to label each track
![Page 10: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/10.jpg)
MIREX
Music Information Retrieval Evaluation eXchange: an annual competition featuring more than 20 tasks, in which state-of-the-art algorithms compete against each other
![Page 11: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/11.jpg)
Structured Information
Retrieval
Synthesis
![Page 12: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/12.jpg)
● fingerprinting
● cover song detection
● genre recognition
● instrument recognition
● mood detection
● transcription
● playlist generation
● beat tracking
● key detection
● pitch tracking
● vocal detection
● recommendation
● audio similarity
● source separation
![Page 13: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/13.jpg)
● genre recognition
● instrument recognition
● mood detection
● vocal detection
● audio similarity
![Page 18: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/18.jpg)
MIR Architecture
Audio → Segmentation and Preprocessing → Feature Extraction → Machine Learning
Example output: classical, piano, romantic, Beethoven, by Daniel Barenboim, 2/4
![Page 19: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/19.jpg)
MIR Architecture
Audio → Segmentation and Preprocessing → Deep Learning
Example output: classical, piano, romantic, Beethoven, by Daniel Barenboim, 2/4
![Page 21: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/21.jpg)
Audio signal
Human hearing range: 20 Hz to 20 kHz
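Digitally, an audio signal is a sequence of samples. A sampled signal can only represent frequencies below half the sample rate, which is called the Nyquist frequency. This sketch assumes the common 44.1 kHz rate, which is not stated in the talk. At that rate the limit is 22.05 kHz, just above the 20 kHz upper end of hearing. The `sine` helper is hypothetical and only produces a test tone to experiment with.

```python
import numpy as np

SAMPLE_RATE = 44_100     # Hz; assumed CD-quality rate, not taken from the talk
HEARING_MAX_HZ = 20_000  # upper limit of human hearing, from the slide

# Nyquist: a sampled signal can only represent frequencies below sample_rate / 2.
nyquist = SAMPLE_RATE / 2
assert nyquist > HEARING_MAX_HZ  # 22 050 Hz > 20 000 Hz


def sine(freq_hz: float, duration_s: float, sr: int = SAMPLE_RATE) -> np.ndarray:
    """Return a mono sine tone as float samples in [-1, 1]."""
    t = np.arange(int(duration_s * sr)) / sr  # sample times in seconds
    return np.sin(2 * np.pi * freq_hz * t)


signal = sine(440.0, 1.0)  # one second of concert A
print(signal.shape)        # (44100,)
```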
![Page 29: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/29.jpg)
Segmentation
The signal is cut into consecutive frames f1, f2, f3, f4, …, fn, each 52 ms long.
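Here is a minimal framing sketch. The slide gives only the 52 ms frame length, so the hop size is an assumption. It defaults to back-to-back frames, and an optional `hop_ms` allows overlapping frames, which are also common. The function name is hypothetical.

```python
from typing import Optional

import numpy as np


def frame_signal(x: np.ndarray, sr: int, frame_ms: float = 52.0,
                 hop_ms: Optional[float] = None) -> np.ndarray:
    """Cut a 1-D signal into frames f1..fn; returns shape (n_frames, frame_len).

    hop_ms defaults to frame_ms, i.e. back-to-back frames with no overlap.
    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(sr * frame_ms / 1000))
    hop = frame_len if hop_ms is None else int(round(sr * hop_ms / 1000))
    if len(x) < frame_len:
        return np.empty((0, frame_len), dtype=x.dtype)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds the sample indices i*hop, i*hop + 1, ..., i*hop + frame_len - 1.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]
```

At an assumed sample rate of 22 050 Hz, one 52 ms frame is 1147 samples, so one second of audio gives 19 whole frames.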
![Page 30: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/30.jpg)
Spectrum - on frame level
The Discrete Fourier Transform (DFT) takes each frame from the time domain to the frequency domain.
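Here is a sketch of the per-frame magnitude spectrum, using NumPy's real-input FFT. The Hann window is an assumption; the talk does not say whether frames are windowed, but a window is usually applied to reduce spectral leakage.

```python
import numpy as np


def frame_spectra(frames: np.ndarray, sr: int):
    """Magnitude spectrum of each frame (rows of `frames`) via the real-input DFT."""
    window = np.hanning(frames.shape[1])                  # assumed Hann window
    mags = np.abs(np.fft.rfft(frames * window, axis=1))   # shape (n_frames, n_bins)
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)  # bin frequencies in Hz
    return freqs, mags
```

For example, an 800-sample frame at 8 kHz has 10 Hz bins, so a 1000 Hz sine peaks in bin 100.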
![Page 31: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/31.jpg)
Feature extraction
Each frame x is mapped to a feature value f(x).
![Page 32: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/32.jpg)
Spectral Centroid
Where the ‘center of mass’ of the spectrum lies, i.e. the magnitude-weighted mean frequency
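In formula form, the centroid of a frame is $C = \sum_k f_k |X_k| / \sum_k |X_k|$. Here $f_k$ is the frequency of bin $k$ and $|X_k|$ is its magnitude. This sketch assumes magnitude weighting; some definitions use the power spectrum instead. Returning 0 for silent frames is an arbitrary choice made here.

```python
import numpy as np


def spectral_centroid(freqs: np.ndarray, mags: np.ndarray) -> np.ndarray:
    """Magnitude-weighted mean frequency of each frame (rows of mags)."""
    total = mags.sum(axis=1)
    # Guard against silent frames, whose spectrum sums to zero.
    safe_total = np.where(total > 0, total, 1.0)
    return np.where(total > 0, (mags * freqs).sum(axis=1) / safe_total, 0.0)
```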
![Page 33: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/33.jpg)
Spectral Slope
Fit a linear regression to the spectrum (magnitude vs. frequency) and take the slope coefficient.
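This is the regression step as a sketch. It assumes an ordinary least-squares fit of raw magnitude against frequency. Some definitions normalize the magnitudes first, and the talk does not specify. `np.polyfit` fits every frame at once when given a 2-D `y`.

```python
import numpy as np


def spectral_slope(freqs: np.ndarray, mags: np.ndarray) -> np.ndarray:
    """Least-squares slope of magnitude vs. frequency, one value per frame."""
    # With y of shape (n_bins, n_frames), polyfit fits each column separately and
    # returns coefficients highest degree first, so row 0 holds the slopes.
    coeffs = np.polyfit(freqs, mags.T, deg=1)
    return coeffs[0]
```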
![Page 37: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/37.jpg)
Spectral Correlation / Variation
Spectral correlation is the cosine similarity between the frequency vectors (spectra) of two consecutive frames; spectral variation is 1.0 - correlation.
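Both features can be sketched directly from a matrix of per-frame spectra. There is one value per pair of consecutive frames, so a track with n frames yields n - 1 values. Treating a silent (all-zero) frame as having similarity 0 is an assumption made here.

```python
import numpy as np


def spectral_correlation(mags: np.ndarray) -> np.ndarray:
    """Cosine similarity between the spectra of frames t-1 and t; length n_frames - 1."""
    prev, curr = mags[:-1], mags[1:]
    num = (prev * curr).sum(axis=1)
    den = np.linalg.norm(prev, axis=1) * np.linalg.norm(curr, axis=1)
    # Define the similarity as 0 when either frame is silent (zero norm).
    return np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)


def spectral_variation(mags: np.ndarray) -> np.ndarray:
    """1.0 - correlation, i.e. the cosine distance between consecutive spectra."""
    return 1.0 - spectral_correlation(mags)
```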
![Page 39: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/39.jpg)
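Feature extraction - Result
Each row is one feature (first row: centroid, second row: correlation) and each column is one frame:
$$
\begin{pmatrix}
f_{11} & f_{12} & f_{13} & f_{14} & f_{15} & \cdots & f_{1m} \\
f_{21} & f_{22} & f_{23} & f_{24} & f_{25} & \cdots & f_{2m}
\end{pmatrix}
$$
The number of frames m varies across audio recordings.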
![Page 40: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/40.jpg)
Universal Background Model
![Page 41: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/41.jpg)
Gaussian Mixture Model
Each data point is a frame feature vector.
![Page 42: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/42.jpg)
Gaussian Mixture Model
Each component is a multivariate Gaussian distribution.
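The slides name the model without writing it down. In the usual notation, which is assumed here rather than taken from the slides, a GMM with $K$ components gives the density below. Here $\mathbf{x}$ is a $d$-dimensional frame feature vector and $w_k$ are the mixture weights:

$$
p(\mathbf{x}) = \sum_{k=1}^{K} w_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\qquad w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1
$$

$$
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) =
\frac{1}{(2\pi)^{d/2} \lvert \boldsymbol{\Sigma} \rvert^{1/2}}
\exp\!\left(-\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)
$$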
![Page 48: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/48.jpg)
Gaussian Mixture Model - per track
Each track is represented by the concatenation of its component means: [𝛍1, 𝛍2, 𝛍3, 𝛍4]
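A common way to turn a variable number of frames into a fixed-length vector like [𝛍1, 𝛍2, 𝛍3, 𝛍4] is the GMM "supervector" approach. First, fit the Universal Background Model on frames pooled from many tracks. Then MAP-adapt only its means to each track. The slides do not say which adaptation the talk used, so treat this as a sketch. It assumes scikit-learn's `GaussianMixture` with 4 diagonal-covariance components. The relevance factor of 16 and the function names are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def train_ubm(all_frames: np.ndarray, n_components: int = 4, seed: int = 0) -> GaussianMixture:
    """Fit the Universal Background Model on frame feature vectors pooled from many tracks."""
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          random_state=seed)
    return ubm.fit(all_frames)


def track_supervector(ubm: GaussianMixture, track_frames: np.ndarray,
                      relevance: float = 16.0) -> np.ndarray:
    """Mean-only MAP adaptation of the UBM to one track, flattened to [mu1, ..., muK]."""
    resp = ubm.predict_proba(track_frames)  # (n_frames, K) component posteriors
    n_k = resp.sum(axis=0)                  # soft frame count per component
    # Posterior-weighted mean of this track's frames per component (guard empty ones).
    ex_k = resp.T @ track_frames / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]  # how far each mean moves from the UBM
    adapted = alpha * ex_k + (1.0 - alpha) * ubm.means_
    return adapted.ravel()                  # length K * d, whatever the frame count
```

Because the output length depends only on the number of components and the feature dimension, tracks with different frame counts end up as vectors of the same size.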
![Page 49: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/49.jpg)
Classification - Example Neural Net
Layers: Input (feature vector) → Hidden → Output (likelihood of Rock?)
Figure labels: a_ik, w_k
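Here is a minimal forward pass for the network above. It assumes sigmoid activations, one hidden layer, and a single sigmoid output read as the likelihood of Rock. Reading a_k as hidden activations and w_k as output weights is an interpretation of the figure labels. The weights passed in are placeholders, not trained values.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def rock_likelihood(x, W_hidden, b_hidden, w_out, b_out):
    """Forward pass: feature vector -> hidden activations -> P(Rock)."""
    a = sigmoid(W_hidden @ x + b_hidden)  # hidden-layer activations (a_k)
    return sigmoid(w_out @ a + b_out)     # output weights (w_k) combine them into one score
```

Training would adjust `W_hidden`, `b_hidden`, `w_out` and `b_out` by backpropagation so that the output is close to 1 for rock tracks and close to 0 otherwise.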
![Page 52: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/52.jpg)
What’s Deep Learning?
(defn deep-learning? [neural-net] (hidden-layer? neural-net))
We try to learn new, high-level representations by using many more hidden layers.
The input is kept as raw as possible.
![Page 53: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/53.jpg)
Mel-spectrum
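A mel spectrum pools DFT bins into bands that are equally spaced on the mel scale, which is roughly how pitch is perceived. This sketch assumes the widely used formula mel = 2595 · log10(1 + f / 700) and unnormalized triangular filters. The talk does not specify either choice, the number of bands, or whether a logarithm is applied afterwards. The function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    """Hz -> mel, using the 2595 * log10(1 + f / 700) formula."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(freqs: np.ndarray, sr: int, n_mels: int = 40) -> np.ndarray:
    """Triangular filters equally spaced in mel; shape (n_mels, len(freqs)).

    freqs are the DFT bin frequencies in Hz (e.g. from np.fft.rfftfreq).
    """
    # n_mels + 2 edges: filter i rises from edges[i] to edges[i+1], falls to edges[i+2].
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bank = np.zeros((n_mels, len(freqs)))
    for i in range(n_mels):
        left, centre, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - left) / (centre - left)
        falling = (right - freqs) / (right - centre)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mel_spectrum(freqs: np.ndarray, mags: np.ndarray, sr: int, n_mels: int = 40) -> np.ndarray:
    """Per-frame magnitude spectra (n_frames, n_bins) -> mel band values (n_frames, n_mels)."""
    return mags @ mel_filterbank(freqs, sr, n_mels).T
```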
![Page 57: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/57.jpg)
Deep Neural Network
Backpropagation: the gradient fades quickly as it is propagated back through many layers (vanishing gradient)
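The fading can be seen numerically. The sigmoid's derivative s(z)(1 - s(z)) is at most 0.25, so each layer multiplies the backpropagated gradient by another small factor. This sketch assumes a plain sigmoid MLP with random Gaussian weights and backpropagates a unit gradient from the top. The network size and initialisation are arbitrary choices for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def layer_gradients(n_layers: int, width: int = 32, seed: int = 0) -> np.ndarray:
    """Backpropagate a unit gradient through a random sigmoid MLP.

    Returns the gradient norm below each layer: index 0 is the gradient at the
    network input, the last entry is the gradient just below the top layer.
    """
    rng = np.random.default_rng(seed)
    # Assumed initialisation: Gaussian weights with standard deviation 1/sqrt(width).
    weights = [rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
               for _ in range(n_layers)]
    # Forward pass, keeping each layer's sigmoid output for the backward pass.
    h = rng.normal(size=width)
    outputs = []
    for W in weights:
        h = sigmoid(W @ h)
        outputs.append(h)
    # Backward pass: multiply by sigmoid'(z) = s * (1 - s) <= 0.25, then by W^T.
    grad = np.ones(width)
    norms = []
    for W, out in zip(reversed(weights), reversed(outputs)):
        grad = W.T @ (grad * out * (1.0 - out))
        norms.append(np.linalg.norm(grad))
    return np.array(norms[::-1])


norms = layer_gradients(10)
print(norms[0] / norms[-1])  # ratio of the gradient at the input to the one near the top
```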
![Page 58: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/58.jpg)
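Deep Belief Network
Input (Mel spectrum) → Hidden Layer 1 → Hidden Layer 2 → Hidden Layer 3 → Output (Rock, Jazz, Punk, Electronic)
Input/Hidden Layer 1, Hidden Layer 1/Hidden Layer 2 and Hidden Layer 2/Hidden Layer 3 each form a Restricted Boltzmann Machine (RBM).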
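Here is a compact sketch of the RBM building block and greedy layer-wise pretraining. It assumes binary (Bernoulli) units and one-step contrastive divergence (CD-1). Real-valued Mel-spectrum input is usually modelled with a Gaussian visible layer instead, so here the inputs are assumed to be scaled to [0, 1]. The class and function names are hypothetical. The output genre layer, normally trained afterwards with backpropagation, is not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class RBM:
    """Bernoulli-Bernoulli restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible: int, n_hidden: int, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(scale=0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible-unit biases
        self.b_h = np.zeros(n_hidden)   # hidden-unit biases

    def hidden_probs(self, v: np.ndarray) -> np.ndarray:
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h: np.ndarray) -> np.ndarray:
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0: np.ndarray, lr: float = 0.1) -> float:
        """One CD-1 update on a batch (batch, n_visible); returns reconstruction MSE."""
        p_h0 = self.hidden_probs(v0)
        h0 = (self.rng.random(p_h0.shape) < p_h0).astype(float)  # sample binary hidden states
        p_v1 = self.visible_probs(h0)   # one Gibbs step back to the visible layer
        p_h1 = self.hidden_probs(p_v1)
        n = v0.shape[0]
        # Positive phase (data) minus negative phase (reconstruction).
        self.W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
        self.b_v += lr * (v0 - p_v1).mean(axis=0)
        self.b_h += lr * (p_h0 - p_h1).mean(axis=0)
        return float(((v0 - p_v1) ** 2).mean())


def pretrain_stack(data: np.ndarray, layer_sizes, epochs: int = 50, lr: float = 0.1):
    """Greedy pretraining: each RBM is trained on the previous RBM's hidden probabilities."""
    rbms, x = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden)
        for _ in range(epochs):
            rbm.cd1_step(x, lr)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)
    return rbms, x  # x: top-layer features, e.g. input to a genre classifier
```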
![Page 63: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/63.jpg)
Deep Autoencoders
Input (Mel spectrum) → hidden layers → Output (Mel spectrum): the network is trained to reproduce its input.
Used for denoising
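This is a small stand-in for a denoising autoencoder, built with scikit-learn's `MLPRegressor`. The network receives noise-corrupted frames and is trained to output the clean ones, and the narrow middle layer acts as the learned code. The layer sizes, Gaussian noise and noise level are assumptions. It trains end to end with backpropagation rather than with layer-wise pretraining, so it is not necessarily how the talk built its autoencoder.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor


def train_denoising_autoencoder(clean: np.ndarray, noise_std: float = 0.1,
                                hidden=(64, 16, 64), seed: int = 0) -> MLPRegressor:
    """Train a network to map noise-corrupted Mel frames back to the clean frames."""
    rng = np.random.default_rng(seed)
    noisy = clean + rng.normal(scale=noise_std, size=clean.shape)  # corrupt the input only
    ae = MLPRegressor(hidden_layer_sizes=hidden, activation="relu",
                      max_iter=500, random_state=seed)
    return ae.fit(noisy, clean)  # the target is the clean version of the same frame
```

After training, `ae.predict(noisy_frames)` returns the denoised frames, with the same shape as the input.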
![Page 64: Machine learning for Music](https://reader034.vdocuments.net/reader034/viewer/2022042701/55a8df1e1a28ab1d0d8b47fb/html5/thumbnails/64.jpg)
Tools
essentia - audio retrieval algorithms
theano - CPU/GPU symbolic optimization
scikit-learn - machine learning in Python