2013 muellermeinard lecturemusicprocessing audiofeatures.ppt

15
Music Processing Meinard Müller Lecture Audio Features International Audio Laboratories Erlangen [email protected] Fourier Transform Sinusoidals Time (seconds) Time (seconds) Idea: Decompose a given signal into a superposition of sinusoidals (elementary signals). Signal Each sinusoidal has a physical meaning and can be described by three parameters: Fourier Transform Sinusoidals Time (seconds) Each sinusoidal has a physical meaning and can be described by three parameters: Fourier Transform Sinusoidals Time (seconds) Time (seconds) Signal Each sinusoidal has a physical meaning and can be described by three parameters: Fourier Transform Frequency (Hz) Fouier transform 1 2 3 4 5 6 7 8 1 0.5 ˄ Time (seconds) Signal | | Fourier Transform Frequency (Hz) Time (seconds) Example: Superposition of two sinusoidals

Upload: others

Post on 24-Oct-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 2013 MuellerMeinard LectureMusicProcessing AudioFeatures.ppt

Music Processing

Meinard Müller

Lecture

Audio Features

International Audio Laboratories [email protected]

Fourier Transform

Sinusoidals

Time (seconds)

Time (seconds)

Idea: Decompose a given signal into a superpositionof sinusoidals (elementary signals).

Signal

Each sinusoidal has a physical meaning and can be described by three parameters:

Fourier Transform

Sinusoidals

Time (seconds)

Each sinusoidal has a physical meaning and can be described by three parameters:

Fourier Transform

Sinusoidals

Time (seconds)

Time (seconds)

Signal

Each sinusoidal has a physical meaning and can be described by three parameters:

Fourier Transform

Frequency (Hz)

Fouier transform

1 2 3 4 5 6 7 8

1

0.5

˄

Time (seconds)

Signal | |

Fourier Transform

Frequency (Hz)

Time (seconds)

Example: Superposition of two sinusoidals

Page 2: 2013 MuellerMeinard LectureMusicProcessing AudioFeatures.ppt

Fourier Transform

Frequency (Hz)

Time (seconds)

Example: C4 played by piano

Fourier Transform

Frequency (Hz)

Time (seconds)

Example: C4 played by trumpet

Fourier Transform

Frequency (Hz)

Time (seconds)

Example: C4 played by violine

Fourier Transform

Frequency (Hz)

Example: C4 played by flute

Time (seconds)

Fourier Transform

Frequency (Hz)

Example: Speech “Bonn”

Time (seconds)

Fourier Transform

Frequency (Hz)

Example: Speech “Zürich”

Time (seconds)

Page 3: 2013 MuellerMeinard LectureMusicProcessing AudioFeatures.ppt

Fourier Transform

Frequency (Hz)

Example: C-major scale (piano)

Time (seconds)

Fourier Transform

Frequency (Hz)

Time (seconds)

Example: Chirp signal

Each sinusoidal has a physical meaning and can be described by three parameters:

Fourier Transform

Complex formulation of sinusoidals:

Polar coordinates:

Re

Im

Fourier Transform

Signal

Fourier representation

Fourier transform

,

Fourier Transform

Tells which frequencies occur, but does not tell when the frequencies occur.

Frequency information is averaged over the entire time interval.

Time information is hidden in the phase

Signal

,Fourier representation

Fourier transform

Fourier Transform

Time (seconds)

Frequency (Hz)

Page 4: 2013 MuellerMeinard LectureMusicProcessing AudioFeatures.ppt

Idea (Dennis Gabor, 1946):

Consider only a small section of the signal for the spectral analysis

→ recovery of time information

Short Time Fourier Transform (STFT)

Section is determined by pointwise multiplication of the signal with a localizing window function

Short Time Fourier Transform

Time (seconds)

Frequency (Hz)

Short Time Fourier Transform

Time (seconds)Time (seconds)

Frequency (Hz)

Short Time Fourier Transform

Time (seconds) Time (seconds)

Frequency (Hz)

Short Time Fourier Transform

Time (seconds) Time (seconds)

Frequency (Hz)

Short Time Fourier Transform

Frequency (Hz)

Time (seconds) Time (seconds)

Frequency (Hz)

Short Time Fourier Transform

Page 5: 2013 MuellerMeinard LectureMusicProcessing AudioFeatures.ppt

Time (seconds) Time (seconds)

Frequency (Hz) Frequency (Hz)

Short Time Fourier Transform

Time (seconds) Time (seconds)

Frequency (Hz) Frequency (Hz)

Short Time Fourier Transform

Definition

Signal

Window function ( , )

STFT

with

Short Time Fourier Transform Short Time Fourier Transform

Intuition:

is “musical note” of frequency , whichoscillates within the translated window

Innere product measures the correlation between the musical note and the signal .

Intuition:

is “musical note” of frequency , whichoscillates within the translated window

Short Time Fourier Transform Window Function

Box window

Time (seconds) Frequency (Hz)

Page 6: 2013 MuellerMeinard LectureMusicProcessing AudioFeatures.ppt

Triangle window

Window Function

Time (seconds) Frequency (Hz)

Hann window

Window Function

Time (seconds) Frequency (Hz)

Trade off between smoothing and „ringing“

Time (seconds) Time (seconds)

Frequency (Hz) Frequency (Hz)

Window Function

| |

Time-Frequency Representation

Time (seconds)

Frequency (Hertz)

Intensity (dB)

| |

Time-Frequency Representation

Time (seconds)

Frequency (Hertz)

Intensity (dB)

Spectrogram

Chirp signal and STFT with box window of length 0.05

Time-Frequency Representation

Freq

uenc

y (H

z)

Time (seconds)

Page 7: 2013 MuellerMeinard LectureMusicProcessing AudioFeatures.ppt

Chirp signal and STFT with Hann window of length 0.05

Time-Frequency Representation

Freq

uenc

y (H

z)

Time (seconds)

Size of window constitutes a trade-off between time resolution and frequency resolution:

Large window : poor time resolutiongood frequency resolution

Small window : good time resolutionpoor frequency resolution

Heisenberg Uncertainty Principle: there is no window function that localizes in time and frequency with arbitrary position.

Time-Frequency Localization

Short Time Fourier Transform

Signal and STFT with Hann window of length 0.02

Freq

uenc

y (H

z)

Time (seconds)

Short Time Fourier Transform

Signal and STFT with Hann window of length 0.1

Freq

uenc

y (H

z)

Time (seconds)

MATLAB

MATLAB function SPECTROGRAM N = window length (in samples) M = overlap (usually ) Compute DFTN for every windowed section Keep lower Fourier coefficients

→ Sequence of spectral vectors (for each window a vector of dimension )

Let x be a discrete time signalSampling rate: Window length: Overlap: Hopsize:

Let

corresponds to window

Example

Page 8: 2013 MuellerMeinard LectureMusicProcessing AudioFeatures.ppt

Time resolution:

Frequency resolution:

Example

Hz

Pitch Features

Model assumption: Equal-tempered scale MIDI pitches: Piano notes: Concert pitch: Center frequency:

→ Logarithmic frequency distributionOctave: doubling of frequency

Hz

Pitch Features

Idea: Binning of Fourier coefficients

Divide up the fequency axis into logarithmically spaced „pitch regions“ and combine spectral coefficients of each region to a single pitch coefficient.

Pitch FeaturesTime-frequency representation

Windowing in the time domain

Window

ing in the frequency domain

Pitch FeaturesDetails: Let be a spectral vector obtained from a

spectrogram w.r.t. a sampling rate and a window length N. The spectral coefficient corresponds to the frequency

Let

be the set of coefficients assigned to a pitchThen the pitch coefficient is defined as

Pitch Features

Example: A4, p = 69 Center frequency: Lower bound: Upper bound: STFT with ,

Page 9: 2013 MuellerMeinard LectureMusicProcessing AudioFeatures.ppt

Pitch Features

Example: A4, p = 69 Center frequency: Lower bound: Upper bound: STFT with ,

S(p = 69)

Pitch Features

Note MIDI pitch

Center [Hz] frequency

Left [Hz] boundary

Right [Hz] boundary

Width [Hz]

A3 57 220.0 213.7 226.4 12.7A#3 58 233.1 226.4 239.9 13.5B3 59 246.9 239.9 254.2 14.3C4 60 261.6 254.2 269.3 15.1C#4 61 277.2 269.3 285.3 16.0D4 62 293.7 285.3 302.3 17.0D#4 63 311.1 302.3 320.2 18.0E4 64 329.6 320.2 339.3 19.0F4 65 349.2 339.3 359.5 20.2F#4 66 370.0 359.5 380.8 21.4G4 67 392.0 380.8 403.5 22.6G#4 68 415.3 403.5 427.5 24.0A4 69 440.0 427.5 452.9 25.4

Pitch Features

Note:

For some pitches, S(p) may be empty. This particularly holds for low notes corresponding to narrow frequency bands.

→ Linear frequency sampling is problematic!

Solution:Multi-resolution spectrograms or multirate filterbanks

Pitch Features

0 1 2 3 4

Time (seconds)

Example: Friedrich Burgmüller, Op. 100, No. 2

Pitch Features

Time (seconds)

Inte

nsity

Freq

uenc

y (H

z)

Spectrogram

Pitch Features

Time (seconds)

C4: 261 HzC5: 523 Hz

C6: 1046 Hz

C7: 2093 Hz

C8: 4186 Hz

Inte

nsity

Spectrogram

Page 10: 2013 MuellerMeinard LectureMusicProcessing AudioFeatures.ppt

Pitch Features

Inte

nsity

(dB

)

Time (seconds)

C4

C5

C6

C7

C8

Pitch representation

Pitch Features

MID

I pitc

h

Inte

nsity

(dB

)

Time (seconds)

C4

C5

C6

C7

C8

Pitch representation

Pitch Features

Example: Chromatic scale

Time (seconds)

Freq

uenc

y (H

z)

Inte

nsity

(dB

)

Spectrogram

Pitch Features

Example: Chromatic scale

Time (seconds)

Inte

nsity

(dB

)

Spectrogram

C4: 261 HzC5: 523 Hz

C6: 1046 Hz

C7: 2093 Hz

C8: 4186 Hz

C3: 131 Hz

Pitch Features

Example: Chromatic scale

Inte

nsity

(dB

)

Log-frequency spectrogram

Time (seconds)

C4: 261 Hz

C5: 523 Hz

C6: 1046 Hz

C7: 2093 Hz

C8: 4186 Hz

C3: 131 Hz

Pitch Features

Example: Chromatic scale

Pitc

h (M

IDI n

ote

num

ber)

Inte

nsity

(dB

)

Log-frequency spectrogram

Time (seconds)

Page 11: 2013 MuellerMeinard LectureMusicProcessing AudioFeatures.ppt

Pitch Features

Example: Chromatic scale

Pitc

h (M

IDI n

ote

num

ber)

Inte

nsity

(dB

)

Log-frequency spectrogram

Time (seconds)Chroma C

Pitch Features

Example: Chromatic scale

Pitc

h (M

IDI n

ote

num

ber)

Inte

nsity

(dB

)

Log-frequency spectrogram

Time (seconds)Chroma C#

Chroma Features

Example: Chromatic scale

Chr

oma

Inte

nsity

(dB

)

Chroma representation

Time (seconds)

Chroma Features

Example: Chromatic scale

Inte

nsity

(nor

mal

ized

)

Chroma representation (normalized, Euclidean)

Chr

oma

Time (seconds)

Chroma Features

Human perception of pitch is periodic in the sense that two pitches are perceived as similar in color if they differ by an octave.

Seperation of pitch into two components: tone height (octave number) and chroma.

Chroma : 12 traditional pitch classes of the equal-tempered scale. For example:Chroma C

Computation: pitch features chroma featuresAdd up all pitches belonging to the same class

Result: 12-dimensional chroma vector.

Chroma Features

Page 12: 2013 MuellerMeinard LectureMusicProcessing AudioFeatures.ppt

C2 C3 C4

Chroma C

Chroma Features

C#2 C#3 C#4

Chroma C#

Chroma Features

D2 D3 D4

Chroma D

Chroma Features Chroma Features

Shepard‘s helix of pitch perceptionChromatic circle

http://en.wikipedia.org/wiki/Pitch_class_space Bartsch/Wakefield, IEEE Trans. Multimedia, 2005

Chroma FeaturesExample: C-Major Scale

Chroma Features

MID

I pitc

h

Inte

nsity

(dB

)

Time (seconds)

C4

C5

C6

C7

C8

Pitch representation

Page 13: 2013 MuellerMeinard LectureMusicProcessing AudioFeatures.ppt

Chroma Features

Chroma representation

Inte

nsity

(dB

)

Chr

oma

Time (seconds)

Chroma Features

Chroma representation (normalized)

Inte

nsity

(nor

mal

ized

)

Chr

oma

Time (seconds)

Chroma Features

Time (seconds) Time (seconds)Time (seconds) Time (seconds)

Example: Beethoven’s Fifth

Karajan Scherbakov

Chroma representation (normalized, 10 Hz)

Chroma Features

Time (seconds) Time (seconds)Time (seconds) Time (seconds)

Example: Beethoven’s Fifth

Karajan Scherbakov

Smoothing (2 seconds) + downsampling (factor 5)Chroma representation (normalized, 2 Hz)

Chroma Features

Time (seconds) Time (seconds)Time (seconds) Time (seconds)

Example: Beethoven’s Fifth

Karajan Scherbakov

Smoothing (4 seconds) + downsampling (factor 10)Chroma representation (normalized, 1 Hz)

Chroma FeaturesExample: Bach Toccata

Koopman Ruebsam

Time (sampels) Time (sampels)

Page 14: 2013 MuellerMeinard LectureMusicProcessing AudioFeatures.ppt

Chroma Features

Koopman Ruebsam

Feature resolution: 10 Hz

Example: Bach Toccata

Time (sampels) Time (sampels)

Chroma Features

Koopman Ruebsam

Feature resolution: 1 Hz

Example: Bach Toccata

Time (sampels) Time (sampels)

Chroma Features

Koopman Ruebsam

Feature resolution: 0.33 Hz

Example: Bach Toccata

Time (sampels) Time (sampels)

Sequence of chroma vectors correlates to the harmonic progression

Normalization makes features invariant to

changes in dynamics

Further quantization and smoothing: CENS features

Taking logarithm before adding up pitch coefficients accounts for logarithmic sensation of intensity

Chroma Features

Chroma FeaturesExample: Zager & Evans “In The Year 2525”

How to deal with transpositions?

Chroma FeaturesExample: Zager & Evans “In The Year 2525”

Original:

Page 15: 2013 MuellerMeinard LectureMusicProcessing AudioFeatures.ppt

Chroma FeaturesExample: Zager & Evans “In The Year 2525”

Original: Shifted:

Audio Features

There are many ways to implement chroma features Properties may differ significantly Appropriateness depends on respective application

http://www.mpi-inf.mpg.de/resources/MIR/chromatoolbox/ MATLAB implementations for various chroma variants