melody detection in polyphonic audio

21
Rui Pedro Paiva Melody Detection in Polyphonic Audio 9º Encontro da APEA October 20, 2007 Centro de Informática e Sistemas da Universidade de Coimbra

Upload: rui-pedro-paiva

Post on 30-Nov-2014

711 views

Category:

Business


3 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Melody Detection in Polyphonic Audio

Rui Pedro Paiva

Melody Detection in

Polyphonic Audio

9º Encontro da APEA

October 20, 2007

Centro de Informática e Sistemas da Universidade de Coimbra

Page 2: Melody Detection in Polyphonic Audio

2 Melody Detection in Polyphonic Audio

Outline

Introduction

Overview

Multi-Pitch Detection

From Pitches to Notes

Identification of Melodic Notes

Evaluation Procedures

Overall Results

Conclusions and Future Work

Page 3: Melody Detection in Polyphonic Audio

3 Melody Detection in Polyphonic Audio

Introduction

Motivation and Objectives

• Goal

- Extract a symbolic representation of melody from a polyphonic audio musical signal

• Broad range of applications

- MIR, music education, plagiarism detection, metadata

• No general-purpose, robust, accurate solution developed so far

Overall Approach

• Focus on melody, no matter what other sources are present (figure-ground separation)

Page 4: Melody Detection in Polyphonic Audio

4 Melody Detection in Polyphonic Audio

Introduction

Melody Definition

• Subjective concept

- Different people may define and perceive melody in different ways

- Proposals addressing cultural, perceptual, emotional or musicological facets of melody

• Context of our work

- “Melody is the individual dominant pitched line in a musical ensemble”

Page 5: Melody Detection in Polyphonic Audio

5 Melody Detection in Polyphonic Audio

Overview

Raw Musical

Signal Melody Notes

Multi-Pitch

Detection

1) Cochleagram

2) Correlogram

3) Pitch Salience

Curve

4) Salient Pitches

Determination of

Musical Notes

1) Pitch Trajectory Construction

2) Trajectory Segmentation

(with onset detection directly

on the raw signal)

Identification of

Melodic Notes

1) Elimination of Ghost Octave

Notes

2) Selection of the Most Salient

Notes

3) Melody Smoothing

4) Elimination of False Positives

Pitches in 46-msec Frames Entire Set of Notes

Page 6: Melody Detection in Polyphonic Audio

6 Melody Detection in Polyphonic Audio

Multi-Pitch Detection

0.21 0.22 0.23 0.24 0.25-0.4

-0.2

0

0.2

0.4

Time (s)

x(t

)a) Sound waveform

Time (s)

Fre

qu

en

cy c

ha

nn

el

b) Cochleagram frame

0.21 0.22 0.23 0.24 0.25

20

40

60

80

Time Lag (ms)

Fre

qu

ncy c

ha

nn

el

c) Correlogram frame

0 5 10 15 20

20

40

60

80

0 5 10 15 20 250

0.1

0.2

0.3

0.4

Time Lag (ms)

Pitch

sa

lien

ce

d) Summary correlogram

Page 7: Melody Detection in Polyphonic Audio

7 Melody Detection in Polyphonic Audio

From Pitches to Notes

Goal

• Quantize temporal sequences of detected pitches into MIDI notes

Approach

• Pitch Trajectory Construction [Serra]

• Frequency-Based Track Segmentation

• Salience-Based Track Segmentation

Page 8: Melody Detection in Polyphonic Audio

8 Melody Detection in Polyphonic Audio

From Pitches to Notes

Pitch Trajectory Construction

1 2 nn-1n-2

Frame Number

MID

I note

num

ber

Page 9: Melody Detection in Polyphonic Audio

9 Melody Detection in Polyphonic Audio

From Pitches to Notes

Frequency-Based Track Segmentation

• Approximation of frequency curves by piecewise-constant functions (PCFs)

- 1) MIDI quantization

- 2) Merging of PCFs: oscillation, delimited sequences, glissando

- 3) Time correction

- 4) Assignment of final MIDI note numbers: tuning compensation

b) Filtered PCFs

a) Original PCFs

Page 10: Melody Detection in Polyphonic Audio

10 Melody Detection in Polyphonic Audio

From Pitches to Notes

Frequency-Based Track Segmentation

0 100 200 300 400 5004800

4900

5000

5100

5200

5300

5400

Time (msec)

f[k] (c

en

ts)

a) Eliades Ochoa's excerpt

0 200 400 600 8006100

6200

6300

6400

6500

6600

6700

6800

Time (msec)f[k] (c

en

ts)

b) Female opera excerpt

Page 11: Melody Detection in Polyphonic Audio

11 Melody Detection in Polyphonic Audio

From Pitches to Notes

Salience-Based Track Segmentation - 1) Candidate Segmentation Points

- 2) Onset Detection on the audio signal [Scheirer; Klapuri]

- 3) Validation of Segmentation Points by Onset Matching

0 0.3 0.6 0.9 1.220

40

60

80

100

Time (sec)

sF

[k]

Page 12: Melody Detection in Polyphonic Audio

12 Melody Detection in Polyphonic Audio

Identification of Melodic Notes

Goal

• Separate the melodic notes from the accompaniment

Approach

• Elimination of Ghost Harmonically-Related Notes

• Selection of the Most Salient Notes

• Melody Smoothing

• Elimination of Spurious Accompaniment Notes

• Note Clustering

Page 13: Melody Detection in Polyphonic Audio

13 Melody Detection in Polyphonic Audio

Identification of Melodic Notes

Elimination of Ghost Harmonically-Related Notes

• Delete harmonically-related notes that satisfy the “common fate” principle

30 60 90 120 150 180 2100.94

0.96

0.98

1

1.02

1.04

1.06

Time (msec)

No

rma

lize

d F

req

ue

ncy

Page 14: Melody Detection in Polyphonic Audio

14 Melody Detection in Polyphonic Audio

Identification of Melodic Notes

0 1 2 3 4 5 6

40

50

60

70

80

90

Time (sec)

MID

I n

ote

nu

mb

er

0 1 2 3 4 5 6

40

50

60

70

80

90

Time (sec)

MID

I n

ote

nu

mb

er

Selection of the Most

Salient Notes Melody Smoothing

Time

MID

I note

num

ber

74

72

60

62

64

66

68

70

R1

a2

R3

a2

R2

a1

R4

a3

Page 15: Melody Detection in Polyphonic Audio

15 Melody Detection in Polyphonic Audio

Identification of Melodic Notes

Elimination of Spurious Accompaniment Notes

- 1) Delete deep valleys in the salience contour

- 2) Delete isolated abrupt duration transitions

0 5 10 15 20 25 30 3540

50

60

70

80

90

Note Sequence

Ave

rag

e P

itch

Sa

lien

ce

Page 16: Melody Detection in Polyphonic Audio

16 Melody Detection in Polyphonic Audio

Identification of Melodic Notes

Elimination of Spurious Accompaniment Notes

0 5 10 1540

50

60

70

80

90

Time (s)

Pitc

h S

alie

nce

Page 17: Melody Detection in Polyphonic Audio

17 Melody Detection in Polyphonic Audio

Identification of Melodic Notes

Note Clustering

• Feature Extraction - Spectral shape, harmonicity, attack transient,

intensity, pitch-related features

• Feature Selection and Dimensionality Reduction

- Forward feature selection

- Principal Component Analysis

• Clustering - Gaussian Mixture Models

– 2 clusters: melodic and “garbage” cluster

Page 18: Melody Detection in Polyphonic Audio

18 Melody Detection in Polyphonic Audio

Evaluation Procedures

Ground Truth Data ID Song Title Category Solo Type

1 Pachelbel’s “Kanon” Classical Instrumental

2 Handel’s “Hallelujah” Choral Vocal

3 Enya – “Only Time” New Age Vocal

4 Dido – “Thank You” Pop Vocal

5 Ricky Martin – “Private Emotion” Pop Vocal

6 Avril Lavigne – “Complicated” Pop/Rock Vocal

7 Claudio Roditi – “Rua Dona Margarida” Jazz/Easy Instrumental

8 Mambo Kings – “Bella Maria de Mi Alma” Bolero Instrumental

9 Eliades Ochoa – “Chan Chan” Son Vocal

10 Juan Luis Guerra – “Palomita Blanca” Bachata Vocal

11 Battlefield Band – “Snow on the Hills” Scottish Folk Instrumental

12 daisy2 Pop Vocal

13 daisy3 Pop Vocal

14 jazz2 Jazz Instrumental

15 jazz3 Jazz Instrumental

16 midi1 Pop Instrumental

17 midi2 Folk Instrumental

18 opera female 2 Opera Vocal

19 opera male 3 Opera Vocal

20 pop1 Pop Vocal

21 pop4 Pop Vocal

Page 19: Melody Detection in Polyphonic Audio

19 Melody Detection in Polyphonic Audio

Evaluation Procedures

Evaluation Metrics

• Pitch Contour Accuracy - Melodic Raw Pitch Accuracy (MRPA)

- Overall Raw Pitch Accuracy (ORPA)

- Melodic Raw Note Accuracy (MRNA)

- Overall Raw Pitch Accuracy (ORNA)

• Note Extraction Accuracy - Percentage of correct frames

• Melody/Accompaniment Discrimination - Recall

- Precison

Page 20: Melody Detection in Polyphonic Audio

20 Melody Detection in Polyphonic Audio

Overall Results

Multi-Pitch Detection • MRPA = 81.0%

Note Determination • Frequency-Based Track Segmentation: Re = 72%; Pr = 94.7%

• Salience-Based Track Segmentation: Re = 75%; Pr = 41.2%

Identification of Melodic Notes • Elimination of Ghost Notes (EGN)

- Eliminated 37.8% of notes (only 0.3% melodic)

• Overall Melody Extraction - MRNA = 84.4%; ORNA = 77.0%

• Melody/Accompaniment Discrimination - Recall = 31.0%; Precision = 52.8%

• MIREX’2004 - Train (note accuracy) = 80.1/ 77.1%; Test = 77.4 / 75.1%

- Train (pitch contour accuracy) = 75.1; 71.1%

• MIREX’2005 - 61.1% overall raw pitch accuracy

Page 21: Melody Detection in Polyphonic Audio

21 Melody Detection in Polyphonic Audio

Conclusions and Future Work

Main Contributions • Note determination

• Identification of melodic notes in a mixture

Conclusions • Adequate MPD approach in a melody context

• Note determination with satisfactory accuracy

• Identification of melodic notes successful in medium SNR conditions

Future Work • Larger musical corpus

• Reduce computational time for pitch detection

• Melody/Accompaniment discrimination