Semi-supervised Musical Instrument Recognition. Master’s Thesis Presentation. Aleksandr Diment, Tampere University of Technology, Finland. Supervisors: Adj. Prof. Tuomas Virtanen, MSc Toni Heittola. 17 May 2013.


  • Semi-supervised Musical Instrument Recognition

    Master’s Thesis Presentation

    Aleksandr Diment

    Tampere University of Technology, Finland

    Supervisors:

    Adj. Prof. Tuomas Virtanen, MSc Toni Heittola

    17 May 2013

  • Outline

    1 Introduction

    2 System description

    3 Evaluation

    4 Conclusions

    Semi-supervised instrument recognition – A. Diment 17.5.2013 2/25

  • Introduction: Musical instrument recognition

    Music information retrieval: obtaining information of various kinds from music.

    Situationally tailored playlisting, personalised radio etc.

    Instrument recognition ⇒

    automatic music database annotation;

    automatic music transcription;

    musical genre classification.

  • Introduction: Semi-supervised learning

    Traditional learning paradigms:

    unsupervised: no additional knowledge about the data samples;

    supervised: a label is assigned to each data sample.

    Supervised learning needs large amounts of annotated training data.

    Semi-supervised learning (SSL): only part of the training data needs to be annotated.

    SSL had not yet been applied to instrument recognition.

  • Introduction: Objectives and main results

    Objectives:

    studying techniques for musical instrument recognition;

    studying various SSL schemes.

    Main results:

    a pattern recognition system for instrument recognition with SSL was developed;

    two algorithms were implemented.

  • System description: Building blocks

    [Block diagram. Blocks: Preprocessing, Feature extraction, Training, Models, Classification. Signals: Input data, Features, Training set, Testing set, Decision.]

    Figure 1: A block diagram of a typical pattern classification system.

  • System description: Feature extraction

    The relevant information lies in the timbre, which is unique to a particular group of instruments.

    Timbre: the set of tonal qualities which characterise a particular musical sound.

    Everything except pitch, loudness and duration.

  • System description: Feature extraction

    [Block diagram: Input signal → Pre-emphasis → Frame blocking and windowing → |FFT|² → Log → DCT → Static coefficients; DCT → d/dt → Delta coefficients.]

    Figure 2: A block diagram of MFCCs calculation.
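
    The chain in Figure 2 could be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the mel filterbank stage does not appear in the transcribed diagram and is included here because the standard MFCC definition requires it, and every parameter value (pre-emphasis factor 0.97, frame length, hop size, number of filters and coefficients) is an assumption chosen for the example.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (assumed stage)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):          # rising slope
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):         # falling slope
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

def mfcc(signal, sample_rate, frame_len=1024, hop=512, n_filters=20, n_coeffs=13):
    """Return (static, delta) coefficient matrices, one row per frame."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis: first-order high-pass filter.
    emphasised = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame blocking and windowing.
    n_frames = 1 + (len(emphasised) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasised[idx] * np.hamming(frame_len)
    # |FFT|^2: power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Mel filterbank energies (assumed stage), then Log.
    fbank = mel_filterbank(n_filters, frame_len, sample_rate)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT (type II) gives the static coefficients.
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), np.arange(n_filters) + 0.5) / n_filters)
    static = log_energy @ basis.T
    # d/dt approximated by a finite difference along the frame axis.
    delta = np.gradient(static, axis=0)
    return static, delta

if __name__ == "__main__":
    sr = 44100
    t = np.arange(sr) / sr
    static, delta = mfcc(np.sin(2 * np.pi * 440.0 * t), sr)
    print(static.shape, delta.shape)
```

    The delta coefficients are computed here with a simple gradient; a regression over several neighbouring frames is a common alternative.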

  • System description: Training algorithms

    The EM algorithm ⇒ maximum likelihood estimates (MLEs) of model parameters when treating observations as incomplete data.

    Used when the parameter estimates cannot be obtained in closed form.

    Extending EM for SSL: the iterative and incremental EM-based algorithms¹.

    ¹ P. J. Moreno and S. Agarwal, “An experimental study of EM-based algorithms for semi-supervised learning in audio classification”, in Proc. of the ICML-2003 Workshop on the Continuum from Labeled to Unlabeled Data, 2003.
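
    As a reminder of how EM alternates between its two steps, here is a small sketch for a two-component, one-dimensional Gaussian mixture. The slides do not name the model family, so the Gaussian mixture, the initialisation and the fixed iteration count are assumptions made purely for illustration.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative only)."""
    x = np.asarray(x, dtype=float)
    # Crude initialisation: means at the extremes, shared variance, equal weights.
    means = np.array([x.min(), x.max()])
    variances = np.full(2, x.var())
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        dens = (weights * np.exp(-0.5 * (x[:, None] - means) ** 2 / variances)
                / np.sqrt(2.0 * np.pi * variances))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from the soft assignments.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return weights, means, variances

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(-3.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])
    print(em_gmm_1d(data))
```

    The component membership of each sample plays the role of the missing part of the data: the E-step estimates it, and the M-step uses it to update the parameters.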

  • [Figure: three feature-space panels (initial iteration, iteration t = 1, iteration t = 2, ...), each labelled Piano and Guitar and marking training samples from the labelled (L) and unlabelled (U) datasets, alongside a schematic with the blocks Training dataset, Training, Models and Classification. Legend: unlabelled samples; labelled samples, piano; labelled samples, guitar; unlabelled samples classified as piano; unlabelled samples classified as guitar.]

    Figure 3: Schematic and feature space representation of the training stage with three iterations of the iterative EM-based algorithm.

  • [Figure: three feature-space panels (initial iteration, iteration t = 1, iteration t = 2, ...), each labelled Piano and Guitar and marking training samples from the labelled (L) and unlabelled (U) datasets, alongside a schematic with the blocks Training dataset, Training, Models and Classification. Legend: unlabelled samples; labelled samples, piano; labelled samples, guitar; unlabelled samples classified as piano; unlabelled samples classified as guitar.]

    Figure 4: Schematic and feature space representation of the training stage with three iterations of the incremental EM-based algorithm.
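
    The two training schemes of Figures 3 and 4 might be sketched as below. This is one reading of the figures, not the thesis code. It assumes that the iterative scheme re-classifies all unlabelled samples and retrains on everything at each iteration, and that the incremental scheme moves only the most confidently classified unlabelled samples into the training set at each iteration. The single diagonal Gaussian per class, the confidence margin and the `batch` size are hypothetical stand-ins.

```python
import numpy as np

def fit_class_models(x, y, n_classes):
    """One diagonal Gaussian (mean, variance) per class; a stand-in model."""
    return [(x[y == c].mean(axis=0), x[y == c].var(axis=0) + 1e-6)
            for c in range(n_classes)]

def log_likelihoods(x, models):
    """Per-sample log-likelihood under each class model, up to a constant."""
    return np.stack([-0.5 * ((x - m) ** 2 / v + np.log(v)).sum(axis=1)
                     for m, v in models], axis=1)

def iterative_ssl(x_lab, y_lab, x_unl, n_classes, n_iter=5):
    """Assumed iterative scheme: relabel all unlabelled data, then retrain."""
    models = fit_class_models(x_lab, y_lab, n_classes)
    for _ in range(n_iter):
        y_unl = log_likelihoods(x_unl, models).argmax(axis=1)
        models = fit_class_models(np.vstack([x_lab, x_unl]),
                                  np.concatenate([y_lab, y_unl]), n_classes)
    return models

def incremental_ssl(x_lab, y_lab, x_unl, n_classes, batch=20):
    """Assumed incremental scheme: absorb the most confident samples first."""
    models = fit_class_models(x_lab, y_lab, n_classes)
    while len(x_unl) > 0:
        scores = log_likelihoods(x_unl, models)
        ranked = np.sort(scores, axis=1)
        margin = ranked[:, -1] - ranked[:, -2]      # best minus second best
        take = np.argsort(margin)[::-1][:batch]     # most confident samples
        x_lab = np.vstack([x_lab, x_unl[take]])
        y_lab = np.concatenate([y_lab, scores[take].argmax(axis=1)])
        x_unl = np.delete(x_unl, take, axis=0)
        models = fit_class_models(x_lab, y_lab, n_classes)
    return models

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(-4.0, 1.0, size=(120, 2))
    b = rng.normal(4.0, 1.0, size=(120, 2))
    x_lab = np.vstack([a[:20], b[:20]])
    y_lab = np.array([0] * 20 + [1] * 20)
    x_unl = np.vstack([a[20:], b[20:]])
    for train in (iterative_ssl, incremental_ssl):
        print(train.__name__, [m for m, _ in train(x_lab, y_lab, x_unl, 2)])
```

    Both functions need at least two classes and a few labelled samples per class, since the initial models are estimated from the labelled data alone.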

  • System description: Labelled data weighting

    The algorithms improve the performance when the initial labelled dataset is relatively small.

    Adding a large or a small amount of unlabelled data makes little difference.

    ⇒ de-weight the contribution of the unlabelled data.

    Simply: S_l = ω(t) ♦ S_l.
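
    One possible reading of the weighting rule is sketched below. The operator ♦ is not defined on the slide. Here it is assumed to mean that each labelled sample counts ω(t) times as much as an unlabelled one when the model statistics are estimated. The function name and the single-Gaussian model are hypothetical.

```python
import numpy as np

def weighted_gaussian(x_lab, x_unl, omega):
    """Mean and variance where labelled samples carry weight omega, unlabelled 1."""
    x = np.vstack([x_lab, x_unl])
    w = np.concatenate([np.full(len(x_lab), float(omega)), np.ones(len(x_unl))])
    mean = np.average(x, axis=0, weights=w)
    var = np.average((x - mean) ** 2, axis=0, weights=w)
    return mean, var

if __name__ == "__main__":
    x_lab = np.array([[0.0]])
    x_unl = np.array([[4.0]])
    for omega in (1.0, 3.0):
        print(omega, weighted_gaussian(x_lab, x_unl, omega))
```

    Raising ω pulls the estimate towards the labelled data, which has the same effect as de-weighting the unlabelled data.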

  • System description: One-class-at-a-time training

    Another issue of the iterative algorithm: the resulting classification accuracy oscillates along the iteration axis.

    [Plot: accuracy (%, axis from 80 to 100) against iteration (0 to 20).]

  • System description: One-class-at-a-time training

    One-class-at-a-time approach: at each iteration only one class is retrained.

    ⇒ previous models of one class are unaffected by training another one

    ⇒ fewer peaks

    ⇒ safer to choose an arbitrary iteration index for termination.
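
    The one-class-at-a-time idea could look roughly like this. It is a sketch under assumptions: classes are visited in round-robin order, one full pass over the classes is treated as a macroiteration, and each class is modelled by a single diagonal Gaussian. None of these details are stated on the slides.

```python
import numpy as np

def fit_one(x):
    """Diagonal Gaussian (mean, variance) for one class; a stand-in model."""
    return x.mean(axis=0), x.var(axis=0) + 1e-6

def log_lik(x, model):
    """Per-sample log-likelihood under one class model, up to a constant."""
    mean, var = model
    return -0.5 * ((x - mean) ** 2 / var + np.log(var)).sum(axis=1)

def one_class_at_a_time(x_lab, y_lab, x_unl, n_classes, n_macro=3):
    """Retrain a single class model per iteration, leaving the others fixed."""
    models = [fit_one(x_lab[y_lab == c]) for c in range(n_classes)]
    for _ in range(n_macro):                 # one pass over all classes
        for c in range(n_classes):           # one iteration: only class c changes
            scores = np.stack([log_lik(x_unl, m) for m in models], axis=1)
            pred = scores.argmax(axis=1)
            models[c] = fit_one(np.vstack([x_lab[y_lab == c], x_unl[pred == c]]))
    return models

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a = rng.normal(-4.0, 1.0, size=(120, 2))
    b = rng.normal(4.0, 1.0, size=(120, 2))
    x_lab = np.vstack([a[:20], b[:20]])
    y_lab = np.array([0] * 20 + [1] * 20)
    x_unl = np.vstack([a[20:], b[20:]])
    for mean, var in one_class_at_a_time(x_lab, y_lab, x_unl, 2):
        print(mean, var)
```

    Because only `models[c]` is replaced inside the inner loop, the models of the other classes stay exactly as they were at the previous iteration.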

  • Evaluation: Datasets

    RWC Music Database².

    Three variations per instrument.

    Separate notes within the whole range with a step of a semitone.

    44.1 kHz, 16 bit.

    ² M. Goto et al., “RWC music database: music genre database and musical instrument sound database”, in Proc. of the 4th Int. Conf. on Music Information Retrieval (ISMIR), 2003, pp. 229–230.

  • Evaluation: Datasets

    Table 1: List of instruments and number of recordings of the notes used in the smaller and larger sets, respectively.

    Smaller set:

    Instrument         # notes
    Acoustic Guitar        702
    Electric Guitar        702
    Tuba                   270
    Bassoon                360
    Total                2 212

    Larger set:

    Instrument         # notes
    Pianoforte             792
    Classic Guitar         702
    Electric Guitar        702
    Electric Bass          507
    Trombone               278
    Tuba                   270
    Horn                   288
    Bassoon                360
    Clarinet               360
    Banjo                  941
    Total                5 200

  • Evaluation: Baseline results

    Baseline scenario: treating all available data as labelled. Upper bound for possible SSL performance.

    Table 2: Average classification accuracy across all classes in the fully supervised case.

    Instrument set size    Recognition accuracy, %
    4 instruments          92.1
    10 instruments         82.9

  • Evaluation: Results with the iterative EM-based algorithm

    Table 3: Classification results with the iterative EM-based SSL algorithm when incorporating both modifications.

    Recognition accuracy, %

    Instrument set size    initial    maximum    absolute gain    relative gain
    4 instruments            89.19      95.30             6.11             6.85
    10 instruments           58.73      68.43             9.70            14.18
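
    The gain columns of Table 3 can be recomputed from the initial and maximum accuracies, as in the short check below. The slide does not say which denominator the relative gain uses, so both candidates are computed. With these figures, the reported 6.85 for four instruments agrees with the gain divided by the initial accuracy. The reported 14.18 for ten instruments agrees with the gain divided by the maximum accuracy instead; dividing by the initial accuracy gives about 16.5 for that row.

```python
def gains(initial, maximum):
    """Absolute gain plus relative gain against both possible denominators."""
    absolute = maximum - initial
    return absolute, 100.0 * absolute / initial, 100.0 * absolute / maximum

# Initial and maximum accuracies (%) as listed in Table 3.
table3 = {"4 instruments": (89.19, 95.30), "10 instruments": (58.73, 68.43)}

for name, (initial, maximum) in table3.items():
    absolute, rel_initial, rel_maximum = gains(initial, maximum)
    print(f"{name}: absolute {absolute:.2f}, "
          f"relative to initial {rel_initial:.2f} %, "
          f"relative to maximum {rel_maximum:.2f} %")
```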

  • [Plot: accuracy (%, axis from 80 to 100) against (macro)iteration (0 to 20) for four variants: initial approach, one class at a time, labelled data weighting, both approaches.]

    Figure 5: Comparison of the modifications to the iterative algorithm with their combination and the initial version, smaller instrument set.

  • [Plot: accuracy (%, ticks at 80 and 90) against the labelled/total dataset size ratio (5, 6, 8, 10, 15, 20, 30 and 100 %), showing initial accuracy and final accuracy.]

    Figure 6: Comparison of the accuracies of the initial models and the final iteration of the incremental EM-based SSL algorithm as a function of the relative labelled dataset size with the smaller instrument set.

  • Conclusions

    SSL for instrument recognition works: absolute accuracy gain of up to 9.7 percentage points;

    as little as 7% of the data needs to be annotated;

    proposed extensions simplify termination and increase accuracy.

    Future work:

    more complex scenarios (more instruments, noise, reverberation, ...);

    neighbouring problems.
