[Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics, Volume 31



Studies in Fuzziness and Soft Computing Editor-in-chief Prof. Janusz Kacprzyk Systems Research Institute Polish Academy of Sciences ul. Newelska 6 01-447 Warsaw, Poland E-mail: [email protected]

Vol. 3. A. Geyer-Schulz Fuzzy Rule-Based Expert Systems and Genetic Machine Learning, 2nd ed. 1996 ISBN 3-7908-0964-0

Vol. 4. T. Onisawa and J. Kacprzyk (Eds.) Reliability and Safety Analyses under Fuzziness, 1995 ISBN 3-7908-0837-7

Vol. 5. P. Bosc and J. Kacprzyk (Eds.) Fuzziness in Database Management Systems, 1995 ISBN 3-7908-0858-X

Vol. 6. E. S. Lee and Q. Zhu Fuzzy and Evidence Reasoning, 1995 ISBN 3-7908-0880-6

Vol. 7. B.A. Juliano and W. Bandler Tracing Chains-of-Thought, 1996 ISBN 3-7908-0922-5

Vol. 8. F. Herrera and J. L. Verdegay (Eds.) Genetic Algorithms and Soft Computing, 1996, ISBN 3-7908-0956-X

Vol. 9. M. Sato et al. Fuzzy Clustering Models and Applications, 1997, ISBN 3-7908-1026-6

Vol. 10. L. C. Jain (Ed.) Soft Computing Techniques in Knowledge-based Intelligent Engineering Systems, 1997, ISBN 3-7908-1035-5

Vol. 11. W. Mielczarski (Ed.) Fuzzy Logic Techniques in Power Systems, 1998, ISBN 3-7908-1044-4

Vol. 12. B. Bouchon-Meunier (Ed.) Aggregation and Fusion of Imperfect Information, 1998 ISBN 3-7908-1048-7

Vol. 13. E. Orlowska (Ed.) Incomplete Information: Rough Set Analysis, 1998 ISBN 3-7908-1049-5

Vol. 14. E. Hisdal Logical Structures for Representation of Knowledge and Uncertainty, 1998 ISBN 3-7908-1056-8

Vol. 15. G.J. Klir and M.J. Wierman Uncertainty-Based Information, 1998 ISBN 3-7908-1073-8

Vol. 16. D. Driankov and R. Palm (Eds.) Advances in Fuzzy Control, 1998 ISBN 3-7908-1090-8

Vol. 17. L. Reznik, V. Dimitrov and J. Kacprzyk (Eds.) Fuzzy Systems Design, 1998 ISBN 3-7908-1118-1

Vol. 18. L. Polkowski and A. Skowron (Eds.) Rough Sets in Knowledge Discovery 1, 1998, ISBN 3-7908-1119-X

Vol. 19. L. Polkowski and A. Skowron (Eds.) Rough Sets in Knowledge Discovery 2, 1998, ISBN 3-7908-1120-3

Vol. 20. J.N. Mordeson and P.S. Nair Fuzzy Mathematics, 1998 ISBN 3-7908-1121-1

Vol. 21. L.C. Jain and T. Fukuda (Eds.) Soft Computing for Intelligent Robotic Systems, 1998 ISBN 3-7908-1147-5

Vol. 22. J. Cardoso and H. Camargo (Eds.) Fuzziness in Petri Nets, 1999 ISBN 3-7908-1158-0

Vol. 23. P. S. Szczepaniak (Ed.) Computational Intelligence and Applications, 1999 ISBN 3-7908-1161-0

Vol. 24. E. Orlowska (Ed.) Logic at Work, 1999 ISBN 3-7908-1164-5



Bozena Kostek

Soft Computing in Acoustics Applications of Neural Networks, Fuzzy Logic and Rough Sets to Musical Acoustics

With 118 Figures and 84 Tables

Springer-Verlag Berlin Heidelberg GmbH


Dr. Bozena Kostek Sound Engineering Department Faculty of Electronics, Telecommunications & Informatics Technical University of Gdansk ul. Narutowicza 11/12 80-952 Gdansk Poland E-mail: [email protected]

Cataloging-in-Publication Data applied for Die Deutsche Bibliothek - CIP-Einheitsaufnahme Kostek, Bozena: Soft computing in acoustics: applications of neural networks, fuzzy logic and rough sets to musical acoustics; with 84 tables / Bozena Kostek.

(Studies in fuzziness and soft computing; Vol. 31)

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag Berlin Heidelberg GmbH. Violations are liable for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1999 Originally published by Physica-Verlag Heidelberg New York in 1999

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Hardcover Design: Erich Kirchner, Heidelberg

SPIN 10710186 88/2202-5 4 3 2 1 0 - Printed on acid-free paper

ISBN 978-3-662-13005-6 ISBN 978-3-7908-1875-8 (eBook) DOI 10.1007/978-3-7908-1875-8


To my Parents


FOREWORD

Soft computing, the term introduced recently by Lotfi Zadeh, combines various aspects of new computing techniques such as fuzzy and rough sets, neural networks, genetic algorithms, and others. It turned out that this new paradigm of computing can be used with success in many fields of science and engineering, offering better algorithms and enabling the analysis of data that would not otherwise have been possible using, for example, statistical methods.

This book deals with a wide spectrum of topics which are important not only for acoustics but, more generally, for computer science.

The book addresses a number of topics such as data representation in musical acoustics, automatic classification of musical instrument sounds, automatic recognition of musical phrases, and others - using soft computing techniques. In addition, the basic elements of neural networks, fuzzy sets and rough sets are presented in a clear and understandable manner.

This is a pioneering book, revealing the author's original results on applications of soft computing techniques to analyze and solve important problems in acoustics. It can be useful and attractive for all readers interested in the above areas, and may also serve as a reference book in these domains.

Without any doubt the book is an important achievement in the application of soft computing techniques in acoustics with a special emphasis on musical acoustics and subjective quality assessment.

I would like to congratulate Dr. Bozena Kostek for her excellent, highly original work.

Warsaw, September 1998 Zdzislaw Pawlak


PREFACE

Where is the wisdom we have lost in knowledge? Where is the knowledge we have lost in information?

T. S. Eliot

Acoustics is an old and very well-developed scientific domain, having its roots in physics and broadening its scope over time into musical aesthetics, speech and hearing sciences, signal processing, and others. There are many fields of application within the acoustical realm, among others the acoustics of speech combined with signal processing and soft computing methods, together forming the specialized application of automatic speech recognition. As a result of this methodology, today's computers can be voice-operated. Since, as Roederer says, music "seems to be a quite natural by-product of the evolution of speech and language" [180], it is also possible to foresee similar applications of musical acoustics. The present state of computer technology already enables such implementations. There is sufficient storage space available for acquiring music and there are means for storing it and sending it via the Internet, but there is still a need for software tools that allow for the intelligent retrieval of information from music. That is why only now may the research carried out by the author find direct applications in multimedia databases and electronic libraries.

There is one more aspect that should be mentioned. It was noticed by the author that the presented work is in a way related to a new paradigm recently introduced by L. Zadeh, namely "computing with words" [220]. While assessing the quality of music, humans use criteria that are rarely quantitative but most often qualitative. Therefore, there is a need to find methods that make it possible to process such descriptive notions as good, poor, clear, soft, bright, dark, high, low, etc. with techniques that are adequate for this task. Soft computing offers techniques which have been developed and tested in many other domains and applications.
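Descriptive notions of this kind are commonly modeled as fuzzy sets over a numerical scale. The sketch below is illustrative only: the triangular membership shapes, the normalized "brightness" scale, and the breakpoint values are assumptions for this example, not functions taken from the book.

```python
def triangular(a, b, c):
    """Membership function of a triangular fuzzy set peaking at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

# Hypothetical fuzzy sets over a normalized "brightness" scale [0, 1]
dark = triangular(-0.5, 0.0, 0.5)
bright = triangular(0.5, 1.0, 1.5)

# A sound with brightness 0.7 is not "dark" at all and partly "bright"
print(dark(0.7), bright(0.7))
```

The overlap between neighboring sets is what lets such a model express the gradual, uncertain character of qualitative judgments.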

This book addresses the broad problem of automatically recognizing musical sounds and musical phrases. Many applications for the algorithms dealing with these tasks may be foreseen. One such possible application may be to search a musical database for the sounds of chosen instruments or for musical tunes, and a second application may be to automatically compose music. Nowadays, with the rapid growth of electronic libraries and databases such as those found on the Internet, the first-mentioned application seems to be of more importance. A system for scanning the Internet for particular musical scores or for the sounds of musical performances would be most helpful. It is presently easy to conduct a search based on a text string, but it is not so easy to either define or search for an object which could be described as a musical data string. The reason for this is that musical scores are always performed slightly differently from the way they are written, with a certain level of inaccuracy and uncertainty leaving a wide margin for individual interpretation. In the realm of audio technology, there are still more tasks that could be solved if one were able to efficiently search for musical phrases. For example, cue points could be automatically located within digital audio editors. Intelligent algorithms could also help to find desired fragments of recorded musical pieces or the entrances of selected musical instruments.

The problem of algorithmic analysis of musical phrases can be approached in two different ways, depending on the musical representation: the acoustical analysis of sound, and the analysis of musical scores, which are electronically represented by the MIDI code. These two tasks require largely different approaches because they must process data in different ways. In the first case, the representation of an acoustic signal must be dealt with, while in the second case the problem pertains to numerical data processing. Nevertheless, there are some common points in the algorithms used in these cases, such as time normalization and the classification and identification of objects.
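To make the score-based representation concrete, the sketch below models a phrase as a list of MIDI-like note events and derives two descriptors that are standard in phrase matching: pitch intervals (invariant under transposition) and inter-onset-interval ratios (invariant under tempo changes). The phrase, the class name, and the choice of descriptors are illustrative assumptions, not necessarily the parametrization used in the book.

```python
from dataclasses import dataclass

@dataclass
class NoteEvent:
    pitch: int       # MIDI note number (60 = middle C)
    onset: float     # onset time in seconds
    duration: float  # note duration in seconds

# Hypothetical phrase: C4, D4, E4 played at a steady pace
phrase = [NoteEvent(60, 0.0, 0.5), NoteEvent(62, 0.5, 0.5), NoteEvent(64, 1.0, 1.0)]

# Pitch intervals stay the same when the whole phrase is transposed
intervals = [b.pitch - a.pitch for a, b in zip(phrase, phrase[1:])]

# Inter-onset-interval ratios give a tempo-normalized rhythm description
iois = [b.onset - a.onset for a, b in zip(phrase, phrase[1:])]
ratios = [ioi / iois[0] for ioi in iois]

print(intervals, ratios)  # [2, 2] [1.0, 1.0]
```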

There are also two other problems that are dealt with in this book. The first one is the processing of subjective test results by means of soft computing methods. It should be remembered that this is still one of the most vital and at the same time unsolved problems in acoustics. The second application pertains to the working principle of a musical instrument, namely the control of a classical pipe organ using fuzzy logic inference. The common point between all of the presented topics is the application of soft computing methods in order to solve some of the specific problems of acoustics.

The aim of this research study is the application of soft computing methods to musical signal analysis and to the recognition of musical sounds and phrases. Accordingly, methods based on such learning algorithms as neural networks, rough sets and fuzzy logic were conceived, implemented and tested. Additionally, the above-mentioned methods were applied to the analysis and verification of subjective testing results. The last problem discussed within the framework of this book is the concept of fuzzy control of the classical pipe organ instrument.

The obtained results show that computational intelligence and soft computing may be used for solving some vital problems in both musical and architectural acoustics.


ACKNOWLEDGEMENTS

I feel deeply honored and at the same time indebted to Professors Z. Pawlak and A. Skowron for their encouragement and interest in this research work.

Much gratitude I owe my distinguished Teachers of Sound Engineering and Acoustics - Professors M. Sankiewicz and G. Budzynski for introducing me to these interesting, interdisciplinary scientific domains.

I would also like to thank all my colleagues from the Sound Engineering Department of the Technical University of Gdansk for the discussions that inspired and motivated me to work harder.

I would like to express my appreciation for the financial support contributed by the Committee for Scientific Research, Warsaw, Poland - parts of this work were supported by grants No. 8 T11C 028 08 and No. 8 T11D 021 12.

Finally, I would also like to thank my husband - Prof. A. Czyzewski for his encouragement and for sharing scientific interests with me.

September, 1998 Gdansk, Poland

Bozena Kostek


CONTENTS

FOREWORD (Z. Pawlak)

PREFACE

1. INTRODUCTION

2. SOME SELECTED SOFT COMPUTING TOOLS AND TECHNIQUES
2.1. Artificial Neural Networks
2.1.1. Neural Network Design
2.1.2. The EBP Algorithm
2.1.3. Application of Pruning Weight Algorithms
2.2. Fuzzy Sets and Fuzzy Logic
2.2.1. Fuzzy Logic in the Control Technique
2.3. Rough Sets

3. PREPROCESSING OF ACOUSTICAL DATA
3.1. Musical Signal Representation
3.1.1. Parametric Representation
3.1.2. Time Domain Representation
3.1.3. Spectral Representation
3.1.4. Time-Frequency Representation
3.1.5. Special Parameters
3.2. Musical Phrase Analysis
3.2.1. Musicological Analysis
3.2.2. MIDI Representation
3.3. Acquisition of Test Results
3.3.1. Objective Measurement Results
3.3.2. Subjective Test Results
3.3.3. Statistical Processing of Data
3.4. Data Discretization
3.4.1. Quantization Algorithms
3.4.2. Clusterization Algorithms
3.4.3. Practical Implementation

4. AUTOMATIC CLASSIFICATION OF MUSICAL INSTRUMENT SOUNDS
4.1. Uncertainty of Musical Instrument Sound Representation
4.2. Feature Vector Extraction
4.2.1. Multimedia Database
4.2.2. Parameter Extraction
4.3. Statistical Properties of Musical Data
4.3.1. Separability of Original Data Values
4.3.2. Separability of Discretized Data
4.4. Neural Network as a Classifier of Musical Instruments
4.4.1. Training Procedures
4.4.2. Recognition Experiments
4.5. Rough Set Decision System as a Classifier of Musical Instruments
4.5.1. Attribute Discretization
4.5.2. Training Procedures
4.5.3. Recognition Experiments

5. AUTOMATIC RECOGNITION OF MUSICAL PHRASES
5.1. Data Acquisition
5.1.1. Conversion to MIDI Data
5.1.2. Processing of MIDI Data
5.2. Parametrization Process
5.2.1. Time Normalization Methods
5.2.2. Statistical Parametrization
5.2.3. Trigonometric Approximation of Musical Phrases
5.2.4. Separability of Parameter Values
5.3. Neural Network as a Classifier of Musical Phrases
5.4. Rough Set-Based Classification of Musical Phrases
5.4.1. Parameter Discretization
5.4.2. Recognition Experiments

6. INTELLIGENT PROCESSING OF TEST RESULTS
6.1. Inconsistency of Subjective Assessment Results
6.2. Application of Fuzzy Logic to the Processing of Test Results
6.2.1. Evaluation of Reverberator Features
6.2.2. Evaluation of Audio CODEC Features
6.3. Application of Rough Sets to the Processing of Test Results
6.3.1. Evaluation of Reverberator Features
6.3.2. Evaluation of Audio CODEC Features
6.4. Rough-Fuzzy Method of Test Result Processing
6.4.1. Evaluation of the Acoustical Features of Concert Halls
6.4.2. Optimization of Noise Reduction Algorithm Parameters

7. CONTROL APPLICATIONS
7.1. Articulation-Related Features in the Pipe Organ Sound
7.1.1. Time Delays
7.1.2. Attack Transient in Pipe Sound
7.2. Fuzzy Control of Pipe Organ
7.2.1. General Characteristics of Pipe Valve
7.2.2. System Description

8. CONCLUSIONS

9. REFERENCES


1. INTRODUCTION

Acoustics is a very old domain of science, with notions of both architectural and musical acoustics appearing in classic Greek and Latin studies. Certainly some concepts were known in even more ancient epochs, with the Sumerian civilization already using such musical instruments as flutes, harps and lyres. The Sumerians passed along their artistic traditions to the Assyrian culture, and ancient Egypt also had its own musical culture. It is said that the world's first bugging systems, based on acoustical principles, were installed in the palaces of the Pharaohs. The first notions of theater date back to the fifth century BC, but with more emphasis on the architectural than the acoustical issues. This is unfortunate, because even now in some of the outdoor theaters that still exist (e.g. Epidaurus, Greece), sound energy spreads evenly from the source. Regrettably, the ancient architects did not leave their acoustical designs for us to study. In researching history, one may find works by Heron which indicate that the sound phenomenon results from vibrations in air in the form of waves. The Roman architect Vitruvius, who included notions on acoustics in his ten books on architecture, should also be mentioned. In the index of his fifth book, one may find "theaters and the choice of location where they should be founded", "resonators in theaters" (in modern acoustics called Helmholtz resonators), "Roman and Greek theaters", etc. It should be said that apart from the mentioned book, there are no references to any other books dealing with empirically based acoustics. Some previously written sources, like those of the Greek philosophers Pythagoras, Plato, Aristotle and Aristoxenus of Tarentum, are based on mathematical principles. Further observations on acoustics appear in the works of Leonardo da Vinci. There are also contemporary references to the book entitled "Magiae Universalis", written in 1657 by Kasper Schott, a Jesuit professor of physics. This is one of the earliest book sources on the subject of acoustics [21].

In more recent times, the modern study of acoustics was begun by Lord Rayleigh. The efforts of the architectural acoustic pioneers - Sabine, Knudsen, Eyring, Knowles, Fitzroy and Beranek - should also be mentioned, because they brought this domain to the level where it is today. Over the years, these scientists and their associates founded entire fields of research in acoustics.

Modern acoustics also encompasses computer studies as applied to this domain. Only recently, however, have a variety of computer programs become available to assist the designer concerned with architectural acoustics. It should be remembered that the calculations performed by such programs are only approximate, and thus they may not suit a particular application and may even result in substantial error. Another difficulty in acoustical design is that it is rare for a hall to be used for a single purpose. Thus, the so-called "optimum" requirements vary for each type of usage, and as a result the acoustical solutions are very often compromises. Since the primary aim in the design of an acoustical space is sound quality, it is necessary to correlate objective measurements to subjective impressions of an interior space. Correlating an objective measurement to an expert's subjective assessment, however, is not an easy process. Many literature references already describe how this process may be carried out, but there is not yet any consensus on this still unresolved acoustical subject.

Computers are also used in the field of musical acoustics. In this case, they support studies which strive to understand music cognition, e.g. music production, perception and interpretation. Their role has also greatly increased in the areas of sound synthesis, automatic music composition and music recognition.

The research studies which are dealt with in this book represent a hybrid of various disciplines. They apply soft computing methods to selected problems in both architectural and musical acoustics.

In this research work, a different view on handling the uncertainty of acoustical data by using both rough set and fuzzy set algorithms has been proposed. Therefore, this book starts with a chapter which reviews some selected soft computing methods, beginning with neural networks and fuzzy set theory, and including rough set theory. This chapter aims at presenting only the main concepts of the mentioned methods, since the details are extensively covered in a rich selection of literature. Following this, the next chapter focuses on some methods for preprocessing data in musical and architectural acoustics. It consists of one part which deals with musical sound representation, and a second part which introduces musical phrase analysis. Within this chapter, methods of sound parametrization are discussed. Additionally, a review of the discretization methods which may be applicable to the musical acoustics domain is given. This latter issue is important when dealing with inconsistency and uncertainty within the data. The discretization process is aimed at replacing specific data values with the interval numbers to which they belong. The final part of the chapter deals with the acquisition of test results and the statistical processing of data obtained through objective measurement and subjective testing. An overview of the experiments is included, with more detailed descriptions available through some of the author's cited papers. Consequently, some applications will be presented extensively in this book.
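A minimal sketch of this discretization step, using equal-width binning, is given below. The bin count and the attribute values are hypothetical, and the book itself reviews several quantization and clusterization algorithms; this is only one of the simplest variants.

```python
def equal_width_bins(values, k):
    """Replace each value with the index of the equal-width interval
    (out of k intervals spanning the observed range) it falls into."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    def bin_index(x):
        if x == hi:              # the maximum belongs to the last interval
            return k - 1
        return int((x - lo) // width)
    return [bin_index(v) for v in values]

# Hypothetical values of one acoustical parameter, discretized into 3 intervals
vals = [0.12, 0.48, 0.55, 0.91, 0.33]
print(equal_width_bins(vals, 3))  # [0, 1, 1, 2, 0]
```

After this step, a learning algorithm no longer sees the exact measurements, only the interval labels, which absorbs small inconsistencies in the data.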

The first application described within the scope of this discussion is the problem of automatically classifying musical instrument sounds. In the described experiments, both neural networks and rough set-based learning algorithms are applied. It should be mentioned that the rough set method was introduced to this task by the author. This chapter starts with a review of some examples of the uncertainty of musical instrument sound representation, which is mainly caused by the unrepeatable nature of the musical signal. Therefore, the preliminary work involved the acquisition of musical instrument sounds and the creation of a multimedia musical database which helped to systematize the acquired information. On the basis of some experimental studies, the feature vectors to be used in the main experiments were then defined. Within this part of the work, the statistical properties of musical data are also discussed. Examples of training procedures using neural networks as classifiers of musical instrument sounds are shown. For this purpose, the previously extracted feature vectors were used. The rough set-based algorithm proved to be more efficient when dealing with discretized data, however, so the preliminary step in the training procedure was the quantization of the parameter value domain.

The second problem addressed in this work, the automatic recognition of musical phrases, is then discussed. All steps of this process are reviewed. In the beginning, some issues related to data acquisition based on the MIDI code are shown. Subsequently, two parametrization methods are described which were then applied in the experiments. Additionally, certain statistical properties of the parametrized data are reviewed. For the purpose of classifying musical patterns, neural networks and the rough set-based method were again applied. Since this section reviews studies carried out by the author within the framework of the research project entitled "The Application of Artificial Intelligence Methods to the Analysis and Processing of Data in Acoustics", described in the papers cited within the chapter, only exemplary results are presented in this book.

Showing the problems related to the processing of subjective test results is the third main problem within the framework of this presentation. Fuzzy logic and the rough set method were applied to the processing of test results when subjectively evaluating the quality of electroacoustic equipment or low bit-rate algorithms. The proposed methods of subjective test result processing may be used either in place of or together with classical statistical analysis. The fuzzy set method yields a comprehensive rating matrix which reveals the parameters contributing most to the total quality. The rough set approach produces reducts and a set of rules, allowing one to study the principles underlying the subjects' decisions.

Rough set analysis was also applied to objectively measured acoustical data. The obtained results show that this method may be implemented in room quality analysis, helping to solve some vital problems in architectural acoustics. The advantage of this approach in comparison with other methods is the possibility of eliminating irrelevant attributes while generalizing the rules pertaining to acoustical quality assessment.

Additionally, a new concept was introduced. Called the Fuzzy Perceptual Quantization Method (FPQM), this concept is based on psychometric testing. As is already known, the process of tuning a perceptual audio coding algorithm requires finding relationships between the masking algorithm parameters and their influence on the subjective quality of the processed audio. To discover the ill-defined relationships which underlie the implemented perceptual model of hearing, the rough set method was employed. The FPQM is used for determining the settings of the masking model.

Moreover, a method of automatic acoustical quality assessment, using the combination of a rough set decision system and fuzzy logic inference, is proposed by the author. A rough set algorithm is applied to a database containing quantized subjective parameters, and results in an overall subjective preference for the acoustical objects described by these parameters. The fuzzy membership functions are determined on the basis of separate subjective testing of individual parameters which underlie the overall preference. In this way, a knowledge base is built which contains both objective and subjective values that are linked by hidden relationships. Then, for testing the proposed expert system, a fuzzy logic system automatically provided quality assessment. The fuzzy system uses both membership functions which are empirically determined for the tested parameters and the rules generated in the training phase by the rough set algorithm.
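The inference step of such a system can be sketched very roughly as follows. This toy example uses Sugeno-style crisp rule consequents and a weighted-average defuzzification rather than the author's actual rough-fuzzy system; the membership functions, the "clarity" parameter, the rules and the preference scores are all invented for illustration.

```python
# Hypothetical membership functions for one subjective parameter,
# "clarity", on a normalized [0, 1] scale
def low(x):
    return max(0.0, min(1.0, (0.6 - x) / 0.6))

def high(x):
    return max(0.0, min(1.0, (x - 0.4) / 0.6))

# Hypothetical rules: IF clarity IS <set> THEN preference = <crisp score>
rules = [(low, 2.0), (high, 4.5)]

def assess(x):
    """Sugeno-style weighted-average defuzzification over activated rules."""
    activations = [(mu(x), score) for mu, score in rules]
    den = sum(a for a, _ in activations)
    if den == 0.0:
        return 0.0
    return sum(a * s for a, s in activations) / den

print(assess(0.0), assess(0.5))
```

In the system described in the book, the rules would come from the rough set algorithm and the membership functions from separate subjective tests, rather than being fixed by hand as here.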

The last problem discussed in this work is the introduction of articulation-related features to the pipe organ sound. Consequently, a brief review of classical pipe organ control systems is presented, showing musicians' preferences. Computerizing classical pipe organs opens new domains of interest, in which modern technology meets the traditional ways of playing such instruments. The process consisting of the depression of the key, the reaction of the valve and the resulting build-up of sound is difficult to describe mathematically. The author investigated musical articulation in classical pipe organ sounds when she completed her Ph.D. research work. The problems related to the control of a pipe organ instrument, however, were solved using standard microprocessor technology and computer techniques. Taking into account that these processes are imprecise in nature, a typical microprocessor system for an organ may be replaced by a learning control system capable of modeling undefined nonlinearities. Such modeling may be supported by a system based on exemplary entries and related decisions. Consequently, fuzzy logic techniques may be implemented in such a control system. For the purpose of this research work, a model of a pipe organ was designed, constructed and tested. The obtained results are given in the final part of this book.

The last two chapters outline the conclusions which may be derived from the studies carried out by the author, and additionally present a list of references which provide additional details related to the problems presented in this book.


2. SOME SELECTED SOFT COMPUTING TOOLS AND TECHNIQUES

There are several definitions concerning soft computing as a domain of science. The most widely known and most often applied soft computing (or computational intelligence) methods are neural networks, multivalued logic, fuzzy sets and fuzzy logic [219], Dempster-Shafer theory [215], rough sets [161], probabilistic reasoning, evolutionary computation, etc. Particular attention was paid in this work to neural networks, fuzzy logic and rough sets. Neural networks may be treated as tools for modeling dependencies between variables. On the other hand, both fuzzy and rough sets are formal methods for dealing with uncertainty. These techniques are reviewed further in this chapter, because they are used to provide a kernel for decision algorithms as applied to classification tasks. A particular justification for the application of decision systems in this area is provided by the fact that the management of uncertainty in acoustics should be based on the knowledge of experts - the best criterion for assessing the acoustical quality of music.

Finally, several other factors should be considered when selecting a technique for application to a specific problem: efficiency, memory size, complexity, the ability to generalize, etc. Therefore, in some applications a hybrid approach is chosen and refined to overcome the limitations of one technique by combining it with another which is more effective in specific tasks.

Since the other soft computing techniques listed above were not applied in the experiments carried out by the author, they are only mentioned here.

2.1. Artificial Neural Networks

Neural networks have proven to be important decision making tools over a broad spectrum of applications, including such tasks as classification and cluster analysis of data. Systems based on these algorithms have become especially significant in the processes of speech and image recognition, and applications in the classification of musical sounds have also appeared [38][101][114][122][123][156][157]. The latter usage has become one of the most interesting areas within the broader field of musical acoustics. Using neural networks (NN) for recognizing musical instrument classes is not a trivial task because of the large amount of data which must be handled. Not only is it very time consuming, but also the validity


of the result relies on empirical optimizations of the NN structure and on the training method. Since Artificial Neural Networks (ANN) have become standard tools in many domains, only the main features of such algorithms will be reviewed in this chapter, especially those which were exploited in the experiments.

Artificial Neural Networks have the ability of learning and adapting to new situations by recognizing patterns in previous data. The neural network processes an input object by using the knowledge acquired during the training phase. Methods of training neural networks are often divided into two basic classes: training with a teacher (with supervision) and without a teacher (without supervision). In the case of supervised learning, pattern-class information is used. An unknown probability density function p(x) describes the continuous distribution of patterns x in the pattern space Rn. During the learning process, an accurate estimation of p(x) is searched for. Supervised learning algorithms depend on the class membership of each training sample x. Class-membership information allows the detection of pattern misclassifications and the computation of an error signal. The error information then reinforces the learning process. Unsupervised learning systems use unlabelled pattern samples. They adaptively gather patterns into clusters or decision classes Di. In the case of neural networks, supervised learning is understood as a process in which the gradient descent in the space of all possible synaptic values is estimated. The supervisor uses class-membership information to define a numerical signal that guides the estimated gradient descent [28][95][200][226].

In recent years, a variety of artificial neural network classifiers were developed. Much attention was paid both to network architectures and learning algorithms. Today a large collection of neural algorithms is available, which can be used in modeling of dependencies, processes and functions. Besides the basic NN topologies, such as the perceptron, Hopfield networks, bidirectional associative memory (BAM) networks and their transformations are available.

Artificial Neural Networks, in general, can be classified as feedforward and feedback types depending on the interconnection type of the neurons. At present, multi-layer networks of the feedforward type, which are trained using the error back-propagation method (EBP), are applied to the majority of applications employing neural computing. That is why this presentation is limited only to this kind of neural network.

Multilayered feedforward networks have, however, some essential drawbacks. Among these are the possibility of poor training convergence, difficulty in setting optimal or suboptimal values of learning parameters which then influence the convergence, the feasibility of being trapped in local minima, and poor generalization in the case of improper network size. The first three problems can be partially solved by assigning variables as learning parameters which can change according to the convergence rate and training development [147][226][227]. On the other hand, the problem related to the neural network size is generally still unsolved. There are some techniques, however, called weight pruning algorithms, that allow better network design [88]. The basic principles of such algorithms will be further examined.


2.1.1. Neural Network Design

The design and operation of a feedforward network is based on a net of artificial neurons. The simplest case of a neural network is a single neuron. The artificial neuron consists of a processing element, input signals x = [x₁, x₂, x₃, …, x_N]ᵀ ∈ Rᴺ and a single output o (Fig. 2.1). The output is defined as:

o = f( Σ_{i=1}^{N} w_i·x_i − w₀ )   (2.1)

where w is the synaptic weight vector:

w = [w₁, w₂, …, w_N]ᵀ   (2.2)

w₀ is the threshold of the neuron and f is the neuron activation function.

As may be seen from Fig. 2.1, each of the input signals flows through a synaptic weight. The summing node accumulates all input-weighted signals and then passes the result to the output through the transfer function f. The commonly used activation functions are of sigmoidal type (unipolar, bipolar, hyperbolic tangent, etc.) [226]. The unipolar sigmoidal transfer function is given by the following formula:

f(x) = 1 / (1 + exp(−α·x))   (2.3)

where α is the coefficient, or gain, which adjusts the slope of the function that changes between the two asymptotic values (0 and +1). This function is nonlinear, monotonic and differentiable, and since the error back-propagation method using the delta learning rule requires a differentiable activation function, the sigmoidal transfer function is for this reason of interest in most applications.
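As a minimal sketch of Eqs. (2.1)-(2.3), the following code computes the response of a single artificial neuron with a unipolar sigmoidal activation. The function names, sample inputs, weights and gain are illustrative choices, not values from the book:

```python
import math

def sigmoid(x, gain=1.0):
    """Unipolar sigmoidal activation, Eq. (2.3): 1 / (1 + exp(-gain * x))."""
    return 1.0 / (1.0 + math.exp(-gain * x))

def neuron_output(inputs, weights, threshold, gain=1.0):
    """Single-neuron response, Eq. (2.1): f(sum(w_i * x_i) - w_0)."""
    activation = sum(w * x for w, x in zip(weights, inputs)) - threshold
    return sigmoid(activation, gain)

# Example: a neuron with two inputs, arbitrary weights and threshold.
o = neuron_output([0.5, -0.2], [0.8, 0.4], threshold=0.1, gain=2.0)
print(round(o, 4))
```

Because the sigmoid is monotonic, increasing the gain α merely sharpens the transition between the asymptotes 0 and +1 without moving its midpoint.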

Fig. 2.1. Artificial neuron model.

A two-layer network of the feedforward type is one of the most commonly used structures (see Fig. 2.2).


Fig. 2.2. Feedforward multi-layer network

The vector and matrix notation is more convenient for dealing with inputs, weights and outputs. The consecutive layers are denoted as the input layer x, the hidden layer y and the output layer o, with I, J and K neurons, respectively. Let V (of size J × (I+1)) and W (of size K × (J+1)) be, respectively, the input-to-hidden and the hidden-to-output synaptic weight matrices. The input and hidden layers may each contain an additional dummy neuron. The output value of such a dummy neuron is constant and equals −1, whereas the value of its weight may change. The dummy neuron is therefore an equivalent of the threshold synapse for all neurons in the next layer (see Fig. 2.2).

2.1.2. The EBP Algorithm

The Error Back-Propagation (EBP) algorithm adopts the well-known back-propagation delta rule for the adaptation of weights. Using this method, the network learns to minimize the difference (delta) between its response to the reference pattern and the required neural network response. The weight vector in step s+1 is expressed as follows:

w_{s+1} = w_s + Δw_s   (2.4)

where s signifies the number of the training step.


In the course of training, the weight vector increment Δw follows the direction of the negative gradient in the error space [226]. Therefore, the delta learning rule may be expressed by:

Δw = −η·∇E(w)   (2.5)

where η is the constant that determines the rate of learning. The error E is defined in Eq. (2.6), and represents the squared error between the current value at the output of the network, o, and the desired response of the network, d [226]:

E = (1/2)·Σ_{k=1}^{K} (d_k − o_k)²   (2.6)

where o and d signify K-element vectors, while K is the number of neurons in the output layer.

The EBP algorithm for a two-layer neural network may be described in a few consecutive steps [226]:

Step 1. The weights of the matrices V and W are initialized at small random values. In the majority of cases, the weight values should be adjusted within the range from −1 to 1.

Step 2. The cumulative cycle error E is set to 0 prior to learning. The goal of the training is to adjust the weights of the neural network in such a way that the value of the cumulative error drops below an arbitrarily selected threshold value E_max. Parameter E is increased by the value calculated using expression (2.6) for each pattern from the training set.

Step 3. An element is selected from the training set. It is recommended that vector x be selected at random. At the same time, the required response vector d of the network is updated.

Step 4. The responses of the particular layers, y and o, are calculated.

Step 5. The error signal terms for the respective layers are defined as in Eqs. (2.7) and (2.8):

δ_y = −∇E(y)   for the hidden layer y   (2.7)

δ_o = −∇E(o)   for the output layer o   (2.8)

For the unipolar activation function, the terms δ_y and δ_o adopt the formulas shown in Eq. (2.9):

δ_ok = o_k·(1 − o_k)·(d_k − o_k)
δ_yj = y_j·(1 − y_j)·Σ_{k=1}^{K} δ_ok·w_kj   (2.9)

Step 6. The V and W weight matrices are updated based on the formulas in Eq. (2.10):

V ← V + η·δ_y·xᵀ
W ← W + η·δ_o·yᵀ   (2.10)

Step 7. The network error is determined for the given pattern, whereupon this value is added to the value of the cumulative cycle error E.

Step 8. If it is not the last pattern in the training set, then a consecutive object is selected at random and the training goes back to Step 3. At the same time, the pattern that was used is removed from the training set and does not take further part in the same cycle of training.

Step 9. In the contrary case, if it is the last element in the training set, the cumulative error E is compared to the stop condition, the arbitrarily set threshold value E_max. If the neural network processes all objects in the training set with a satisfactory error (E < E_max), the algorithm stops.

Step 10. If E > E_max, the training cycle comes to an end. The value of E is reset to 0, the training set is reconstructed and another training cycle begins.

In order to accelerate the convergence of the EBP training process, a momentum method is often applied by supplementing the current weight adjustment with a fraction of the most recent weight adjustment [226]. The momentum term (MT) in the (k+1)-th iteration is expressed by the relationship:

MT_{k+1} = α·Δw_k   (2.11)

where α is a user-defined positive momentum constant, typically from the range 0.1 to 0.8, and Δw_k is the increment of the weights in the k-th step.

Thus, the final equations for the adjustment of the weights V and W with the momentum terms are computed as below:

V_{k+1} = V_k + η·δ_y·xᵀ + α·ΔV_k
W_{k+1} = W_k + η·δ_o·yᵀ + α·ΔW_k   (2.12)

2.1.3. Application of Pruning Weight Algorithms

The common problem of all neural networks is that selecting an appropriate size of the structure is difficult. When the net size is too small compared to the quantity of training data, the capacity will overflow. In turn, when the structure size is too large, the network has a tendency to store (remember) data, and as a result the feasibility for generalization considerably diminishes. Two approaches are proposed to help solve this problem:

1. Evaluate the sensitivity of the cost function according to the weight of a neuron. Those weights with least influence on the cost function may be removed.

2. Introduce a punishment function for inefficient (superfluous) neural structure.

In both of these cases, a weight pruning algorithm results in the neglect of either weights or even whole neurons [88]. The first solution seems to be more robust; however, the tested methods (Optimal Brain Damage, OBD; Optimal Brain Surgeon, OBS) are very time-consuming and thus ineffective from the training duration point of view. Even a simple evaluation of neuron influence requires additional training [88]. On the other hand, methods with the punishment function are simple and quite efficient [82]. They may also be used to obtain a skeleton network structure during rule discovery.

In the case of the weight pruning algorithm, the cost function E (2.6) is modified for the weights w_ij as follows:

E′(W) = E(W) + (γ/2)·Σ_{i,j} w_ij² / (1 + w_ij²)   (2.13)

where γ is a positive constant. The error back-propagation rule for the weight adjustment therefore acquires an additional term:

Δw_ij = −η·∂E(W)/∂w_ij − η·γ·w_ij / (1 + w_ij²)²   (2.14)
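The punishment term of Eq. (2.13) and its gradient contribution to Eq. (2.14) can be sketched as below. The function names and the pruning threshold are illustrative, not part of the original formulation:

```python
def penalty(weights, gamma):
    """Punishment term of Eq. (2.13): (gamma/2) * sum(w^2 / (1 + w^2))."""
    return 0.5 * gamma * sum(w * w / (1.0 + w * w) for w in weights)

def penalty_gradient(w, gamma):
    """Derivative of the penalty for a single weight: gamma * w / (1 + w^2)^2,
    the extra term appearing in the update rule of Eq. (2.14)."""
    return gamma * w / (1.0 + w * w) ** 2

def prune(weights, threshold=1e-2):
    """Neglect weights whose magnitude has been driven below a small threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]

# Small weights are pushed toward zero almost linearly (gradient ~ gamma * w),
# while large weights incur a nearly constant cost and are barely penalized.
print(penalty([0.0], 0.1), penalty_gradient(2.0, 0.1))
```

This shape of the penalty is what yields a "skeleton" structure: redundant connections decay toward zero and can be cut, while strong connections survive.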

Since ANNs have grown to become a useful tool in many pattern recognition applications, this suggests that they may work well in the musical signal domain, perhaps even ahead of other approaches to the problem of musical instrument sound classification. The application of ANNs within the musical acoustics domain will be shown in the following sections.

2.2. Fuzzy Sets and Fuzzy Logic

The idea of vagueness (contrary to bi-valent logic) appeared at the end of the 19th century, and this term was formally applied to the field of logic in 1923 in work done by Russell. Furthermore, the Polish logician Lukasiewicz first formulated multivalued logic in 1930 [96]. These research studies were carried out long before the assumptions of fuzzy logic, which Lotfi A. Zadeh originally defined in 1965 [218], but multivalued logic was discovered anew thanks to his work. Later, numerous scientists such as Kandel, Lee, Sugeno, Kosko, Yager, Yamakawa and others [28][86][96][198][217][219] worked on the idea and further developed it. Recently, a treatise on the use of fuzzy sets, fuzzy logic, and possibility theory for dealing with imprecise information in database management systems appeared; both theoretical aspects and implemented systems are discussed within the scope of that book [27]. Another book dealing with databases integrates artificial intelligence and database technology [221]. Since fuzzy logic theory and its applications are covered extensively in the literature, only the main features of this theory will be pointed out here.

Fuzzy set theory results from the need to describe complex phenomena, or phenomena that are difficult to define and determine using a conventional mathematical apparatus.

Suppose that X = {x} is a universe of discourse, i.e. the set of all possible elements with respect to a fuzzy concept. Then a fuzzy subset A in X is a set of ordered pairs {(x, μ_A(x))}, where x ∈ X and μ_A : X → [0,1] is the membership function of A; μ_A(x) ∈ [0,1] is the grade of membership of x in A. A fuzzy variable has values which are expressed in natural language, and its value is defined by a membership function. Since the basic properties of Boolean theory are also valid in fuzzy set theory, they will only be cited here briefly [86].

The union of two fuzzy sets A and B of a universe of discourse X, denoted as A ∪ B, is defined as:

μ_{A∪B}(x) = max(μ_A(x), μ_B(x)), ∀x ∈ X   (2.15)

The intersection of two fuzzy sets A and B of a universe of discourse X, denoted as A ∩ B, is defined as:

μ_{A∩B}(x) = min(μ_A(x), μ_B(x)), ∀x ∈ X   (2.16)

The complement of a fuzzy set A of a universe of discourse X, denoted as ¬A, is defined as:

μ_{¬A}(x) = 1 − μ_A(x), ∀x ∈ X   (2.17)

The above operations are illustrated in Fig. 2.3. As may be seen from Fig. 2.3, the fuzzy-set intersection is defined as the minimum of the fuzzy set pairs (the smaller of the two elements), the union is defined as the maximum, and the complement produces a reversal in order [96].

Another important notion of fuzzy sets is the size or cardinality of a set A. It is

defined as:

card A = Σ_{i=1}^{n} μ_A(x_i)   (2.18)

Fig. 2.3. Basic operations in fuzzy theory: fuzzy sets A and B (a), A ∪ B (b), A ∩ B (c), ¬A (d)
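Over a finite universe of discourse, Eqs. (2.15)-(2.18) reduce to pointwise max/min operations, as in the sketch below. The universe elements and membership grades are made-up examples:

```python
def fuzzy_union(mu_a, mu_b):
    """Eq. (2.15): membership of A ∪ B is the pointwise maximum."""
    return {x: max(mu_a[x], mu_b[x]) for x in mu_a}

def fuzzy_intersection(mu_a, mu_b):
    """Eq. (2.16): membership of A ∩ B is the pointwise minimum."""
    return {x: min(mu_a[x], mu_b[x]) for x in mu_a}

def fuzzy_complement(mu_a):
    """Eq. (2.17): membership of ¬A is 1 - mu_A(x)."""
    return {x: 1.0 - mu_a[x] for x in mu_a}

def cardinality(mu_a):
    """Eq. (2.18): card A = sum of the membership grades."""
    return sum(mu_a.values())

# Example universe X = {x1, x2, x3} with two fuzzy subsets A and B.
A = {"x1": 0.2, "x2": 0.7, "x3": 1.0}
B = {"x1": 0.5, "x2": 0.3, "x3": 0.9}
print(fuzzy_union(A, B))          # {'x1': 0.5, 'x2': 0.7, 'x3': 1.0}
print(round(cardinality(A), 2))   # 1.9
```

When every grade is 0 or 1, these definitions collapse to the familiar crisp-set operations, which is the sense in which fuzzy set theory generalizes Boolean theory.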

2.2.1. Fuzzy Logic in the Control Technique

The primary factor making fuzzy logic seem predestined for applications in the field of control is the possibility of intuitively modeling linear and nonlinear control functions of arbitrary complexity. This capability brings the decision-making process of a machine closer to that of a human. Fuzzy-based systems also allow the description of functions with the use of conditional rules.

The literature concerning fuzzy logic is now well developed, so only a short introduction to the principles of fuzzy logic control will be provided here.

The design of fuzzy controllers includes the collection of control rules. These rules consist of linguistic statements which link the controller inputs with their respective outputs. Assuming a two-input/one-output system, these rules have the following general structure:

R^(r): IF x is A^(r) AND y is B^(r) THEN z is U^(r)   (2.19)

where r = 1, 2, 3, …, n; x, y, z are fuzzy variables; and A^(r), B^(r), U^(r) are fuzzy subsets in the universes of discourse X, Y and Z, respectively.

For the given rule base of a control system, the fuzzy controller determines the rules to be fired for the specific input signal condition and then computes the effective control action. Applying the inference operators sup-min or sup-prod (i.e. supremum-minimum, supremum-product) to the composition operation results in the generation of the control output [28].

In fuzzy set terminology, another notion is defined, namely the "fuzzification" operation. It can be performed by considering the crisp input values as "singletons" (fuzzy sets that have a membership value of 1 for the given input value and 0 at all other points) and taking the values of the set membership functions at the respective data value [28]. Additionally, the "defuzzification" operation can be performed by a number of methods, of which the center-of-gravity (centroid) and height methods are common. The centroid defuzzification method determines the output crisp value U₀ from the center of gravity of the output membership function, weighted by its height μ(U) (degree of membership), and may be described by the following expression:

U₀ = ∫ U·μ(U) dU / ∫ μ(U) dU   (2.20)

The differences between conventional and approximate logic in control applications may be illustrated as follows [219]. The input/output control signal relation y = f(x) can be determined by:

1. specifying the mathematical function,
2. declaring tables of input and output values (a discrete set),
3. determining a set of fuzzy rules and membership functions.

The idea of fuzzy logic basically comes down to replacing the description of the output function for all possible input states with a group of membership functions which represent certain ranges or sets of input values.

The process of creating a fuzzy logic application is usually comprised of five stages:

1. formulating the problem and identifying the control signals which define the system behavior,

2. defining the inference rules, 3. designing the membership function for each variable, 4. rule based processing, 5. computing the values of control signals in the defuzzifying process.


The membership functions are standard and may be defined by stating three parameters:

Center Location (the central value), Width, and Type (inclusive/exclusive).

The meaning of these parameters is illustrated in Figs. 2.4 and 2.5.

Fig. 2.4. Shape of the membership function: inclusive type (a), exclusive type (b)

Fig. 2.5. Membership function parameters

As is depicted in Fig. 2.5, the membership function may have a triangular shape, which enables a simplification of the process of computing its value. The degree of membership is in this case a simple function of the distance ac from the input value to the central value x_a (see Fig. 2.5). The distance ac is then subtracted from the maximum value of the membership function, MAX. Hence the membership degree amounts to:

Page 29: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

a) for a function of the inclusive type:
   μ = MAX − abs(ac)   when abs(ac) ≤ width
   μ = 0               when abs(ac) > width

b) for a function of the exclusive type:
   μ = MAX                  when abs(ac) > μ₀
   μ = MAX − μ₀ + abs(ac)   when μ₀ ≥ abs(ac) ≥ width
   μ = 0                    when abs(ac) < width
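The inclusive-type triangular membership of Fig. 2.5 can be sketched as follows. Note one assumption: the distance is normalized by the width so that the degree falls linearly from MAX at the center to 0 at the edge, a common variant of the unscaled subtraction given above; the parameter values are illustrative:

```python
def inclusive_membership(x, center, width, max_value=1.0):
    """Triangular (inclusive-type) membership degree in the spirit of Fig. 2.5:
    the distance from x to the central value, scaled by the function width,
    reduces the degree from MAX; outside the width the degree is zero."""
    ac = abs(x - center)
    if ac > width:
        return 0.0
    return max_value * (1.0 - ac / width)

# Degrees for a fuzzy set centered at 10 with width 4.
print([inclusive_membership(x, 10.0, 4.0) for x in (10.0, 12.0, 15.0)])  # [1.0, 0.5, 0.0]
```

Only a comparison, a subtraction and a division are needed per evaluation, which is exactly the simplification that makes triangular shapes attractive in control hardware.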

Fuzzy processing is based on a set of inference rules, and there are several ways to create sets of rules. Most frequently, they are created heuristically rather than by using closed mathematical formulas, which is why this process is difficult to automate. Nonetheless, three directions can be formulated:

1. Representation of human knowledge and experience, 2. Usage of analytical bases, 3. Formulation of generalizations.

The inference process, based on fuzzy logic rules, may be illustrated as follows [80]. Let x₁ and x₂ be input variables, and y the output variable:

Rule 1: IF x₁ belongs to A₁₁ AND x₂ belongs to A₁₂ THEN y belongs to B₁
Rule 2: IF x₁ belongs to A₂₁ AND x₂ belongs to A₂₂ THEN y belongs to B₂

The degrees of fulfillment of the particular rules are defined by the formulas:

w₁ = min(μ_{A11}(x₁), μ_{A12}(x₂))   (2.21)

w₂ = min(μ_{A21}(x₁), μ_{A22}(x₂))   (2.22)

A graphic illustration of the inference process is depicted in Fig. 2. 6.

The actual output value that results from the completed inference is computed as:

y = Σ_{i=1}^{2} w_i·y_i / Σ_{i=1}^{2} w_i   (2.23)

where w_i is the degree of fulfillment of the i-th rule given by Eqs. (2.21)-(2.22), and y_i is the output value associated with the consequent set of that rule.


A graphic illustration of the defuzzification process is depicted in Fig. 2. 7.

Fig. 2.6. Graphic illustration of fuzzy logic rule-based operations

Fig. 2.7. Graphic illustration of the defuzzification phase. Computation of the output value y is based on the sets resulting from fulfillment of the rules
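The two-rule min/weighted-average inference of Eqs. (2.21)-(2.23) can be sketched end to end as below. The triangular membership shapes, their centers and widths, and the rule output centers y₁, y₂ are all made-up values for illustration:

```python
def tri(x, center, width):
    """Triangular membership degree (a common simplification, cf. Fig. 2.5)."""
    ac = abs(x - center)
    return 0.0 if ac > width else 1.0 - ac / width

def two_rule_inference(x1, x2):
    """Eqs. (2.21)-(2.23): rule strengths via the min operator, then a
    weighted average of the rule output centers (y1 = 0.0, y2 = 10.0)."""
    # Rule 1: IF x1 is A11 AND x2 is A12 THEN y is B1 (center 0.0)
    w1 = min(tri(x1, 0.0, 2.0), tri(x2, 0.0, 2.0))
    # Rule 2: IF x1 is A21 AND x2 is A22 THEN y is B2 (center 10.0)
    w2 = min(tri(x1, 2.0, 2.0), tri(x2, 2.0, 2.0))
    if w1 + w2 == 0.0:
        raise ValueError("no rule fired for this input")
    return (w1 * 0.0 + w2 * 10.0) / (w1 + w2)

print(two_rule_inference(1.0, 1.0))  # both rules fire equally -> 5.0
```

At the point where both antecedents are half-satisfied the output lands midway between the two rule centers, which is the behavior Fig. 2.6 and Fig. 2.7 depict graphically.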

In some applications, a hybrid method comprised of both fuzzy and mathematical approaches may be used. As an example of such a method, the relational method introduced by Sugeno may be cited [28]. The principles of this method are shown in Fig. 2.8. There are two inputs in the exemplary system, namely Width (W) and Height (H). The output (I_s) is, in this case, a combination of rule sets and linear equations, because it is assumed that there are some regions in which the outputs may be expressed as linear functions of the inputs. Consequently, the IF part of the rule comprises a fuzzy expression, but the THEN portion is a linear combination of the inputs and constant coefficients, the latter derived from analysis and tuned by observation. Rules 1 and 2 in Fig. 2.8 are as follows:


RULE 1: IF W is MEDIUM AND H is MEDIUM THEN I_s1 = A₀₁ + A₁₁·W + A₂₁·H
RULE 2: IF W is ZERO AND H is MEDIUM THEN I_s2 = A₀₂ + A₁₂·W + A₂₂·H

The last task to be performed in order to determine the precise output is the defuzzification process, which in this case is a weighted average of linear equations. It has been reported that the relational method requires fewer rules and gives better accuracy than the rule-base method [28].

Fig. 2.8. Relational method illustration [28]
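The Sugeno-style rules of Fig. 2.8 can be sketched as follows: the IF parts fire through min(), the THEN parts are linear functions of the inputs, and the crisp output is their weighted average. The membership shapes for MEDIUM and ZERO and all A-coefficients are invented for the example:

```python
def tri(x, center, width):
    """Triangular membership degree (illustrative shapes for MEDIUM and ZERO)."""
    ac = abs(x - center)
    return 0.0 if ac > width else 1.0 - ac / width

def sugeno_output(w, h):
    """Relational (Takagi-Sugeno) inference for the two rules of Fig. 2.8.
    All membership parameters and A-coefficients are made-up examples."""
    medium = (5.0, 4.0)                              # (center, width) for MEDIUM
    zero = (0.0, 4.0)                                # (center, width) for ZERO
    s1 = min(tri(w, *medium), tri(h, *medium))       # RULE 1 firing strength
    s2 = min(tri(w, *zero), tri(h, *medium))         # RULE 2 firing strength
    i1 = 1.0 + 0.5 * w + 0.2 * h                     # I_s1 = A01 + A11*W + A21*H
    i2 = 0.5 + 0.1 * w + 0.3 * h                     # I_s2 = A02 + A12*W + A22*H
    total = s1 + s2
    if total == 0.0:
        raise ValueError("no rule fired")
    return (s1 * i1 + s2 * i2) / total               # weighted average of consequents

print(round(sugeno_output(5.0, 5.0), 3))  # only RULE 1 fires -> 4.5
```

Because each consequent already produces a crisp value, no output membership function needs to be integrated, which is one reason the relational method tends to need fewer rules than the rule-base method.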

Since one of the unique applications of fuzzy logic techniques to musical acoustics [102] is the fuzzy control of a pipe organ, it is mentioned in this work. The process of pipe organ activation, consisting of a musician depressing a key, the sound rising in a pipe and the reaction of a valve, is difficult to describe mathematically [31][97][98][99][100]. Additionally, since these processes are imprecise in nature, a typical microprocessor-based organ control system may be replaced by a learning control system capable of modeling the non-linearities learned from exemplary entries and related decisions. Consequently, fuzzy logic techniques may be employed in a pipe organ control system. Such a system was engineered and applied to a pipe organ model within the research work done by the author in 1993-1994 with the support of the Committee for Scientific Research, Warsaw, Poland [102], and will be further described later on.


2.3. Rough Sets

The rough set theory and its basic concepts were proposed by Pawlak in the early 1980s [161], and provide an effective tool for extracting knowledge from databases [12][162][163][167][188][189][191][221][224]. Since then, many researchers have introduced rough set theory to different scientific domains [36][168][169][189][222][223]. This theory has also been successfully utilized in the field of acoustics [48][49][50][53][54][104][120][125].

A fundamental principle of a rough set-based learning system is the need to discover redundancies and dependencies between the given features of a problem to be classified. Several important concepts include such notions as the Upper Approximation, the Lower Approximation and the Boundary Region (Fig. 2.9) [161].

Fig. 2.9. Basic structure of rough sets

A Universe U is defined as a collection of objects standing at the top of the rough set hierarchy. On the other hand, a basic entity is placed at the bottom of this hierarchy. Between them, the Approximation Space is defined. The Approximation Space is partitioned into minimum units, called equivalence classes or elementary sets. The lower and upper approximation definitions are based on the approximation space. Consequently, a rough set approximates a given concept from below and from above, using both the lower and the upper approximation. Three other properties of rough sets defined in terms of attribute values are shown in Fig. 2.10, namely: dependencies, reduct and core [35][161].

In Fig. 2.11, the relationship between the Universe and the Approximation Space is presented. The circles represent the objects in a universe. The grid over the circles corresponds to the Approximation Space, which is by definition a partitioned universe.


UNIVERSE (U)
APPROXIMATION SPACE (AS)
LOWER AND UPPER APPROXIMATIONS (LA & UA)
ROUGH SET & DEPENDENCIES (RS & D)
REDUCT (R), CORE (C)

Fig. 2.10. Hierarchy of concepts in rough sets

Fig. 2.11. Relationship between Universe and Approximation Space

Knowledge is represented in rough sets by a tuple S_R = (U, P, D, V_P, V_D, F). The variables are defined as follows: U is a finite collection of objects; P is a finite set of condition features or attributes; D is the decision attribute, arbitrarily chosen by an expert; V_P is the union of the domains of all condition attributes in P; V_D represents the domain of the decision attribute; and F is called a knowledge function. Simply speaking, knowledge in rough set theory can be represented as a Decision Table. A row in the Decision Table represents an object in the Universe, and each column corresponds to an attribute in P. The decision attribute is always in the very last column. Such a way of presenting knowledge is shown in Tab. 2.1. A rough set learning algorithm can be used as an expert system; as a result of such an approach, a set of rules in IF ... THEN form is obtained based on the Decision Table [161][162][163].

In Fig. 2.9, the Approximation Space is divided by the set S into three discernibility regions: the positive region (dark gray), the boundary region (white) and the negative region (the surrounding gray area). Assume that R ⊆ U × U is an equivalence relation on U which partitions U into equivalence classes; R is called the indiscernibility relation. The Lower Approximation R̲(S) of S is defined as the union of the elementary sets whose members are all in S, and the Upper Approximation R̄(S) is defined as the union of the elementary sets that have at least one member belonging to S. Resulting from these considerations, a standard set S can be approximated in the approximation space by the pair (R̲(S), R̄(S)), called the rough set [161][162].
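A minimal sketch of these definitions on a toy decision table follows: elementary sets are built from objects that agree on every condition attribute, and the lower and upper approximations of a concept S are unions of those sets. The table contents, attribute names and the concept S are invented for illustration:

```python
def indiscernibility_classes(objects, attributes):
    """Partition the universe into elementary sets: objects are indiscernible
    when they agree on every condition attribute."""
    classes = {}
    for name, row in objects.items():
        key = tuple(row[a] for a in attributes)
        classes.setdefault(key, set()).add(name)
    return list(classes.values())

def lower_approximation(classes, target):
    """Union of the elementary sets entirely contained in the concept S."""
    result = set()
    for c in classes:
        if c <= target:
            result |= c
    return result

def upper_approximation(classes, target):
    """Union of the elementary sets with at least one member in S."""
    result = set()
    for c in classes:
        if c & target:
            result |= c
    return result

# Toy decision table: condition attributes A1, A2; concept S = "good" objects.
table = {
    "t1": {"A1": 0, "A2": 1}, "t2": {"A1": 0, "A2": 1},
    "t3": {"A1": 1, "A2": 0}, "t4": {"A1": 1, "A2": 1},
}
S = {"t1", "t3"}
classes = indiscernibility_classes(table, ["A1", "A2"])
print(sorted(lower_approximation(classes, S)))  # ['t3']
print(sorted(upper_approximation(classes, S)))  # ['t1', 't2', 't3']
```

Here t1 and t2 share identical attribute values but different membership in S, so their elementary set falls into the boundary region: the concept is rough precisely because the available attributes cannot discern them.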

Tab. 2.1. Knowledge base representation in the rough set theory

object/attribute   A1    A2    A3    ...   Am    D (decision)
t1                 a11   a12   a13   ...   a1m   d1
t2                 a21   a22   a23   ...   a2m   d2
t3                 a31   a32   a33   ...   a3m   d3
...                ...   ...   ...   ...   ...   ...
tn                 an1   an2   an3   ...   anm   dn

Rough set theory integrates a generalized approach to data, and relies on experts' knowledge about the problems to be solved. The rough set method also provides an effective tool for extracting knowledge from databases. The first step in data analysis based on rough set theory is the creation of a knowledge base, classifying objects and attributes within the created decision tables. Then, the knowledge discovery process is initiated in order to remove some undesirable attributes, followed by the generalization of the concepts of desirable attributes. The final step, called reduct, is to analyze the data dependency in the reduced database and to find the minimal subset of attributes [161][162].

There are at least several algorithms or systems that realize knowledge discovery using rough set-based principles [35][48][72][73][167][188][189][204][222][223]. One of them is LERS, developed by Grzymala-Busse [204]. The LERS system uses two different approaches to rule induction, machine learning and knowledge acquisition, based on algorithms known as LEM1 and LEM2 (Learning from Examples Modules). The first algorithm is based on the global attribute covering approach, while the latter is local. LERS first checks the input data for consistency, after which lower and upper approximations are computed for every concept.

Another system based on rough set theory is the experimental KDD system designed at the University of Madrid [204], called RSDM, which provides a generic data mining engine. This system evolved from a previously engineered



system called RDM-SQL. The system kernel includes the following modules: User Communication Module, Working Area, Dynamic Operator Loader, Mining Data Module and DW Communication Module. Another algorithm, namely TRANCE, described by its author as a Tool for Rough Data Analysis, Classification and Clustering, generates rough models of data [204]. These models consist of a partition of the data set into a number of clusters, which are then labeled with decisions. The system uses either systematic or local search strategies. The ProbRough system is used for inducing rules from data [204]. First, it tries to find an optimal partition of the condition attribute value space that minimizes the average misclassification cost, and then it induces the decision rules.

One of the best developed systems based on rough set theory is the ROSETTA software, which is a system for knowledge discovery and data mining [204]. The kernel of this system was developed by Skowron's research group at the University of Warsaw. A Norwegian group within the framework of a European project supported the GUI (Graphical User Interface) of this system. The system consists of several algorithms, the main ones of which are as follows: preprocessing of data tables with missing values, filtering of reducts and rules according to specified evaluation criteria, classification of new objects, and computing rough set approximations. The ROSETTA system provides heuristics for search and approximations based on resampling techniques and genetic algorithms.

Another system which appeared recently is ROSE (Rough Set Data Explorer), developed at the Poznan University of Technology [204]. This system is a successor of the RoughDas and RoughClass systems, which worked under the DOS operating system. ROSE is a modular program (Windows environment) which allows for performing standard and extended rough set-based analyses of data, extracting characteristic patterns from data, inducing decision rules from sets of learning examples, evaluating the discovered rules, etc. Additionally, it contains a module which offers both automatic and user-defined discretization. RSL (Rough Set Library), on the other hand, implemented at the Warsaw University of Technology, is intended as a kernel for any software implementation based on rough set theory [204]. It offers two possible applications which may be based on an RS library, one of which is an interpreter of queries for the information system and the other of which is an expert system with a knowledge acquisition module.

An environment for the synthesis and analysis of concurrent models based on rough set theory and Petri nets, ROSEPEN, was created by a research group from the Rzeszów Pedagogical University [204]. This system was developed using separate modules, one of which allows for handling data tables according to rough set theory.

The RoughFuzzyLab system was engineered by a scientific group from the San Diego State University [204]. It uses two approaches for data mining and rule extraction: one is based on rough set theory (minimum concept description), and the other uses fuzzy methodology. PRIMEROSE (Probabilistic Rule Induction Methods based on Rough Sets) generates probabilistic rules from databases [204]. The system is aim-oriented, specifically intended for use with



medical databases. It not only allows for inducing knowledge from data, but also provides estimation of probabilities, test statistics, cross-validation, etc.

KDD-R (Knowledge Discovery in Data using Rough Sets) is a system developed by Ziarko [204]. It is an extension of the previously introduced systems called DataQuest and DataLogic. The basic underlying methodology behind this software-based system is rough set theory. The major components of the system consist of data preprocessing and rule search. One of the main features of this system is its ability to extract rules from data, both numerical and categorical. Also, a rough set-based rule induction algorithm was engineered at the Technical University of Gdansk [48][54], the principles of which will be presented further on.

Other algorithms and systems based on rough set theory, which work in different software environments and which were created at various universities for different purposes, are also in existence, but they will not be cited here because they are still under development or their applications are less widely known.

Since the basis of rough sets is extensively covered in the literature, this has been an outline of only the general concepts.


3. PREPROCESSING OF ACOUSTICAL DATA

This introductory chapter addresses the problem of preprocessing data in musical acoustics, as applied in this research work, along with the rationale for carrying out this analysis.

Due to the development of multimedia technology and digital signal transmission, there is rapid growth in the amount of audio data stored on various computer sites. Consequently, a problem is to find methods that allow one to effectively explore a huge collection of data in order to find needed information.

Specifically, the problem is to recognize objects within audio material. To do this, one can discern two different kinds of tasks. The first task is related to the automatic recognition of musical timbre, which means that the aim is to recognize the sounds of various musical instruments. The second task concerns the recognition of musical phrases, which means trying to find a concrete musical piece based on a melody line. The difference between these two approaches lies in the kind of applied analysis [119], because when recognizing musical timbre one must perform acoustic analyses of a signal [108][111][129][133], while in the second case one must also consider the musicological analysis of the material [110][112][130][201]. The level of difficulty is similar in both cases. The first task concerns the recognition of some dozens of musical instruments, but the analysis aiming to discern particular sounds is very difficult. The analysis of musical phrases is simpler because one may deal with very economical representations, namely the scores, but on the other hand the number of possible melodies is infinite. The most challenging problem is to follow the melody line performed by an instrument, based on the acoustical analysis of the sound produced by this instrument, and then to recognize a musical piece. Tasks related to the first approach are described in the next paragraphs.

Another subject addressed here is one of the still unsolved and vital problems in acoustics, which is to find a universal method of quality evaluation. For that purpose, subjective testing is often used. This concerns mainly the subjective scaling of quality in room acoustics [7][25][76][103][106][120][134][183][216], the assessment of audio equipment features [1][13][46][47][106][124][131], synthesized sound quality [56][171], and the quality evaluation of existing and newly created low-bit-rate-based algorithms [15][16][17][53][93][124][125]. Another problem related to sound quality is timbre perception, which is also discussed in the literature [56][83][90][149][205]. There are many standard methods, including listening tests and statistical processing of the results obtained in these tests, which will be briefly reviewed in this chapter. These methods will



be used as a preprocessing phase in the research experiments that are described later.

3.1. Musical Signal Representation

Musical sounds are an important and natural means of human communication and culture. During many epochs, much effort has been aimed at creating and developing various instruments used in music. Most musical instruments generate sound waves by means of vibrating strings or air columns. In order to describe the features of musical instruments, one must first decide on a division of instruments into categories (groups) and subcategories (subgroups), aimed at pointing out similarities and differences between instruments. There are various criteria to make this separation possible; however, it is often sufficient to limit this problem to only two criteria, namely the way an instrument produces sound and whether or not an instrument is based on Western musical notation [202]. Such an exemplary division of musical instruments is shown in Tab. 3.1. The included instruments are found in the contemporary symphony orchestra.

Tab. 3.1. Division of musical instruments into categories

Category                  Subcategory                Contemporary symphony orchestra instruments (examples)
String (chordophone)      Bowed                      violin, viola, cello, contrabass
                          Plucked                    harp, guitar, mandolin
                          Keyboard                   piano, clavecin, clavichord
Wind (aerophone)          Woodwind                   flute, piccolo, oboe, English horn, clarinet, bassoon, contra bassoon
                          Brass                      trumpet, French horn, trombone, tuba
                          Keyboard                   pipe organ, accordion
Percussion (idiophone &   Determined sound pitch     timpani, celesta, bells, tubular bells, vibraphone, xylophone, marimba
membranophone)            Undetermined sound pitch   drum set, cymbals, triangle, gong, castanets

With regard to the above given assumptions, the main acoustic features of musical instruments include:
- musical scale,
- dynamics,
- timbre of sound,
- time envelope of the sound,
- sound radiation characteristics.

The musical scale is a set of sounds that an instrument is capable of producing. Dynamics defines all phenomena related to the intensity of sounds. The dynamic range can be described as the relation between the level of a sound measured when



played forte fortissimo and the level of a sound measured when played piano pianissimo. An interesting thing is that the dynamic range depends on the technique of playing and that it is different for continuous play (legato) and for single tones. This is illustrated in Fig. 3.1 [153]. In general, string instruments are only slightly quieter than woodwind instruments and are about 10 dB quieter than brass wind instruments.

Sound timbre is a feature that makes it possible to distinguish the sound of various instruments. First of all, it depends on the number, type and intensity of the component harmonics. Sounds that have few harmonics have a soft but dark sound, and those with a lot of harmonics, especially with a prevailing number of high components, have a bright and sometimes even sharp sound. The timbre is also closely correlated to the shape of the time envelope and to the pitch of the sound. This enables a distinction between the sound registers of an instrument. Also, the influence of dynamics on timbre can be observed. For string instruments, this influence is only minor, because components above 3 kHz rise by only 1.1 dB when the level of dynamics rises by 1 dB. For woodwind instruments the level of these components rises by about 1.2-2.0 dB, and for brass instruments they can rise by as much as 3 dB. An additional factor having influence on instrument timbre is performance technique, i.e. vibrato, pizzicato, martele, spiccato, etc. Higher harmonic components of brass instruments and of the flute, when played with vibrato, undergo amplitude modulation, which leads to an audible change of both the dynamics and the timbre of the tone.

Time envelope is also of importance when analyzing musical sounds. This feature will be explained more thoroughly in the next paragraph.

Fig. 3.1. Dynamic ranges of some chosen musical instruments (Lw – acoustic power level with reference to 10^-12 W/m2 [153])

The last feature to be mentioned here is the sound radiation characteristics. This feature depends greatly on the sound-radiating elements of a musical instrument. Although low-frequency sounds (below 500Hz) from most instruments radiate in all directions, higher-frequency components are increasingly direction-dependent.



This feature creates some difficulties, especially while recording single sounds generated by a particular musical instrument.

3.1.1. Parametric Representation

The first task related to the automatic recognition of musical instruments consists of building a knowledge base in which information on musical sound patterns is to be included. However, because of the redundancy that characterizes acoustical signals, a parametrization process is needed which results in the creation of feature vectors. Therefore, the decision process can be based on a set of parameters that are characteristic for most musical instrument sounds. The parametric approach allows one to describe the sound as a path through a multidimensional space of timbres.

There are at least a few approaches to feature vector extraction from musical sounds. Problems in signal processing involve time-dependent data for which exact replication is almost impossible. However, much of this time-dependent data arises from physical phenomena which can be considered unchanging in their basic nature within periods of time. This kind of approach is often used in the analysis of musical sounds. It can be achieved by means of the Fourier transform or the pitch-synchronous wavelet transform. The latter method belongs to the time-frequency signal analysis methods, the basis of which was the Gabor transform [58][61][174]. Apart from the most frequently used FFT, there are some other transforms that allow analysis in the frequency domain, such as the Walsh-Hadamard transform, which involves analysis in terms of square waves of different frequencies, the cosine transform (modified cosine transform), the McAulay & Quatieri algorithm [151], etc. Furthermore, there exist spectral estimation methods, among others classical ones based on parametric methods. These methods refer to a variety of equivalent formulations of the problem of modeling the signal waveform; the differences underlying these formulations concern mostly the details of computations. In the literature, methods based on autocorrelation, covariance, and maximum entropy formulations are often cited. Algorithms known as Prony, Yule-Walker, Burg, Durbin, Pisarenko [89][150], etc., provide practical spectral signal estimation. The above cited methods are based on linear processes. They are efficient enough also when extended to the identification of adaptive dynamic models. This is because, with suitable preprocessing, a non-linear problem may often be converted into a linear one. However, as the processes become more complex, a sufficiently correct non-linear input-output behavior is more difficult to obtain using linear methods.
Lately, methods based on input-output models for non-linear systems, both deterministic and stochastic, have appeared in the literature on control system identification. They are known as NARMAX (Non-linear AutoRegressive Moving Average with eXogenous input) and NARX (Non-linear AutoRegressive with eXogenous input) models [23][144].

A global non-linear system may be described as [23][144]:



y(t) = F_i[y(t−1), ..., y(t−n_y), x(t−d), ..., x(t−d−n_x+1), e(t−1), ..., e(t−n_e)] + e(t)   (3.1)

where:
F_i[·] – non-linear function of order i,
x(t), y(t) – system input and output,
n_y, n_x – orders of the output and input signals, respectively,
n_e – order of the noise signal,
d – time delay caused by the system,
e(t) – prediction error.

If the system is single input and single output, the model becomes:

y(t) = F[y(t−1), ..., y(t−n_y), x(t−d), ..., x(t−d−n_x+1), e(t−1), ..., e(t−n_e)] + e(t)   (3.2)

Although the non-linear function F_i[·] is rarely known, its structure is assumed using the available a priori knowledge. This is the crucial point of such an analysis, because the choice of this function determines the number of parameters describing the assumed model and hence the computational costs.

On the other hand, the NARX model given by the expression:

y(t) = F_i[y(t−1), ..., y(t−n_y), x(t−d), ..., x(t−d−n_x+1)]   (3.3)

is a simplified case of the NARMAX model. In this case, an assumption is made that all terms related to e(t) disappear.

In practical analyses, the above equations are often expressed in polynomial form. Systems characterized by non-linearities can be practically realized by means of non-linear IIR filters; therefore these models may also be used in signal analysis [23][144].
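A minimal simulation sketch of a NARX model of the kind given above; the quadratic polynomial F and all coefficients below are illustrative assumptions, not taken from [23][144]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical polynomial non-linear function F with n_y = 2, n_x = 1, d = 1.
# The model structure must be assumed a priori; this is one possible choice.
def F(y1, y2, x1):
    return 0.5 * y1 - 0.2 * y2 + 0.8 * x1 + 0.1 * y1 * x1

N = 500
x = rng.standard_normal(N)            # exogenous input x(t)
y = np.zeros(N)
for n in range(2, N):
    y[n] = F(y[n - 1], y[n - 2], x[n - 1])   # NARX recursion per Eq. 3.3
```

The linear part of F is a stable contraction, so the simulated output stays bounded for white-noise input.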

There are also parameters that are related to the time domain, but that are calculated on the basis of the frequency domain. The correlation parameters and the parameters based on cepstral analysis may be included in this group. A specific model of sound production underlies some of the analysis methods (i.e. Linear Predictive Coding (LPC), cepstral analysis methods, etc.). It is therefore necessary to have some kind of knowledge about the instrument that produces the signal. The convolution of the excitation source with the resonance structure results in formants in the signal spectrum (see Fig. 3.2) [58]. However, most instruments have more than two acoustic systems coupled together, so the deconvolution of the excitation and the resonance systems is not easy.


Fig. 3.2. Resonance structure of a sound

Moreover, any study on musical sounds should take into account not only the physical way in which sounds are generated, but also the subsequent effect on the listener. In the latter case, some features of the perceptual model of a human hearing process, such as subjective loudness impression or masking effects, might be taken into account.

Another method to be mentioned is the analysis-by-synthesis approach. This approach in musical acoustics was actually introduced by Risset [58] in order to determine the most important sound parameters. In this case, the resynthesis of a sound is made possible on the basis of calculated parameters. For example, a harmonic-based representation of musical instrument tones for additive synthesis may be used as a sound parametrization. Although this data representation is usually very large, principal component analysis can be used to transform such data into a smaller set of orthogonal vectors with a minimal loss of information [58]. The analysis-by-synthesis method is also a way of verifying whether a chosen parameter is of good quality. If it is possible to resynthesize a sound on the basis of parameters and it is perceived as close to a natural one, then it may be concluded that the parameters are appropriate.

It should be remembered that the choice of parameters and their number are crucial to the effectiveness of automatic classification processes.

3.1.2. Time Domain Representation

Generally, the ADSR model (see Fig. 3.3), a linear approximation of the envelope of a musical sound, may represent musical signal time-domain characteristics. This time-domain representation is depicted as consecutive sound phases (Attack, Decay, Sustain and Release) that may be described in terms of their energy and time relationships.

The problem of locating the beginning of a sound is of importance, particularly in the automatic sound recognition process. Two time-domain measures, energy and the so-called zero-crossing rate, are often used in the speech domain for the purpose of discriminating a speech utterance from background noise. For a signal u = u(t), the zero-crossing function is defined as:



p_u(t) = 1 if the signal u(t) fulfils conditions (1), (2) and (3); p_u(t) = 0 otherwise   (3.4)

where:

(1) u(t) ≥ 0 and u(t − Δt) < 0,
(2) |u(t)| > a and |u(t − Δt)| < a,
(3) |u(t)| > a for t_0 < t < t_0 + Δt, and Δt = 1/f_sampling.   (3.5)

Parameter a (a ≠ 0) is an assumed threshold.

The basic algorithms for the determination of zero-crossings require a comparison of the signs of pairs of successive samples in assumed time intervals. The distribution of such intervals is defined by the function R(t):

R(t) = Σ_{j=1}^{J} δ(t − t_j)   (3.6)

where: δ(t) – the Dirac delta, j = 1, 2, ..., J (J – the number of zero-crossings), and t_j – the time interval between crossings (j−1) and j (in segment l); additionally:

(3.7)
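A simplified, discrete-time sketch of thresholded upward zero-crossing detection in the spirit of conditions (1)-(3); the comparison of signs of successive samples follows the basic algorithm described above, while the test signal and threshold are illustrative:

```python
import numpy as np

def thresholded_zero_crossings(u, a):
    """Indices of upward zero-crossings whose new sample magnitude exceeds
    the assumed threshold a (a simplified reading of Eqs. 3.4/3.5)."""
    u = np.asarray(u, dtype=float)
    upward = (u[1:] >= 0) & (u[:-1] < 0)      # condition (1): sign change
    loud = np.abs(u[1:]) > a                  # simplified magnitude test
    return np.nonzero(upward & loud)[0] + 1

sr = 1000
t = np.arange(sr) / sr
u = np.sin(2 * np.pi * 5 * t + 0.1)           # 5 Hz tone, slight phase offset
idx = thresholded_zero_crossings(u, a=0.0)
print(len(idx))  # 5 upward crossings in one second
```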

Fig. 3.3. Linear approximation of the musical signal envelope (phases: Silence, Attack, Decay, Steady-state, Release)

It should be remembered that the starting transients are the most important phase for the subjective recognition of musical sounds. It has been shown in numerous experiments that when the attack phase is removed from a sound, it is



no longer recognizable and, moreover, that some instrument sounds (trumpet and violin, for example) may not be distinguished from one another. In order to represent transient states, some parameters should be introduced. Krimphoff et al. introduced the rise time on a logarithmic scale (LTM), defined as [135]:

LTM = log(t_max − t_threshold)   (3.8)

where: t_max – the time at which the amplitude reaches the maximum RMS value, t_threshold – the time corresponding to the minimum amplitude of the signal perception threshold (see Fig. 3.3).

The signal level versus time is defined as:

l(t) = a ∫_{t−T/2}^{t+T/2} u²(τ) dτ   (3.9)

where: T – width of the time window, a – normalization coefficient.

Another parameter represents the amplitude envelope (or instantaneous amplitude), described by the following expression:

A(t) = √(u²(t) + ũ²(t))   (3.10)

where: ũ(t) – the Hilbert transform of the signal u(t), calculated as:

ũ(t) = (1/π) ∫_{−∞}^{+∞} u(τ)/(t−τ) dτ   (3.11)
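The instantaneous amplitude can be evaluated numerically via the analytic signal: `scipy.signal.hilbert` returns u(t) + j·ũ(t), so its magnitude gives the envelope of Eq. 3.10 (the amplitude-modulated test signal is an assumption of this sketch):

```python
import numpy as np
from scipy.signal import hilbert

sr = 8000
t = np.arange(sr) / sr
# 440 Hz carrier with a slow 3 Hz amplitude modulation (hypothetical signal)
u = (1 + 0.5 * np.sin(2 * np.pi * 3 * t)) * np.sin(2 * np.pi * 440 * t)

analytic = hilbert(u)               # u(t) + j*Hilbert{u}(t)
envelope = np.abs(analytic)         # instantaneous amplitude, Eq. 3.10
```

Away from the signal edges, the computed envelope closely tracks the modulation 1 + 0.5·sin(2π·3t) rather than the fast carrier oscillation.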

A parameter that is directly extracted from the time signal structure is the proposed transient midpoint t_0 (see Fig. 3.4) [101][133].

The value of t_0 is calculated according to the formula:

t_0 = M_1 / M_0 = (a + b) / 2   (3.12)

where M_1 is the first-order statistical moment:

M_1 = ∫_{−∞}^{+∞} t f(t) dt = (b² − a²) h / 2   (3.13)

In order to normalize, the signal energy M_0 is calculated according to the following equation:

M_0 = ∫_{−∞}^{+∞} f(t) dt = (b − a) h   (3.14)

where h is the energy increment versus time.

The envelope rise time may be found by calculating the second central moment:

t_rise = b − a = √(12 M_2 / M_0)   (3.15)

where:

M_2 = ∫_{−∞}^{+∞} (t − t_0)² f(t) dt   (3.16)

Fig. 3.4. Time envelope of the simplified transient model: a – transient starting point, b – transient ending point, M_0 – energy of the steady state, t_0 – transient midpoint
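The moment relations of Eqs. 3.12-3.16 can be checked numerically for the rectangular energy-increment model of Fig. 3.4; the values of a, b and h below are arbitrary:

```python
import numpy as np

# Rectangular energy-increment model f(t): height h over the transient [a, b]
# (a, b, h are illustrative values, not data from the book).
a, b, h = 0.01, 0.05, 2.0
t = np.linspace(0.0, 0.1, 10001)
dt = t[1] - t[0]
f = np.where((t >= a) & (t <= b), h, 0.0)

M0 = f.sum() * dt                        # Eq. 3.14: (b - a) * h
M1 = (t * f).sum() * dt                  # Eq. 3.13: (b^2 - a^2) * h / 2
t0 = M1 / M0                             # Eq. 3.12: (a + b) / 2
M2 = ((t - t0) ** 2 * f).sum() * dt      # Eq. 3.16: second central moment
t_rise = np.sqrt(12 * M2 / M0)           # Eq. 3.15: b - a
```

For this f(t), t_0 comes out at (a + b)/2 = 0.03 and the rise time at b − a = 0.04, matching the closed forms.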

There are two more phases that should be taken into account, namely the phase of energy decreasing from the local maximum and the subsequent phase of energy increasing from the local minimum to the energy of the steady state (see Fig. 3.3).

The essential factor that differentiates the ideal signal model from real sound recordings is the amplitude variation of the steady-state phase. As the amplitude of the musical signal varies with time, the signal energy provides a convenient representation that reflects these amplitude variations. Variances representing



these fluctuations should also be considered; thus these two parameters may be included in the feature vector.

3.1.3. Spectral Parameters

The feature vectors containing time domain parameters should be completed by adding the spectral properties. On the basis of the sound spectrum, many additional parameters may be determined. However, before defining spectral parameters, some spectrum estimation methods will be reviewed in order to show that, in some cases, such methods are more accurate than FFT analysis, and hence they may be useful in the parametrization process.

Generally, a large class of parametric methods falls under the category of spectral estimation. Therefore, some chosen methods of spectral estimation that are based on power series models are reviewed in this study, namely Autoregressive (AR), Moving Average (MA), and Autoregressive Moving Average (ARMA) models. These methods are often described in terms of zeros-poles approximations, i.e. the MA model belongs to the "all-zero" methods, while AR belongs to the "all-pole" ones. Some examples of analyses using AR, MA, and ARMA processes will be given in order to show that these methods may be useful for the analysis of spectra of musical sounds [89][150].

Spectral estimation is a three-step procedure. First, the appropriate model is chosen. Then, the model parameters are computed. Finally, in the third phase, the computed model parameters provide coefficients for the evaluation of the PSD (Power Spectral Density) function.

Let u[n] be the input and x[n] the output signal. These signal sequences are related by the following expression:

x[n] = −Σ_{k=1}^{p} a[k] x[n−k] + Σ_{k=0}^{q} b[k] u[n−k]   (3.17)

where a, b are model parameters and the pair (p,q) represents the order of the model. Eq. 3.17 is known as the ARMA model, and the basic filter structure realizing this process is shown in Fig. 3.5.

The transmittance H(z) between u[n] and x[n] for the ARMA process is defined as:

H(z) = B(z) / A(z)   (3.18)

where A(z) is the z-transform of the AR part of the process,

A(z) = Σ_{k=0}^{p} a[k] z^{−k},

B(z) is the z-transform of the MA part of the process,

B(z) = Σ_{k=0}^{q} b[k] z^{−k},

and a[k], b[k] are the coefficients of the autoregression function and the moving average, respectively. It is assumed that the poles of the system (the zeros of A(z)) lie inside the unit circle in the z-plane in order to guarantee the stability of the system.

Fig. 3.5. Filter structure modeling the ARMA process
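A sketch of generating an ARMA realization with a filter of the kind shown in Fig. 3.5, and of evaluating its model PSD from Eq. 3.18; the ARMA(2,1) coefficients are illustrative:

```python
import numpy as np
from scipy.signal import lfilter, freqz

rng = np.random.default_rng(1)

# Illustrative ARMA(2,1) model: a[0] = 1 and the zeros of A(z) lie inside the
# unit circle (|z| ~ 0.707), so the filter is stable.
a = [1.0, -0.75, 0.5]                 # A(z) coefficients
b = [1.0, 0.4]                        # B(z) coefficients

u = rng.standard_normal(4096)         # white-noise input u[n]
x = lfilter(b, a, u)                  # ARMA realization per Eq. 3.17

# Model PSD from Eq. 3.18: sigma^2 * |B(f)|^2 / |A(f)|^2 (here sigma^2 = 1)
w, H = freqz(b, a, worN=512)
psd = np.abs(H) ** 2
```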

It is known that the z-transform of the autocorrelation function, P_xx(z), is equal to the power spectrum P_xx(f) on the condition that z = exp(j2πf) for −1/2 ≤ f ≤ 1/2.

If all coefficients a[k] except a[0] = 1 equal zero in the ARMA process, then:

x[n] = Σ_{k=0}^{q} b[k] u[n−k]   (3.19)

and this process is known as the MA process of order q, and its power spectrum is given as [150]:

P_MA(f) = σ² |B(f)|²   (3.20)

on the condition that u[n] is a white-noise sequence with mean value equal to 0 and variance σ² equal to the white-noise power density.

On the other hand, if all coefficients b[k] except b[0] = 1 equal zero in the ARMA process, then:



x[n] = −Σ_{k=1}^{p} a[k] x[n−k] + u[n]   (3.21)

and this process is known as the AR (autoregressive) model of order p, and its power spectrum is:

P_AR(f) = σ² / |A(f)|²   (3.22)

Based on the Wiener-Khinchin theorem, which says that the Fourier transform of the autocorrelation is equal to the power spectrum, it is possible to express the power spectrum of the MA process as follows:

P_MA(f) = Σ_{k=−q}^{q} r_xx[k] exp(−j2πfk)   (3.23)

The same analogy can be applied to the AR and ARMA processes. It is also known that, provided the power spectrum is finite, equivalent AR and MA models of possibly infinite order exist for the ARMA(p,q) process.

Provided h[k] = 0 for k < 0, the autocorrelation function r_xx[k] for the ARMA process is as follows:

r_xx[k] = −Σ_{l=1}^{p} a[l] r_xx[k−l] + σ² Σ_{l=0}^{q−k} h*[l] b[l+k]   for k = 0, 1, ..., q
r_xx[k] = −Σ_{l=1}^{p} a[l] r_xx[k−l]   for k ≥ q+1
(3.24)

where h[l] is the impulse response of the system with transfer function H(z).

Provided that b[l] = δ[l], the autocorrelation function for the AR process is:

r_xx[k] = −Σ_{l=1}^{p} a[l] r_xx[k−l] + σ² h*[−k]   (3.25)

Additionally, if h*[−k] = 0 for k > 0 and h*[0] = [lim_{z→∞} H(z)]* = 1, then:


r_xx[k] = −Σ_{l=1}^{p} a[l] r_xx[k−l]   for k ≥ 1
r_xx[k] = −Σ_{l=1}^{p} a[l] r_xx[−l] + σ²   for k = 0
(3.26)

The above equations are known as the Yule-Walker equations. For computational purposes, they are often given in matrix form:

| r_xx[0]   r_xx[−1]   ...  r_xx[−p]   |  |  1   |   | σ² |
| r_xx[1]   r_xx[0]    ...  r_xx[−p+1] |  | a[1] | = |  0 |
|   ...       ...      ...    ...      |  | ...  |   | ...|
| r_xx[p]   r_xx[p−1]  ...  r_xx[0]    |  | a[p] |   |  0 |
(3.27)

Correspondingly, for the MA process, when a[l] = δ[l] and h[l] = b[l], the autocorrelation function is as follows:

r_xx[k] = σ² Σ_{l=0}^{q−k} b*[l] b[l+k]   for k = 0, 1, ..., q
r_xx[k] = 0   for k ≥ q+1
(3.28)

It is known from the literature that AR and equivalent ARMA models provide an accurate representation of underlying power spectra which have sharp spectral features [89]. Therefore, most of the performed musical sound analyses aimed at testing algorithms based on AR and ARMA processes.
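A sketch of AR parameter estimation via the Yule-Walker equations on a synthetic AR(2) process; the true coefficients are chosen arbitrarily, and the biased autocorrelation estimate is used:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(2)

# Synthesize an AR(2) process with known coefficients, then recover them by
# solving the Yule-Walker equations (Eqs. 3.26/3.27).
a_true = [1.0, -1.3, 0.6]              # x[n] = 1.3 x[n-1] - 0.6 x[n-2] + u[n]
x = lfilter([1.0], a_true, rng.standard_normal(200_000))

p = 2
N = len(x)
r = np.array([x[: N - k] @ x[k:] for k in range(p + 1)]) / N   # biased r_xx[k]
R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])  # Toeplitz
a_hat = np.linalg.solve(R, -r[1 : p + 1])    # rows k = 1..p of Eq. 3.26
sigma2 = r[0] + a_hat @ r[1 : p + 1]         # row k = 0 of Eq. 3.26
```

With this many samples, a_hat approaches [−1.3, 0.6] and sigma2 approaches the unit driving-noise variance.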

In order to estimate the power spectral density in the AR model, estimation methods other than the autocorrelation method were also used in this study, namely: covariance, modified covariance, Burg's method, the RMLE (Recursive Maximum Likelihood Estimation) method, etc. The tested method based on the MA model was Durbin's algorithm. For the ARMA process, on the other hand, such methods as MYWE (Modified Yule-Walker Equations) and LSMYWE (Least Squares Modified Yule-Walker Equations) were used.

It should be remembered that both the AR and MA processes may be treated as specific cases of the ARMA process. Starting from the ARMA process, it is possible to estimate the power spectra of these processes by assuming that the order of the MA model, denoted as q, is equal to 0 in the AR process, while the order of the AR model, expressed as p, equals 0 in the MA process [89][150][172].

It should be noted that the choice of the frame length (N) and the determination of the model order are very important considerations in implementation. Clearly, for the autocorrelation method N should be on the order of several pitch periods in order to ensure reliable results. In order to effectively evaluate the model order, it is necessary to use one of the commonly used techniques and criteria. Basically, the model order is assumed on the basis of the computed prediction error power. First of all, a minimum of the so-called Final Prediction Error (FPE) function, defined as follows:

FPE(k) = \frac{N+k}{N-k}\, \rho_k    (3.29)

where: \rho_k - the variance of the white noise process (prediction error power),

serves as such a criterion. Another criterion, known in the literature as the Akaike Information Criterion (AIC), expressed in Eq. 3.30 [89]:

AIC(k) = N \ln \rho_k + 2k    (3.30)

is often used. The chosen model order is the computed minimum of the expression 3.30. The term 2k in Eq. 3.30 is often replaced by the factor k \cdot \ln N, because for a large value of N the computed order is otherwise too high. One may also use the Bayesian Information Criterion (BIC) as a measure of the goodness-of-fit of the model.
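The order-selection criteria of Eqs. (3.29) and (3.30) are straightforward to evaluate; a minimal sketch follows, in which the prediction error powers rho are assumed toy values, not data from the book:

```python
import math

def fpe(rho_k, N, k):
    """Final Prediction Error criterion, Eq. (3.29)."""
    return (N + k) / (N - k) * rho_k

def aic(rho_k, N, k):
    """Akaike Information Criterion, Eq. (3.30)."""
    return N * math.log(rho_k) + 2 * k

# assumed prediction error powers for candidate orders 1..4;
# the chosen order is the minimum of the criterion
rho = {1: 0.50, 2: 0.20, 3: 0.19, 4: 0.189}
N = 256
best_order = min(rho, key=lambda k: aic(rho[k], N, k))
```

Here the large drop in error power from order 1 to 2 and the much smaller drops afterwards make the penalty term 2k decisive, so the criterion settles on a moderate order rather than the largest one.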

Below, exemplary results of analyses are shown, obtained on the basis of some parametric methods that were implemented algorithmically in the Sound Engineering Department of the Technical University of Gdańsk.

In Fig. 3.6-3.8, three spectral analyses of the same violin sound (b5, 980.13 Hz) are shown. These analyses were performed using the autocorrelation method. As is seen, the number of samples used in the analyses (N) influences the quality of the spectral analysis. At the same time, a decrease in the number of samples from 512 to 256 causes better resolution.

Theoretically, ARMA models, being a zero-pole signal representation, are more accurate than AR or MA models. However, in practical musical sound analysis it might be proven that the pole representation (AR models) may be more useful, because it represents the resonant structure of a musical instrument body. It is especially visible when comparing consecutive analyses performed using different methods (see Fig. 3.6-3.15). On the basis of the performed analyses, it may be concluded that the spectral approximation obtained on the basis of the AR model, regardless of the method of analysis, is more accurate than when the ARMA or MA models were used. The crucial point of the analysis is, however, the assignment of the order of the model. In order to compare the spectra that were obtained using parametric methods (Fig. 3.6-3.15) with the corresponding one obtained on the basis of the FFT transform, an exemplary analysis of the same violin sound is shown in Fig. 3.16.

Fig. 3.6. Spectrum of a violin sound b5, autocorrelation method (AR model), p = 30, N = 128

Fig. 3.7. Spectrum of a violin sound b5, autocorrelation method (AR model), p = 30, N = 256

Fig. 3.8. Spectrum of a violin sound b5, autocorrelation method (AR model), p = 30, N = 512

Fig. 3.9. Spectrum of a violin sound b5, covariance method (AR model), p = 30, N = 512

Fig. 3.10. Spectrum of a violin sound b5, modified covariance method (AR model), p = 28, N = 512

Fig. 3.11. Spectrum of a violin sound b5, Burg's method (AR model), p = 30, N = 512

Fig. 3.12. Spectrum of a violin sound b5, RMLE method (AR model), p = 30, N = 512

Fig. 3.13. Spectrum of a violin sound b5, Durbin's method (MA model), q = 30, N = 512

Fig. 3.14. Spectrum of a violin sound b5, MYWE method (ARMA model), p = 28, q = 2

Fig. 3.15. Spectrum of a violin sound b5, LSMYWE method (ARMA model), p = 28, q = 2

Fig. 3.16. FFT analysis of a violin sound b5, Hanning window

On the basis of the performed analyses, it may be observed that:
- the presented methods accurately estimate the pitch of the analyzed sound (in the presented example it was equal to 980.13 Hz);
- the overall shape of the sound spectrum is preserved by most of the methods used;
- although the first harmonic is clearly in evidence in all plots, both ARMA and MA model-based methods reveal fewer peaks than the corresponding FFT analysis;
- the parametric methods based on the AR model give very sharp and high maxima for the spectrum partials of high energy as compared with the corresponding FFT analysis, especially in the case of the low partials; conversely, higher spectrum partials of lower energy are often disregarded and do not appear in the analysis;
- the assumed value of the model order is of high importance in all parametric methods;
- a more general conclusion concerns the number of sound samples (N): for N < 512 it is more convenient to use parametric methods because they are more accurate than the FFT analysis, while for N > 512 the FFT analysis gives better spectrum estimation.

Additionally, in Fig. 3.17 and 3.18 a direct comparison of the spectrum obtained using the FFT transform with the one obtained on the basis of the AR model (modified covariance method) is shown for another violin sound, namely c6 (fundamental frequency equal to 1047.8 Hz). As is seen, the parametric representation leads to a very good estimation of subsequent sound harmonics; however, the signal-to-noise ratio (SNR) is an important factor influencing the quality of the spectral estimation. When comparing the two violin sounds (Fig. 3.10 - b5 and Fig. 3.17 - c6) obtained in parametric-based spectral analysis, it may be concluded that both the model order (p, q) and the number of sound samples (N) should be carefully chosen in the analysis. For the example in Fig. 3.17, most sound harmonics are better resolved than in the case of the b5 sound analysis.

Fig. 3.17. Spectrum of a violin sound c6, modified covariance method (AR model), p = 28, N = 512

Fig. 3.18. FFT analysis of a violin sound c6, Hanning window


Moreover, as is seen from the above spectral estimation analysis, such methods may be used in the parametrization process. However, due to their high computational complexity, they make automatic analysis of musical sounds difficult, which is a real disadvantage when dealing with a musical sound database. It should be pointed out that the whole process, starting from sound editing, through parametrization, and up to the classification process, should be automated. Additionally, parametric methods may cause an uncontrolled loss of information. Therefore, in further analysis only parameters derived from the FFT-based analysis will be discussed.

The spectrum components midpoint value fm may be calculated using the following formula:

f_m = \frac{\int_0^{f_{max}} f \cdot E(f)\, df}{\int_0^{f_{max}} E(f)\, df} = r_f \cdot \frac{\sum_{i=1}^{I} i \cdot E_i}{\sum_{i=1}^{I} E_i}    (3.31)

where: r_f - parameter characterizing the resolution of the FFT analysis, E_i - energy of the ith component for the frequency equal to i \cdot r_f, f_{max} - upper limit of the analyzed frequency band, I - highest spectral component (I = f_{max} / r_f).
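For a discrete FFT spectrum, Eq. (3.31) reduces to an energy-weighted mean of the component indices scaled by the resolution r_f; a minimal sketch with assumed toy energies:

```python
def spectrum_midpoint(E, rf):
    """Spectrum components midpoint f_m, Eq. (3.31): energy-weighted
    mean frequency; E[i-1] is the energy of the ith FFT component and
    rf the frequency resolution of the analysis."""
    num = sum(i * e for i, e in enumerate(E, start=1))
    return rf * num / sum(E)

# all energy in the 4th component of a 10 Hz-resolution analysis
fm = spectrum_midpoint([0.0, 0.0, 0.0, 1.0], 10.0)   # 40.0 Hz
```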

Another parameter that is often used in the speech processing domain is the mth order spectral moment. It may be defined as follows [199]:

M(m) = \sum_{k} |G(k)|\, (f_k)^m    (3.32)

where: f_k - the center frequency of the kth frame used in the spectral analysis. Values of f_k may be calculated on the basis of Eq. (3.33), in which the resolution (\Delta f) of the spectral analysis is used:

f_k = (k-1)\,\Delta f + \frac{\Delta f}{2}    (3.33)

The parameter defined by Eq. (3.32) may be interpreted physically. For example, on the basis of the 0-order spectral moment, the energy concentration in the low frequencies may be exposed. Also, this parameter is often used as a normalization coefficient for the higher order spectral moments. On the other hand, the 1st order spectral moment may be interpreted as a spectral centroid coefficient. The G(k) in Eq. (3.32) depend on the window function that was applied to the analysis. In the case where the spectral domain is represented by components of amplitudes A_k and frequencies which are integer multiples of the fundamental, then the relationship shown above (3.33) should be modified according to Eq. (3.34). Therefore, the mth moment may be calculated as follows:

M(m) = \sum_{k=1}^{n} A_k\, k^m    (3.34)

and the spectral centroid (Brightness) may be defined as follows:

B = \sum_{n=1}^{N} n \cdot A_n \Big/ \sum_{n=1}^{N} A_n    (3.35)

where: A_n - amplitude of the nth harmonic, N - total number of harmonics.
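Eqs. (3.34) and (3.35) can be computed directly from the harmonic amplitudes; the sketch below also shows the 0-order moment acting as the normalization coefficient mentioned above (the toy amplitudes are assumptions for illustration):

```python
def spectral_moment(A, m):
    """mth order spectral moment, Eq. (3.34); A[k-1] is the amplitude
    of the kth harmonic."""
    return sum(a * k ** m for k, a in enumerate(A, start=1))

def brightness(A):
    """Spectral centroid (Brightness) B, Eq. (3.35): the 1st order
    moment normalized by the 0-order moment."""
    return spectral_moment(A, 1) / spectral_moment(A, 0)

# equal first and third harmonics: centroid falls midway, at n = 2
B = brightness([1.0, 0.0, 1.0])
```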

There are other parameters which describe the shape of the spectrum in the steady-state phase, such as the even (h_{ev}) and odd (h_{odd}) harmonic content in the signal spectrum:

h_{ev} = \sqrt{\frac{A_2^2 + A_4^2 + A_6^2 + \ldots}{\sum_{n=1}^{N} A_n^2}}, \qquad M = entier(N/2)    (3.36)

and the content of odd harmonics in the spectrum, excluding the fundamental:

h_{odd} = \sqrt{\frac{A_3^2 + A_5^2 + A_7^2 + \ldots}{\sum_{n=1}^{N} A_n^2}}, \qquad L = entier(N/2 + 1)    (3.37)

where: A_n, N - as before.
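A possible reading of Eqs. (3.36)-(3.37) in code, assuming the square-root-of-energy-ratio form reconstructed above:

```python
import math

def even_odd_content(A):
    """Even (h_ev) and odd (h_odd, fundamental excluded) harmonic
    content, Eqs. (3.36)-(3.37); A[n-1] is the nth harmonic amplitude."""
    total = math.sqrt(sum(a * a for a in A))
    ev = math.sqrt(sum(a * a for n, a in enumerate(A, 1) if n % 2 == 0))
    odd = math.sqrt(sum(a * a for n, a in enumerate(A, 1)
                        if n % 2 == 1 and n > 1))
    return ev / total, odd / total

h_ev, h_odd = even_odd_content([1.0, 0.5, 0.25, 0.125])
```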

Another parameter derived from the frequency domain, which is often used for the purpose of estimating auditory masking effects, seems of importance [225], namely the Spectral Flatness Measure (SFM). Since the audio signal varies in successive frames of the analysis, the SFM parameter may be used as a measure of the tonal (clear maxima) or noiselike (flat spectrum) character of the signal. It is expressed as the ratio of the geometric to the arithmetic mean of the power density function, defined as follows [84][225]:

SFM = 10 \log_{10} \frac{\left[\prod_{k=1}^{N/2} P(e^{j 2\pi k / N})\right]^{2/N}}{\frac{2}{N} \sum_{k=1}^{N/2} P(e^{j 2\pi k / N})}    (3.38)

where: P(e^{j 2\pi k / N}) - the spectral power density function calculated on the basis of the N-point Fast Fourier Transform algorithm.

On the basis of the SFM value, an additional parameter is formulated, namely the coefficient of tonality \alpha, which is expressed as:

\alpha = \min\!\left(\frac{SFM}{SFM_{max}},\, 1\right)    (3.39)

where: \alpha = 1 for SFM = SFM_{max} = -60\,dB (sine wave), and \alpha = 0 for SFM = 0\,dB (white-noise signal).
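Eqs. (3.38)-(3.39) in a minimal form, computed over an assumed short power-density vector rather than an FFT of real audio:

```python
import math

def spectral_flatness_db(P):
    """Spectral Flatness Measure, Eq. (3.38): ratio of the geometric
    to the arithmetic mean of the power density values, in dB."""
    n = len(P)
    geo = math.exp(sum(math.log(p) for p in P) / n)
    return 10.0 * math.log10(geo / (sum(P) / n))

def tonality(sfm_db, sfm_max=-60.0):
    """Coefficient of tonality alpha, Eq. (3.39)."""
    return min(sfm_db / sfm_max, 1.0)

flat = spectral_flatness_db([1.0] * 8)             # 0 dB: noiselike
peaky = spectral_flatness_db([1e-6] * 7 + [1.0])   # far below 0 dB: tonal
```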

Formants are parameters widely used in speech analysis which indicate local maxima of the spectrum. It is obvious that their physical interpretation in musical acoustics corresponds to resonances of the instrument body. Precise tracking of the formant frequency is not easy. However, using the amplitudes of discrete spectrum components A_1, A_2, A_3, A_4 and the corresponding frequencies f_1, f_2, f_3, f_4, it is possible to calculate the approximate formant frequency as F or \hat{F} [199]:

F = \frac{A_1 f_1 + A_2 f_2 + A_3 f_3}{A_1 + A_2 + A_3}    (3.40)

\hat{F} = \frac{A_2 f_2 + A_3 f_3 + A_4 f_4}{A_2 + A_3 + A_4}    (3.41)
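Eq. (3.40) is an amplitude-weighted average of neighboring component frequencies; a sketch with assumed toy values:

```python
def formant_estimate(A, f):
    """Approximate formant frequency F, Eq. (3.40): amplitude-weighted
    mean of the neighboring component frequencies."""
    return sum(a * fi for a, fi in zip(A, f)) / sum(A)

# components placed symmetrically around a 500 Hz peak
F = formant_estimate([0.2, 1.0, 0.2], [400.0, 500.0, 600.0])
```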

For a simplified formant tracking algorithm, the following assumptions are to be made: the formant is located among the neighboring components (k-p), (k+p) if the following conditions are fulfilled [111]:


1° the values of the component amplitudes are bigger than the assumed threshold value A_{threshold}:

\left(A(k) \ge A(k-p)\right) \wedge \left(A(k) \ge A(k+p)\right) \wedge \left(A(k) \ge A_{threshold}\right)    (3.42)

where: p defines the demanded width of the formant.

2° df, defined as the difference between the spectral centroid and the geometrical center, taken with the minus sign, is bigger than the assumed threshold df_{threshold}.

The formant tracking algorithm is presented in Fig. 3.19.

Fig. 3.19. Formant tracking algorithm (assumptions: thresholds A_{threshold} and df_{threshold}, width of the analyzing window w; in each step the spectral centroid x_1 and the geometrical center x_2 are calculated and df = x_1 - x_2 is tested; p is incremented until a formant is found)


The presented algorithm was applied by the author in order to extract formant frequencies in musical sounds [111]. The threshold value A_{threshold} may be expressed in terms of the amplitude mean value or of the RMS value, as defined in Eq. (3.43):

RMS = \sqrt{\frac{1}{N} \sum_{i=1}^{N} A_i^2}    (3.43)

An example of this analysis, resulting from a performed test, is presented in Fig. 3.20; namely, a spotted maximum for a violin spectrum. As is seen in Fig. 3.20, there is a maximum found at the frequency equal to 1526 Hz.

It should be mentioned that formants, i.e. enhancements of harmonics in certain fixed frequency intervals, remain invariable within the chromatic scale of an instrument, whereas the spectra of individual tones may vary considerably from one note to another. Thus, this feature is specific for a given instrument.

Another criterion (IRR), introduced by Krimphoff et al., corresponds to the standard deviation of time-averaged harmonic amplitudes from a spectral envelope, and is called "spectral flux" or "spectral fine structure" [135]:

(3.44)

In the literature, an approach to the estimation of the sound spectral domain based on polynomials may be found. This approach seems to be especially justified in the case of a rich sound spectrum. The applied approximation is based on minimizing the mean-square error in the range of the analyzed spectrum by using the following proposed relation [87][105][109]:

E = \sum_{i=1}^{N} \left(20 \cdot \log_{10} A(i) - W_I(\log_2 i)\right)^2    (3.45)

while:

W_I(\log_2 i) = \sum_{j=0}^{I} a_j \cdot (\log_2 i)^j    (3.46)

where: E - mean-square error, i - number of the harmonic, i = 1, 2, ..., N, N - number of the highest harmonic, A(i) - value of the amplitude of the ith component, W_I(\log_2 i) - value of the polynomial for the ith component, a_j - jth term of the polynomial, j - number of the consecutive term of the polynomial, I - order of the polynomial.

Fig. 3.20. Local maximum of a violin spectrum (maximum at 1526.06 Hz)


Computations which minimize the error are performed by consecutive substitution of the order of the polynomial (I = 1, 2, ... etc.), successively obtaining the coefficients a_1, a_2, a_3, .... Based on formula (3.45), an approximation is performed in the spectrum domain, presented in the log/log scale, which causes the consecutively computed coefficients a_j to have the units dB/octave, dB/octave^2, dB/octave^3, etc. These coefficients have a clear physical interpretation; e.g. the first defines the decay of the higher harmonics in the spectrum, whereas the second indicates a gain or a loss of the middle part of the spectrum in relation to its lower or higher parts. By raising the approximation order, more coefficients are obtained which describe the spectrum of the sound more precisely. The minimum number of polynomial coefficients approximating the spectrum envelope may be determined in listening tests.
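For the first-order case of Eqs. (3.45)-(3.46), the least-squares fit has a closed form and the slope a_1 comes out directly in dB/octave; a sketch (the 1/n roll-off test spectrum is an assumption for illustration):

```python
import math

def spectrum_slope(A):
    """1st-order least-squares fit of 20*log10 A(i) against log2 i,
    Eqs. (3.45)-(3.46); the slope a1 is the spectral decay in
    dB/octave, a0 the constant term."""
    x = [math.log2(i) for i in range(1, len(A) + 1)]
    y = [20.0 * math.log10(a) for a in A]
    n = len(A)
    mx, my = sum(x) / n, sum(y) / n
    a1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))
    a0 = my - a1 * mx
    return a0, a1

# a 1/n amplitude roll-off decays by exactly 20*log10(2) dB per octave
a0, a1 = spectrum_slope([1.0 / n for n in range(1, 17)])
```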

An illustration of such an approach is shown in Fig. 3.21. It was proved, based both on the mean-square error optimization and on listening tests, that the 5th order of the approximating polynomial may be considered sufficient in the cases of both shown instruments.

Fig. 3.21. Sound spectra approximated by the 5th order polynomial


3.1.4. Time-Frequency Representation

One of the most popular methods in the domain of signal processing is time-frequency signal analysis. This is due to the fact that signal processing has become an important tool in domains such as seismology, medicine, speech, vibration acoustics, etc. [37][61][74][148][154][208][214]. Most real signals that are analyzed in practice are of a non-stationary character, which is why their conventional approximation by means of stationary signals using classical frequency estimation methods is not faithful enough and may even cause gross errors.

Originally, time-frequency analysis was proposed by Gabor. He showed that, apart from its time and frequency representations, a signal can have a two-dimensional representation. He proposed a technique that leads to the frequency analysis by partitioning the signal into fragments and applying a window function. The performed convolution process used a bell-shaped time envelope generated by the Gaussian method [58]:

p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}    (3.47)

Gabor's time-frequency signal analysis method was reexamined by Grossmann and Morlet, and lately by Kronland-Martinet, and provides the basis of the wavelet transform [61][74].

The wavelet transformation is a powerful tool for time-frequency signal analysis [39][70]. This transform is especially useful when it is necessary to characterize transient phenomena. By using adequately dilated or contracted copies of a mother function (see Fig. 3.22 and 3.23), it enables the analysis of various frequency bands of the given signal with different resolutions. This solves the problem of obtaining a high resolution simultaneously in the time and frequency domains. The computational cost of obtaining a representation of a signal using wavelet transformations, though, is quite significant [58][160].

The elementary wavelet functions g_{b,a}(t) that are subjected to a change of scale are copies of a wavelet mother function g(t):

g_{b,a}(t) = \frac{1}{a}\, g\!\left(\frac{t-b}{a}\right)    (3.48)

where: b - any real number, and a - the rescaling coefficient, a > 0.

The frequency localization is given by the Fourier transform of the gb,a(t) function:

\hat{g}_{b,a}(\omega) = \hat{g}(a\omega) \cdot e^{-j\omega b}    (3.49)


where: \hat{g}_{b,a}(\omega) is the Fourier transform of the function g_{b,a}(t).

The localization depends on the parameter a. The resulting decomposition will consequently be at \Delta\omega / \omega = constant. For this reason, wavelets can be interpreted as impulse responses of constant-Q filters.

Assuming that a signal is composed of a set of elementary functions, the wavelet transform is thus given by:

S(b, a) = \int s(t)\, \overline{g_{b,a}(t)}\, dt    (3.50)

where: the bar denotes complex conjugation.

The Fourier transform tends to decompose an arbitrary signal into harmonic components, whereas the wavelet transform allows a free choice of the elementary function [58]. This feature of the wavelet transform seems of importance because it makes it possible to carry out musical sound analyses that are specific for a given instrument (i.e. with the mother function derived from the analysis of the instrument structure), therefore using differentiated mother functions.
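A direct discretization of Eqs. (3.48) and (3.50), with a Morlet-type mother wavelet as an assumed choice (the text does not prescribe one), illustrates the scale selectivity described above:

```python
import cmath, math

def morlet(t, w0=6.0):
    """Morlet-type mother wavelet: a complex exponential under a
    Gaussian envelope (an assumed example of g(t))."""
    return cmath.exp(1j * w0 * t) * math.exp(-t * t / 2.0)

def cwt_point(signal, dt, b, a, w0=6.0):
    """One point of the wavelet transform, Eq. (3.50):
    S(b, a) = (1/a) * integral of s(t) * conj(g((t - b)/a)) dt,
    discretized with sampling step dt."""
    total = 0j
    for n, s in enumerate(signal):
        t = n * dt
        total += s * morlet((t - b) / a, w0).conjugate()
    return total * dt / a

# a 6 rad/s cosine: the transform magnitude at the matching scale
# a = 1 (center frequency w0/a = 6) far exceeds that at a = 2
dt = 0.05
sig = [math.cos(6.0 * n * dt) for n in range(800)]
```

Evaluating S(b, a) on a grid of b (time) and a (scale) values yields the two-dimensional time-frequency representation discussed above.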

Fig. 3.22. Mother wavelet function g(t)

Fig. 3.23. Elementary wavelet scaled with |a| < 1

On the basis of the wavelet transform, it is possible to define certain parameters, such as distribution of energy or measure of discontinuities.

A wavelet transform may be implemented as a bank of filters that decompose a signal into multiple signal bands. It separates and retains signal features in one or a few of these subbands. Thus, the main advantage of using the wavelet transform is that signal features can be easily extracted. In many cases, a wavelet transform outperforms the conventional FFT transform when it comes to feature extraction.


3.1.5. Special Parameters

It is convenient to correlate time-related properties with those of the frequency domain. The group of parameters called the Tristimulus graphically shows the time-dependent behavior of musical timbre [166]. In the Tristimulus method, loudness values measured at 5 ms intervals are converted into three coordinates, based on the loudnesses of (1) the fundamental (N_1), (2) the group containing partials from 2 to 4 (N_2), and (3) the group containing partials from 5 to N (N_3), where N is the highest significant partial. The values of N_2 and N_3 are calculated according to the formula:

N_{2(3)} = 0.85\, N_{max} + 0.15 \sum_i N_i    (3.51)

where: N_{max} - the component having the maximum loudness within the given group of harmonics.

Then, parameters x, y, z are derived from the following formulae:

x = \frac{N_3}{N}; \qquad y = \frac{N_2}{N}; \qquad z = \frac{N_1}{N}    (3.52)

where:

N = N_1 + N_2 + N_3    (3.53)

This procedure allows a simple graph to be drawn that shows the time-dependent behavior of the starting transients in relation to the steady state.

Furthermore, harmonic energy or amplitude values may be taken into account instead of loudness for classification purposes [107][111][116]. Therefore, three parameters are extracted for the above defined spectrum subbands, namely the first (T_1), second (T_2), and third (T_3) modified Tristimulus parameters, according to the formula:

T_1 = A_1^2 \Big/ \sum_{n=1}^{N} A_n^2    (3.54)

where: A_n, N - defined as before.

- the second modified Tristimulus parameter:

T_2 = \sum_{n=2}^{4} A_n^2 \Big/ \sum_{n=1}^{N} A_n^2    (3.55)


- the third modified Tristimulus parameter:

T_3 = \sum_{n=5}^{N} A_n^2 \Big/ \sum_{n=1}^{N} A_n^2    (3.56)

Additionally, the following condition is imposed on the above defined parameters:

T_1 + T_2 + T_3 = 1    (3.57)
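Eqs. (3.54)-(3.56) in code, assuming the squared-amplitude (energy-share) form reconstructed above, under which the condition (3.57) holds by construction:

```python
def tristimulus(A):
    """Modified Tristimulus parameters T1, T2, T3, Eqs. (3.54)-(3.56):
    energy shares of the fundamental, partials 2-4, and partials 5-N;
    A[n-1] is the amplitude of the nth harmonic (N >= 5 assumed)."""
    e = [a * a for a in A]
    total = sum(e)
    return e[0] / total, sum(e[1:4]) / total, sum(e[4:]) / total

# equal-amplitude harmonics: one, three, and one of the five partials
# fall into the respective subbands
T1, T2, T3 = tristimulus([1.0, 1.0, 1.0, 1.0, 1.0])
```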

As most of the presented parameters do not have stable values within the chromatic scale of an instrument, the applicability of other criteria has been verified, such as the mel-cepstrum coefficients (MCC) defined by the following expression [111]:

W_c[k] = \sum_{i=1}^{n} E_i \cos\!\left(\frac{\pi\, (i - 0.5)\, k}{n}\right)    (3.58)

where: W_c[k] - kth cepstrum coefficient, E_i - energy of the ith harmonic expressed in [dB].
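Eq. (3.58) is a cosine transform of the per-band energies; a minimal sketch with assumed energies:

```python
import math

def mel_cepstrum(E, n_coeffs):
    """Cepstrum coefficients W_c[k], Eq. (3.58): a cosine transform of
    the per-band energies E_i given in dB."""
    n = len(E)
    return [sum(e * math.cos(math.pi * (i - 0.5) * k / n)
                for i, e in enumerate(E, start=1))
            for k in range(n_coeffs)]

coeffs = mel_cepstrum([1.0, 2.0, 3.0], 2)   # coeffs[0] is the total energy
```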

Also, parameters that are related to the frequency of the nth harmonic - the normalized frequency deviation and the inharmonicity - were examined [14]. The first factor is defined by the following formula:

\frac{\Delta f_n(t)}{n f_1} = \frac{f_n(t)}{n f_1} - 1    (3.59)

(3.59)

where: f_n - frequency of the nth harmonic, f_1 - fundamental frequency. The inharmonicity factor, describing the degree to which a sound is not perfectly harmonic, is given in Eq. 3.60:

inh = \frac{f_n(t) - n f_c(t)}{n f_1}    (3.60)

where: f_c(t) - the composite weighted-average frequency deviation.


As is seen from the above equation, an additional parameter, called the composite weighted-average frequency deviation, is defined. This is because it often happens in practice that the fundamental is much weaker than the other harmonics. Therefore, the inharmonicity factor is determined for the five lowest spectrum partials as a frequency centroid [14].

A convenient way to display certain properties of a signal is by using its statistical representation [176]. For this purpose, autocorrelation (r_{An}, r_{Fn}) and cross-correlation functions (r_{Amn}, r_{Fmn}) are often defined [6]:

r_{An}(k) = \left[(M-k)\,\sigma_{An}^2\right]^{-1} \sum_{r=0}^{M-k-1} A_n(r)\, A_n(k+r)    (3.62)

r_{Fn}(k) = \left[(M-k)\,\sigma_{Fn}^2\right]^{-1} \sum_{r=0}^{M-k-1} \Delta F_n(r)\, \Delta F_n(k+r)    (3.63)

where: k = 0, 1, ..., M/2; \sigma_{An}, \sigma_{Fn} are the standard deviations of the signal amplitude and frequency, respectively, and k is the time lag, which has a maximum value of M/2;

and:

r_{Amn}(k) = \left[(M-k)\,\sigma_{Am}\,\sigma_{An}\right]^{-1} \sum_{r=0}^{M-k-1} A_m(r)\, A_n(k+r)    (3.64)

r_{Fmn}(k) = \left[(M-k)\,\sigma_{Fm}\,\sigma_{Fn}\right]^{-1} \sum_{r=0}^{M-k-1} \Delta F_m(r)\, \Delta F_n(k+r)    (3.65)

where: \sigma_{Am}, \sigma_{An} and \sigma_{Fm}, \sigma_{Fn} are the standard deviations of the nth and mth amplitudes and frequencies of the signal harmonics, respectively [6].

These functions provide information on the relationships between signal amplitudes and frequencies and are very useful in determining the signal periodicity.
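A sketch of a normalized autocorrelation in the spirit of Eq. (3.62); the mean removal is an added assumption here, so that the sigma-normalization bounds the result to [-1, 1] and the periodicity shows up directly:

```python
def norm_autocorr(x, k):
    """Normalized autocorrelation at lag k in the spirit of Eq. (3.62):
    mean product of mean-removed samples k apart, scaled by (M - k)
    and the variance, so a strictly periodic sequence yields +/- 1."""
    M = len(x)
    mean = sum(x) / M
    var = sum((v - mean) ** 2 for v in x) / M
    d = [v - mean for v in x]
    return sum(d[r] * d[r + k] for r in range(M - k)) / ((M - k) * var)

x = [1.0, -1.0] * 8   # period-2 sequence: correlation +1 at even lags
```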

There are more parameters that may be derived using various approaches to musical signal analysis, such as the fractal dimension (based on fractal interpolation of the spectrum envelope) [155][185]. Fractal interpolation provides a new technique for generating sounds, thus defining it as a method of synthesis [155]. In consecutive iterations it produces functions that may be described on the basis of given points and a number reflecting the displacement of each line segment of the interpolating function. This method is schematically illustrated in Fig. 3.24. Suppose that the starting points in this method are (x_i, y_i) for i = 0, 1, ..., N and the displacements are d_i for i = 1, ..., N. In the case where the points (x_1, ..., x_i, ..., x_N) are equally spaced and the original points and displacements do not lie in a straight line, the fractal dimension is given by the formula [155]:

D = 1 + \frac{\log\left(\sum_{i=1}^{N} |d_i|\right)}{\log N}    (3.66)

Fig. 3.24. Fractal interpolation of an exemplary spectrum envelope: originally specified points (a), first iteration (b), second iteration, displacements equal 0.7 (c), eighth iteration (d)

As is seen from Fig. 3.24, the original data are given as four points (a); the first iteration represents linear interpolation through the four points (b); the ith iteration is obtained as distorted copies of the previous iteration, with the vertical scale multiplied by 0.7 in this example (c, d) [155].
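Eq. (3.66) in code, using the displacements of Fig. 3.24 (c) as an assumed example:

```python
import math

def fractal_dimension(d):
    """Fractal dimension of the interpolating function, Eq. (3.66),
    for N equally spaced intervals with displacements d[0..N-1]."""
    N = len(d)
    return 1.0 + math.log(sum(abs(di) for di in d)) / math.log(N)

# three intervals with displacements 0.7, as in Fig. 3.24 (c)
D = fractal_dimension([0.7, 0.7, 0.7])
```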

Summarizing the information on musical instruments presented in this chapter, it may be said that there is no consensus on the choice of parameters, or even of sound analysis methods, as no single approach is capable of offering a sufficient representation of the parametrized sound.

Consequently, sound feature extraction is in principle a multi-dimensional process that should be optimized on the basis of experimental procedures customized for each individual application field.

3.2. Musical Phrase Analysis

The artificial intelligence approach offers a new viewpoint from which it might be easier to investigate the subtleties of a musical experience. Using this technique, it might be possible to look at a wide range of topics in music research, from pitch perception to chord recognition, and, most interestingly, the recognition of musical styles and automatic composition. From the technical point of view, the analysis of such processes is to a certain degree similar to speech recognition problems. The main tasks of both approaches concentrate on finding optimal representations of acoustical data rather than on directly recognizing patterns. However, the music recognition system is much more complex. It consists of at least two closely related subsystems: acoustical recognition and music analysis. The tasks of the music recognition subsystem include the following: determination of the number of simultaneously sounding parts, along with their dynamic and timbre specifications; recognition of instruments; segmentation of the signal into onsets and steady-state segments; and derivation of pitch for all performed instruments. The music analysis system should include the recognition of time, tempo, tonality, note values and relative duration of notes, as well as dynamics and other acoustical characteristics that are described by musical notation. The two coupled subsystems should at first provide an interpretation of acoustical events in terms of similarities between some fragments of music of the same style, and later use them as "signatures" to recognize a new piece of music in that style [194].

Optimal representation of data is a first step towards understanding their semantics, because it reveals physical causality in the data. In particular, in music analysis it is possible to focus on the MIDI (Musical Instrument Digital Interface) code-based representation of a musical piece.

Therefore, the primary objective of this study is to decompose a complex musical phrase structure into its components, including time duration, melody, etc., based on the MIDI code platform. Additionally, one of the purposes of this work is to show the possibility of dealing with a musical phrase in a simplified way by means of parametrization. Consequently, feature vectors which contain selected parameters are created, analyzed and recognized using learning systems, based on the rough set method and neural networks [110][113][115][126].

3.2.1. Musicological Analysis

One of the most remarkable properties of the human auditory system is its ability to extract a pitch from complex tones. On the other hand, a person may also generalize what he/she is listening to. When confronted with a series of sounds, instead of hearing each sound as a unique event, he/she may choose to hear a melody pattern. The first approach may conclude in the recognition of an individual instrument, while the second may be thought of as the ability to recognize a musical piece belonging to a given musical style. Musical style is a term denoting the mode of expression, or even more precisely, the manner in which a work of art is executed. It may be used to denote the musical characteristics of individual composers, periods, geographical areas, societies or even social functions. From the aesthetic point of view, musical style concerns the surface appearance of music [202]. In musicology, there are different approaches to music analysis. Schenker's theory of tonal music defines a melodic structure as a diatonic line derived by analytical reduction when the upper structure is removed. This fundamental melodic structure is called the Urlinie. Schenker extended this concept to fundamental composition and, finally, to the general concept of structural layers: background, middleground and foreground. The general concept of structural levels provides for a hierarchical differentiation of musical components, which in turn establishes a basis for describing and interpreting relations among the elements of any composition. These considerations are founded on the concept that the removal of the upper layers reveals the core of the musical phrase, in some cases just a single note [202].

On the other hand, functional theory, described by Riemann, defines the relationships of chords to the tonic as center [202]. Riemann's main interest lay in the classification of rhythmic motifs as on-stressed, interior-stressed and off-stressed, depending on whether their accent falls at the beginning, in the middle or at the end. His viewpoint was that an increase in the frequency of interior- and off-stressed rhythms brings an increase in energy. Additionally, he defined the "rhythmic motif" as the smallest constructional unit of significant content and definite expressive value. The motif, being the fundamental unit, is at the same time an energy unit. When two individual units are a succession of notes and are adjacent to each other, they are combined into a larger form. Such a form then creates a higher formation, which is next in the hierarchy. Riemann's theory aims at searching for the points which divide music into units. In Fig. 3.25, such a division is shown: an eight-bar module is presented as 2/4 units, and a two-bar module is a combination of two motifs which form a half-phrase [202].

[Figure: an eight-bar whole phrase divided into two half-phrases; each half-phrase consists of two-bar groups built from pairs of motifs.]

Fig. 3.25. Application of Riemann's theory to music division.

Leach and Fitch proposed another approach to music analysis. The basic assumption of this method is that music is an association of musical phrase repetitions. Fragments of music that are repeated are later combined into groups. In Fig. 3.26, an example of such an analysis is shown. In the example, a three-note group at the beginning is repeated three times; the tenth note is different and therefore does not belong to a larger formation, but the subsequent sequences of notes are again repetitions, hence they are grouped. The whole process forms a hierarchical tree-like structure [142].
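The grouping idea above can be sketched in code. The following is a minimal sketch, not Leach and Fitch's actual algorithm: it assumes notes are given as integer pitches and merges immediate repetitions of a fixed-size pattern into one group, leaving non-repeating notes ungrouped, much as in the description of Fig. 3.26.

```python
def find_repeated_groups(notes, size):
    """Group consecutive repetitions of a `size`-note pattern.

    A pattern and its immediate repeats are merged into one group;
    notes that do not start a repeated pattern stay as single-note segments.
    """
    groups, i = [], 0
    while i < len(notes):
        pattern = notes[i:i + size]
        run = 1
        # count how many times the pattern immediately repeats
        while notes[i + run * size:i + (run + 1) * size] == pattern:
            run += 1
        if len(pattern) == size and run > 1:
            groups.append(notes[i:i + run * size])  # one group of repeats
            i += run * size
        else:
            groups.append([notes[i]])               # ungrouped single note
            i += 1
    return groups
```

Applied to a sequence whose first nine notes are a thrice-repeated three-note motif followed by an odd note, the routine returns the nine-note group first and the odd note as a singleton, mirroring the tree-building step of the analysis.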

Lately, another theory of music analysis has appeared which is based on the notion that the starting point for the analysis is the construction of a model that reflects the listener's perception of music. Such a model of structural hearing was first introduced by Lerdahl and Jackendoff, reexamined by Narmour, and again revisited by Widmer [213]. Structural hearing, according to Lerdahl and Jackendoff, is a concept that allows one to understand music as complex structures consisting of various dimensions (see Fig. 3.27).


PREPROCESSING OF ACOUSTICAL DATA

Fig. 3.26. An example of the music analysis proposed by Leach and Fitch

"EXPRESSION' fl/Jpljed ro a nora ( E { rubalo, dynsmics. (erll~:u/arton. vibralo, .•. ) } )

lmpottance

~fi roleln

constltuent strocfUre role in

•proces:~ •tructure• (lmpllcallvlty I ciDSure)

surface structural ~ ~-------~ sallence lmportence .r-- ------, rll inprocess 1 r{ 1 in phrue I 111 l)lpe ot process I III Ievel of phrue 1 I 111 re/.pos./n phrue 1 III rel.pos.in process I 1, 1 srarts phrue 1 111 durelion of process I 111 ends phres• 1 I 1 I dlrecrion ol processl 11, ____ ----_l 1111 stndarfsprocess : ,';_-_-_- --------.r 1 e sprocess

L.L-~=-=-=--- --=~-7

melri::~~~ rlme-! ~enria/ slrenglh 1/ lmpoffllnce rote

durarton

harmonic slabilily

MODJ;;L OF STRUCTUR.AL HEARING

Structurrll undrntanding ofpiece: A

time-span reduction: ~ ~

proceu structure: r Dill F llnMr i ~

musicalsurface: i ,. ' r 4 3 tJ lfi_J JJ r melrical structure:

poupinJ (pbraso) structuro:

"Raw"' fiOfQICd pi~u: 15' t r J

Fig. 3.27. Structural hearing [213]



The main dimensions in their model are as follows: phrase structure, metrical structure, time-span reduction, and prolongational reduction [213]. As is seen from Fig. 3.27, the organization of musical structures is hierarchical. In contrast, Narmour concludes that linear connections between musical structures do not reflect the real nature of music. Widmer affirms Narmour's point of view, but used a more simplified approach in his experiment [213].

Rhythm Recognition

Rhythm may be said to be the most fundamental component of music. The appearance of rhythm seems to be the first step in the evolution of a musical culture. In the days of ancient Greece and Rome, rhythm - tempo, measures and note duration - was defined by the kind of rhythmical recitation. Rhythmic, dynamic and harmonic notations differed greatly from modern musical notation. During the Renaissance, the tempo fixed at the beginning of a musical score was constant for the whole piece and denoted on the basis of "taktus" (Latin), the basic time signature, also referred to as "integer valor notarum" (Latin). Starting from that time, awareness of rhythm "grew up". In the Baroque period of music, rhythmic features started to be important, and the Classical period began with a new interest in rhythm. The modern period is marked by the strengthening of rhythmic features, exemplified in the compositions of Bartok and Stravinsky [202].

The problem of developing a technique for finding rhythmic patterns is first of all a problem of definitions. Most definitions of rhythm, tempo and time are interdependent and are not explicitly explained. Special attention should be paid to the segmentation of rhythmic progressions with respect to timing accentuation [201]. However, formulating rules for distinguishing accentuation is, at the same time, one of the most difficult problems. For this purpose, the notion of a rhythmic syllable - understood as a sequence of time events with the accent on the last event - was introduced [201]. In this way, rhythmic syllables may be defined for a particular example in a musical piece. On the basis of this methodical approach, it is possible to elaborate a kind of rhythmic grammar that may be useful in rhythm perception modeling.
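The rhythmic-syllable definition translates directly into a segmentation routine. A minimal sketch, assuming each time event is represented as a (duration, accented) pair; this representation is an illustrative assumption, not the encoding used in [201]:

```python
def rhythmic_syllables(events):
    """Split a list of (duration, accented) time events into rhythmic syllables.

    Following the definition in the text, a syllable is a run of events
    that ends with an accented event.
    """
    syllables, current = [], []
    for duration, accented in events:
        current.append(duration)
        if accented:            # the accent closes the current syllable
            syllables.append(current)
            current = []
    if current:                 # trailing unaccented events form a last segment
        syllables.append(current)
    return syllables
```

A sequence of rules over such syllables would then form the "rhythmic grammar" mentioned above.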

Music analysis is also the basis of systems that allow for the automatic composition of a musical piece in a given style. The system created by Cope uses motifs, also called signatures. It is based on patterns of pitch intervals or rhythmic ratios that occur in more than one example of a style [211]. In the literature, a study on style recognition may also be found. A system made by Westhead and Smaill reads data from standard MIDI code files and collects motifs into a style dictionary [211]. The data are split into two training sets: positive examples (musical pieces of the same style) and negative examples (pieces not in the style). The style dictionary contains two style subsets (Fig. 3.28) [211]. The two subsets have some overlap because some motifs are present in both of them. When a new test piece (M) is presented to this system, it attempts to classify the piece as being most similar, in terms of motifs, to either the positive or the negative examples in the dictionary. The similarity estimate is made relative to the dictionary mean. Since the dictionary has a different number of motifs of differing frequency in each style subset, it displays a bias towards one of the training sets. Other calculations, based on Shannon's information theory, are also made. Entropy estimates are made both with and without the use of motifs. The reduction in entropy that results from using motifs in the representation is then measured in order to suggest how well the style has been characterized. The set ρ(M) represents motifs already existing in and extracted from the dictionary when a new melody M is presented to the system. The frequencies with which these motifs occur in each style subset can be used to classify M [211].

[Figure: the style dictionary with its positive and negative style subsets (S−); a new melody M is mapped to the set ρ(M) of dictionary motifs it contains.]

Fig. 3.28. The style recognition process [211]

A style S is defined as a set of melodies M_i. A melody is a string of notes, each specified by a pitch and a duration; however, no account is taken of harmonies. The definitions of ρ(M_i), representing the set of all motifs present in both the dictionary and M_i, and ρ(S), representing the set of all motifs present in more than one melody in S, are as follows [211]:

ρ(S) = {w : ∃ M_i, M_j ∈ S; i ≠ j; w ∈ C(M_i) ∧ w ∈ C(M_j)}   (3.67)

where C(M_i) is the set of all possible motifs of M_i,

ρ(M) = {w : ∃ ρ(S_i) ∈ D, w ∈ C(M) ∧ w ∈ ρ(S_i)}   (3.68)

where D is a dictionary.
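Equations (3.67) and (3.68) can be prototyped directly on pitch lists. A hedged sketch: it assumes motifs are n-grams of pitch intervals (the text mentions pitch-interval patterns), with an illustrative length range; `motifs` plays the role of C(M), `style_motifs` of ρ(S), and `melody_motifs` of ρ(M).

```python
def motifs(melody, min_len=2, max_len=4):
    """C(M): all contiguous pitch-interval motifs of a melody.

    Motifs are modeled as n-grams of pitch intervals; the length range
    [min_len, max_len] is an illustrative assumption.
    """
    intervals = tuple(b - a for a, b in zip(melody, melody[1:]))
    return {intervals[i:i + n]
            for n in range(min_len, max_len + 1)
            for i in range(len(intervals) - n + 1)}

def style_motifs(style):
    """rho(S): motifs present in at least two distinct melodies of S (Eq. 3.67)."""
    sets = [motifs(m) for m in style]
    return {w for i, wi in enumerate(sets)
            for j, wj in enumerate(sets) if i != j
            for w in wi & wj}

def melody_motifs(melody, dictionary):
    """rho(M): motifs of M that also occur in some style subset of D (Eq. 3.68)."""
    return {w for style_set in dictionary for w in motifs(melody) & style_set}
```

Because motifs are interval n-grams, transposed melodies share the same motif set, which matches the intuition behind signature-based style dictionaries.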

The entropy of a concept (next event in a piece) H(X) is defined as:


H(X) = −Σ_{x∈X} p(x)·log₂ p(x)   (3.69)

where the concept represents a set with a probability function represented by the random variable X, such that p(x) = P(X = x). The entropy represents the minimal expected number of bits required to specify an element of X. Hence, minimizing the description length is achieved by minimizing the entropy [211].
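Equation (3.69) in code, as a minimal sketch over an explicit probability list (zero-probability outcomes are skipped, since p·log p tends to 0):

```python
from math import log2

def entropy(probs):
    """H(X) = -sum over x of p(x) * log2 p(x)  (Eq. 3.69)."""
    return -sum(p * log2(p) for p in probs if p > 0)
```

A uniform two-outcome concept yields 1 bit, four equiprobable outcomes yield 2 bits, and a certain outcome yields 0, matching the description-length interpretation above.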

As a consequence of these assumptions, both dictionary data and melody data are extracted during the classification phase. As was mentioned, the data represented in the dictionary are as follows: the mean probability, over all motifs in the dictionary, that a motif is drawn from the positive style subset; the variance of these probabilities; the total number of positive style motifs; and the total number of negative style motifs. The melody data, on the other hand, are represented by: the length (the number of pitch intervals in the melody); the mean probability, over all motifs in the melody, that a motif comes from the positive style subset; the variance of these probabilities; the number of motifs in the melody that only match motifs in the positive style subset; the number of matched negative motifs; and the significance (the probability that the melody mean value was arrived at by chance). The results obtained by Westhead and Smaill show that comparisons of style based on examples taken from different composers are more successful than those based only on form specification (such as fugues, chorales, preludes, etc.), especially since the system has no representation of rhythms nor of the structure of the musical piece.

Nevertheless, it will be shown in subsequent chapters that the use of methods more sophisticated than statistical ones may result in the successful discernment of musical patterns.

As is seen from the above musicological review, a musical fragment can be described by its form, rhythm, melodic contours, harmony, etc. These descriptors may then be used as attributes to be placed in a case-based musical memory, with values extracted from the chosen musical material. The system can detect similarities and discrepancies between musical events in order to provide a means of retrieving them. The MIDI-code representation may be used in such an analysis.

3.2.2. MIDI Representation

There are some general remarks to be presented before proceeding with the task of music analysis based on the MIDI-code approach. To make a thematic catalog, one must begin with a collection of compositions and with procedures for deciding what the theme of each piece of music is. Then, an algorithm that computes pitches designated by the MIDI notation, taking into account clefs, instrument transposition, key signature, etc., should be constructed [130]. The computation can be carried out in such a way that harmonically equivalent notes designate the same pitch. In the case of monophonic music, various techniques might be devised to make an analysis of fundamental frequencies. Information about timbres, the ranges of the musical instruments and the styles in which compositions have been written may prove necessary in order to partition the musical sound into its instrumental or vocal parts. However, it is not yet possible to obtain such a complete representation of a musical piece from the acoustical signal alone. For this reason, the starting point in the recognition of a musical style is to build up an expert system using a musical performance database [110][118]. It is possible, and even suggested, to use a collection of data containing a large database of music encoded in MIDI.

In Fig. 3.29, an algorithm engineered by the author for creating such an expert system is presented.

[Figure: block diagram. A musical performance database is fed from a MIDI scores server (ftp ...... edu); decoded scores pass through feature extraction and rule generation, with human verification of the rule base, into an expert system for musical style recognition.]

Fig. 3.29. Layout of the experimental system for the automatic recognition of musical styles (learning tasks)


Procedures marked with a dotted-line block in Fig. 3.29 represent learning tasks not used in the recognition mode. In the training mode, human supervision relating the classification of score patterns to a particular musical style is necessary. The next step is the decoding of the MIDI code. This block provides the core for the feature extraction procedure: in this phase, all attributes available in MIDI code patterns are decoded. The next block in the algorithm denotes the extraction of musical parameters out of the pitch and note durations decoded from the MIDI code. The quantization block is necessary to build up the rough set database. Quantized values of musical parameters feed the rough set-based algorithm as the condition attributes. Several concepts may be derived regarding musical style. As the decision attribute, the musical style class number must be chosen. The rules created by the rough set-based algorithm must be verified by the human supervisor during the learning phase.

MIDI code-based information will be used as data in the experiments shown in further chapters.
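The quantization step that feeds the rough set database can be sketched as follows. This is a minimal illustration only: the attribute names (`pitch_range`, `mean_duration`) and the threshold values are hypothetical placeholders, not the condition attributes actually used in the book's experiments.

```python
def quantize(value, thresholds):
    """Map a continuous parameter onto an ordinal level 0..len(thresholds)."""
    return sum(value >= t for t in thresholds)

def decision_row(pitch_range, mean_duration, style_class):
    """One decision-table row: quantized condition attributes + decision attribute.

    Thresholds here are illustrative, not taken from the book.
    """
    return {
        "pitch_range": quantize(pitch_range, [12, 24]),        # semitones
        "mean_duration": quantize(mean_duration, [0.25, 0.5]),  # beats (assumed)
        "style": style_class,                                   # decision attribute
    }
```

Rows of this form, one per training pattern, are exactly what a rough set rule-induction algorithm consumes: discrete condition attributes plus one decision attribute (the style class number).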

3.3. Acquisition of Test Results

3.3.1. Objective Measurement Results

Sound Quality in Rooms

Assessment of sound quality in rooms is directly related to the acoustic properties of the rooms, with primary emphasis on the natural acoustics of an interior space. An overview of the problems connected with the evaluation of room acoustics can be presented schematically (Fig. 3.30). The figure presents the relationship between objective measurement methods and the subjective evaluation of parameters characterizing a given acoustical object.

[Figure: an acoustical object characterized from two linked sides, OBJECTIVE ↔ SUBJECTIVE.]

Fig. 3.30. Relationship between objective phenomena and auditory impressions


Acoustical Background

Many parameters exist which describe the features of an acoustic hall. Most of them are calculated based on the assumption that it is usually sufficient to consider the propagation of the sound energy and not the sound pressure, therefore all phase effects may be neglected. The basis of this assumption is that the dimensions of a hall should be large enough in comparison to the acoustical wavelengths. The so-called Schroeder frequency:

f_S = 2000·√(T/V)  [Hz]   (3.70)

where T is the reverberation time [s] and V the room volume [m³], defines the limit below which this assumption is not justified. Conversely, above the Schroeder frequency the resonances of the sound field are dense enough to be analyzed statistically. In this case, any sound signal will excite many modes simultaneously. Since their phases are nearly randomly distributed, phase effects cancel out in the superposition.

Acoustical parameters may be divided into four groups based on some common characteristics. The best known are the parameters correlated with the reverberation time.

The reverberation time is defined as the time needed for the energy to decrease by 60 [dB] from its original level after an instantaneous termination of the excitation signal. This parameter, originally introduced by Sabine, is given by equation (3.71):

RT₆₀ = 0.161·V/A  [s]   (3.71)

where: V - hall volume [m³], A - total area of absorption [m²].

As is seen from the equation, RT₆₀ may be controlled either by a change of volume or by a change in the absorption. Equation (3.71) assumes that the sound energy is equally diffused throughout the room (i.e. homogeneous and isotropic). In practice, this condition is rarely fulfilled, because large areas with quite different absorption exist in a hall. Therefore, the decay of sound may be described by two reverberation times: the first is called EDT (Early Decay Time), computed within the range (0 dB, -5 dB); the other is called the Early Reverberation Time, defined by the (-5 dB, -35 dB) range. It was discovered by Eyring that the classical formula given by Sabine fails when the room absorption is large. He assumed that the energy attenuation in this case is (1 − ᾱ_E)ⁿ per second, where ᾱ_E = A/S_bound and S_bound is the total area of the bounding surfaces. It is also very important to determine the dependency of the reverberation time upon frequency. On the basis of the reverberation decay curve, not only can the reverberation time be conveniently determined, but the curve may also reveal other acoustical features, such as the existence of flutter echoes.
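Equations (3.70) and (3.71), together with Eyring's correction for highly absorptive rooms, in a small sketch (the 0.161 constant is the metric-unit Sabine coefficient; `mean_alpha` is the mean absorption coefficient ᾱ):

```python
from math import sqrt, log

def schroeder_frequency(rt60, volume):
    """f_S = 2000 * sqrt(T / V) [Hz]  (Eq. 3.70)."""
    return 2000.0 * sqrt(rt60 / volume)

def rt60_sabine(volume, absorption_area):
    """RT60 = 0.161 * V / A [s]  (Eq. 3.71)."""
    return 0.161 * volume / absorption_area

def rt60_eyring(volume, surface, mean_alpha):
    """Eyring's formula: replaces A with -S * ln(1 - mean_alpha)."""
    return 0.161 * volume / (-surface * log(1.0 - mean_alpha))
```

For small ᾱ the Eyring value approaches Sabine's; as ᾱ grows, Eyring predicts the shorter (more realistic) reverberation time, which is the failure mode of Eq. (3.71) noted above.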


The relevance of the mean free path length (L) is also obvious [138]. In the most commonly used formulas, it is separated from the absorption factor of the surfaces and is thus proportional to the reverberation time:

RT₆₀ ∝ L·f(ᾱ)   (3.72)

where: f(ᾱ) - function of the mean absorption ᾱ, L - proportional to the ratio of volume to total surface, 4V/S. When using this criterion, it is important to assume the energy flow per unit area and to describe the spatial distribution by assuming that a sound particle has covered a distance in space without any collision.

Another parameter belonging to this group is the Rise Time, defined as the time needed for the energy to increase within the range (-5 dB, 0 dB). In the literature, one may also find a different definition of this parameter. Within this group of objective parameters, some subjective measures were defined by Beranek [19][20]. These include Liveness, Warmth and the Loudness of the direct and reverberant sounds (L). Their subjective meanings will be presented later in this section.

Parameters of the second group are derived from the hall impulse response. The objective of such measures is to examine the energy distribution within the time limits t₁ to t₂ of the impulse response, counted from the time of arrival of the direct sound t₁ (Eq. 3.73):

E(t₁,t₂) = ∫_{t₁}^{t₂} h²(t)dt   (3.73)

where: h(t) - impulse response of an auditory hall.
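A discrete-time approximation of Eq. (3.73), assuming the impulse response is available as a list of samples taken at a sampling rate `fs`:

```python
def interval_energy(h, fs, t1, t2):
    """Approximate E(t1, t2) = integral of h^2 over [t1, t2]  (Eq. 3.73).

    h  : sampled impulse response
    fs : sampling rate [Hz]; t1, t2 in seconds.
    """
    i1, i2 = int(round(t1 * fs)), int(round(t2 * fs))
    return sum(x * x for x in h[i1:i2]) / fs  # sum * dt, with dt = 1/fs
```

All the energy-ratio criteria that follow (C_def, C₅₀, C₈₀, LF, S/N) are quotients of such interval energies, so the same helper covers them with different integration limits.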

In Fig. 3.31, the impulse response is presented in simplified form. It consists of three time phases related to the direct sound, early sound reflections and reverberation (late reverberant sound components).

[Figure: simplified impulse response h(t) over time in ms, showing the direct sound, the early sound reflections and the reverberation tail.]

Fig. 3.31. Exemplary impulse response of a hall


One of the most important parameters describes the time interval between the arrival of the direct sound from the source and the first reflections, expressed in [ms], called the Initial-Time-Delay-Gap (ITDG). This parameter is also known in the acoustical literature as Intimacy [20].

Two proposed quantities - Definition (C_def), introduced by Thiele [203], and C₅₀ - are applied to the evaluation of word clarity. They are expressed as follows:

C_def = 10·log [ ∫₀^{50ms} p²(t)dt / ∫₀^{∞} p²(t)dt ] = 10·log (E₅₀ / E_Total)  [dB]   (3.74)

C₅₀ = 10·log [ ∫₀^{50ms} p²(t)dt / ∫_{50ms}^{∞} p²(t)dt ] = 10·log (E₅₀ / E_REV)  [dB]   (3.75)

where: E₅₀ - energy in the first 50 ms after the arrival of the direct sound,
E_Total - total energy,
E_REV - late-arriving sound reflection energy (reverberation).

The higher the value of these ratios, the clearer the sound. Similarly, another parameter - Clarity (C₈₀) - was defined by Reichardt et al. [178] for the evaluation of music clarity, as shown in Eq. (3.76). Unlike word clarity, in this case the energy of the first 80 ms of the hall response (E₈₀) is taken into account. All of these parameters indicate whether a listener is able to separate different instruments and, especially, to clearly hear the instrument attacks.

C₈₀ = 10·log [ ∫₀^{80ms} p²(t)dt / ∫_{80ms}^{∞} p²(t)dt ] = 10·log (E₈₀ / E_REV)  [dB]   (3.76)

The next two parameters are also defined on the basis of the impulse response, but they are expressed in seconds and not as an energy ratio in [dB]. Cremer introduced the center time parameter (T_C), defined as the center of gravity of the square of the impulse response:

T_C = ∫₀^{∞} t·h²(t)dt / ∫₀^{∞} h²(t)dt   (3.77)
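C₅₀, C₈₀ (Eqs. 3.75 and 3.76) and the center time (Eq. 3.77) can all be computed from one sampled impulse response. A minimal sketch under the same discrete approximation as before (sum over samples times 1/fs, which cancels in the ratios):

```python
from math import log10

def _energy(h, fs, t1, t2):
    """Sum of squared samples of h between t1 and t2 [s]."""
    i1, i2 = int(round(t1 * fs)), int(round(t2 * fs))
    return sum(x * x for x in h[i1:i2])

def clarity(h, fs, t_early):
    """10*log10(early/late); t_early = 0.05 gives C50, 0.08 gives C80."""
    early = _energy(h, fs, 0.0, t_early)
    late = _energy(h, fs, t_early, len(h) / fs)
    return 10.0 * log10(early / late)

def center_time(h, fs):
    """T_C = integral of t*h^2 dt over integral of h^2 dt  (Eq. 3.77)."""
    return sum((i / fs) * x * x for i, x in enumerate(h)) / sum(x * x for x in h)
```

For a response whose energy is split evenly before and after the 50 ms limit, C₅₀ is 0 dB; positive values indicate the early-energy dominance associated with clear word articulation.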

Parameters dealing with the spatial energy distribution are included in the third group of acoustic criteria. It is due to the work of Marshall that early lateral reflections are now treated as highly desirable for obtaining a good spatial impression. This effect is perceived when the number and the level of lateral reflections are increased: the source seems to be larger and the music gains in importance. The subjective impression given by these reflections is related to some objective measures: Lateral Fraction, Spatial Impression and Interaural Crosscorrelation (IACC), the last factor introduced by Damaske and Ando [55]. The first two criteria characterize the "effectiveness" of the early lateral reflection energy, and the third provides an evaluation of information relative to binaural hearing. Barron and Marshall studied the influence of the delay, level, direction and spectrum of the lateral reflections on the spatial impression [10][11]. They formulated some general remarks, for example: a delay of the lateral reflections larger than 80 ms does not increase the spatial effects. Additionally, the degree of spatial impression is a linear function of the source level. It was also discovered that the presence of an audience has an essential effect on the spatial impression: the audience considerably attenuates the energy level, therefore causing a diminution of the spatial effects. The spatial impression is also much more important in fortissimos than in pianissimos. Moreover, spatial effects depend on the direction of the lateral reflections, increasing when this angle approaches 90°. Barron and Marshall underlined the importance of the low frequencies in the sound spectrum in creating the spatial impression. The determinant frequency range for good-quality spatial effects is between 60 and 1000 Hz.

The so-called Lateral Fraction, describing the ratio of early lateral to nonlateral energy, is expressed as follows [20]:

LF = ∫_{25ms}^{80ms} E_∞(t)dt / ∫_{0ms}^{80ms} E_0(t)dt   (3.78)

where:
E_∞(t) - energy of lateral reflections captured by a bidirectional (figure-of-eight) microphone (the time interval from 25 ms to 80 ms eliminates the direct sound and the first reflections from the ceiling above the source),
E_0(t) - total energy as measured by an omnidirectional microphone.

The Spatial Impression parameter C_SI (also called the Spatial-Binaural Criterion), introduced by Ando [4][5], is also taken into account when analyzing room quality:

C_SI = 10·log [ ∫₀^{80ms} p²(t)·cosΘ dt / ∫₀^{80ms} p²(t)·(1 − cosΘ)dt ]   (3.79)

where: p(t) is the instantaneous pressure value produced by an impulsive sound source at the listener's location, and Θ is the angle between an incoming acoustic sound wave and the axis parallel to the listener's ears.

Adequate Spatial Impression evaluates the sufficiency of early lateral sound component energy. The corresponding quantity is measured using both omnidirectional and directional microphones.

IACC is the parameter characterizing the interrelation between the signals perceived in binaural listening. It is given by the expression:

IACC(τ) = ∫₀^{80ms} p_L(t)·p_R(t+τ)dt / [ ∫₀^{80ms} p_L²(t)dt · ∫₀^{80ms} p_R²(t)dt ]^{1/2}   (3.80)

where: p_L and p_R are the sound pressures captured at the left and right ears, and τ is the delay between these signals.
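Equation (3.80) as a discrete sketch. Two assumptions beyond the text: the coefficient is maximized over a small lag window (a customary ±1 ms search), and the integration runs over whatever excerpt is passed in rather than exactly 0–80 ms.

```python
from math import sqrt

def iacc(pl, pr, fs, max_lag_ms=1.0):
    """Interaural cross-correlation coefficient (Eq. 3.80).

    pl, pr : sampled ear signals; fs : sampling rate [Hz].
    Returns the maximum normalized cross-correlation over |lag| <= max_lag_ms.
    """
    norm = sqrt(sum(x * x for x in pl) * sum(x * x for x in pr))
    max_lag = int(max_lag_ms * fs / 1000.0)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        s = sum(pl[i] * pr[i + lag]
                for i in range(len(pl))
                if 0 <= i + lag < len(pr))
        best = max(best, abs(s) / norm)
    return best
```

Identical ear signals give IACC = 1 (no spatial impression), while decorrelated signals drive the value toward 0, matching the subjective interpretation discussed below.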

"Perfect" interrelation between the signals perceived by the right and left ears means that these signals are identical, and thus no spatial impression exists. The related subjective impression is that the larger the difference between the right and left signals, the better the spatial impression and the immersion into sound. It has also been shown that IACC is significantly correlated with the dimensions of the hall: the value of IACC increases when the hall becomes larger, because the ratio of lateral reflections to total reflections then decreases and the signals arriving at both ears become much more similar. The value of IACC depends highly on the direction of the reflections, their number and their amplitudes.

There are some other parameters defined for the purpose of describing the spatial impression. Two of them are the Spatialization Index (R), proposed by Reichardt et al. [178], and the Sound Force (G), introduced by Lehmann [141]. Their definitions are given in Eqs. 3.81 and 3.82.

(3.81)

where: h₀(t), h_C(t) - impulse responses measured by omnidirectional and cardioid microphones, respectively.

G = 10·log [ ∫₀^{∞} h²(t)dt / (4πr² ∫₀^{∞} h_R²(t)dt) ]   (3.82)

where: h(t), h_R(t) - impulse responses measured in the hall and near the sound source, respectively.

A parameter called Diffusion was defined by Kuttruff [139]. This criterion is calculated on the basis of the autocorrelation function Ψ(τ) of the impulse response h(t). It is expressed as the ratio of the autocorrelation function value at τ = 0 to its maximum value:

(3.83)

where Ψ(τ) = ∫_{−∞}^{∞} h(t)·h(t+τ)dt   (3.84)

The Diffusion criterion is small in the case where flutter echo appears in the hall (a nonhomogeneous distribution of sound), the value of the autocorrelation function then being large at the same time.

The last group of parameters is based on the intelligibility criteria. Parameters included in this group not only provide a measure for intelligibility in the hall, but also take into account the influence of the hall on the quality of sound by defining the level of the noise floor (S/N).

S/N = 10·log [ ∫₀^{95ms} a(t)·p²(t)dt / ∫_{95ms}^{∞} p²(t)dt ]   (3.85)

The factor a(t) depends on the level of the reflections, taking the value 1 for the initial reflections and diminishing to 0 for reflections that appear after 95 ms.

There are two other significant criteria that should be included in the intelligibility parameter group, namely the Speech Transmission Index (STI) and the Rapid Speech Transmission Index (RASTI) [79][196]. These criteria are used not only to quantify, but also to predict speech intelligibility in the hall. Taken into account are the hall volume, reverberation time, noise floor level, sound source level, and the distance between the source and the listener's position. As is seen from the above description, these measures are highly complex and characterize the physical features of a hall. The RASTI factor is defined in the range (0,1) and is correlated with subjective quality by five labels: bad, poor, sufficient, good, excellent. It is worth noting that the presented parameters take the effect of the noise floor in the hall into account.
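The mapping from a RASTI value onto the five labels can be sketched as a simple band lookup. The band edges used here (0.30/0.45/0.60/0.75) are the ones commonly associated with STI qualification scales; the text only states that five labels exist, so treat the exact edges as an assumption.

```python
def rasti_label(rasti):
    """Map a RASTI value in [0, 1] onto the five subjective quality labels.

    Band edges 0.30/0.45/0.60/0.75 are assumed, not taken from the book.
    """
    for edge, label in [(0.30, "bad"), (0.45, "poor"),
                        (0.60, "sufficient"), (0.75, "good")]:
        if rasti < edge:
            return label
    return "excellent"
```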

Recently, Ando et al. proposed a procedure for acoustic hall design that applies the theory of subjective preference [7]. It describes four orthogonal factors that are of significance in the subjective preference for simulated sound fields, namely: Listening Level (LL), Initial Time Delay Gap (ITDG), the subsequent Reverberation Time, and Interaural Crosscorrelation. These parameters were defined earlier, except for LL, which is described by the following dependence [7]:

LL = 10·log(1 + A²) − 20·log d₀ − 11  [dB]   (3.86)

where: A is the total pressure amplitude of the early reflections and subsequent reverberation, and d₀ = |r − r₀| is the distance between the source and the listener's position.

The latter measures have the same subjective meaning as the ones previously introduced; however, they are modified for the purpose of simulation calculations.

One may also find other definitions of sound field measurements in the literature; however, not all of them can be easily related to the subjective impressions of listeners, and they will therefore not be reviewed here [29][44][85][140].

As is seen from the above discussion, despite the fact that only several parameters are presented here, the matter of testing acoustical quality can be very complex. A database containing the results of such tests may be impossible to analyze and interpret without the use of algorithmic tools.

Relationship Between Objective and Subjective Attributes

Some of the descriptors or subjective attributes presented in this paragraph are fairly general in their applications and fall naturally under continuous scale quantification, such as bright to dark, warm to cold or poor to good. Such descriptors are used in both the evaluation of sound and room quality, especially as room acoustical properties cannot be assessed without taking into account the sound which is produced or reproduced in the given room.

Based on the research of acoustician L. L. Beranek [19], the quality of a room falls into five categories:

A+ - excellent: 91-100 points
A  - very good to excellent: 81-90 points
B+ - good to very good: 71-80 points
B  - fair to good: 61-70 points
C+ - fair: 51-60 points
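The category table, together with the below-50 rule stated next, translates directly into a lookup. A small sketch (the treatment of totals between 50 and 51 is an assumption, since the text leaves that boundary open):

```python
def beranek_category(points):
    """Translate Beranek's total rating into his quality category."""
    bands = [(91, "A+ (excellent)"),
             (81, "A (very good to excellent)"),
             (71, "B+ (good to very good)"),
             (61, "B (fair to good)"),
             (51, "C+ (fair)")]
    for lower, label in bands:
        if points >= lower:
            return label
    return "below C+ (acoustics should be corrected)"
```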

Rooms with an evaluation result of less than 50 points should not be considered as concert halls, music theaters or opera halls; their acoustics should be corrected. The author of this method divided the attributes related to the acoustical properties of rooms into three groups [19]:

- independent, having a positive influence on the acoustical quality of the room (Intimacy, Liveness, Warmth, Loudness of the Direct Sound, Loudness of the Reverberant Sound, Diffusion, Balance and Blend, Ensemble),

- independent, having a negative influence on the acoustical quality of the room (Echo, Noise Distortion, Hall Non-Uniformity),

- dependent, consisting of attributes related to both of the above cited groups (Clarity, Brilliance, Attack, Texture, Dynamic Range).

The first eight properties can contribute to a maximum total value of 100 points, when rated numerically. On the other hand, the four negative attributes may subtract from the total rating. No points are assigned to the third group of acoustical attributes, as they are dependent on the attributes from the previous groups.

Intimacy - a hall is called intimate provided that music produced in it sounds as though played in a small interior. The subjective feeling of the room volume depends on the Initial-Time-Delay-Gap (ITDG, as defined previously). In so-called intimate halls, the delay of the first reflection is shorter than 20 ms and the intensities of the direct sound and the first reflection are comparable (Fig. 3.32a);


- Liveness - a room is "live" if the reverberation time for high and middle frequencies is long when the room is filled with an audience. This parameter is also characterized by the V/S_c ratio, where V is the volume and S_c is the total surface of the room. The bigger this ratio, the more live the hall. The preferred reverberation time for music in an opera hall is shown in Fig. 3.32b;

- Warmth - a room is considered "warm" if the reverberation time at low frequencies (T125 + T250) is longer than that of mid-frequencies (T500-1000), see Fig. 3.32c;

Warmth = (T125 + T250) / (2 · T500-1000)   (3.87)

- Loudness of the Direct Sound - defined here as the distance between the listener and the orchestra or the typically positioned opera singer (Fig. 3.32d);

- Loudness of the Reverberant Sound - this attribute is a function of volume, reverberation time and the energy distribution in the hall (Fig. 3.32e);

L = (T500-1000 · 1,000,000) / V   (3.88)

where: V - volume, expressed in cubic feet.

- Diffusion - related to the subjective impression that sounds coming from different directions to the ears of a listener are distributed with the same loudness; this depends on the quantity of reflective surfaces in the hall (Fig. 3.32f);

- Balance and Blend - these attributes determine the internal harmony and balance between singer/performer and orchestra (Fig. 3.32g);

- Ensemble - this reflects performers' ability to hear each other, allowing unison playing by an orchestra (Fig. 3.32h).

[Figure: eight rating-scale plots, labeled a-h: a. Intimacy (delay of the first reflection, in ms); b. Liveness; c. Warmth; d. Loudness of the Direct Sound (distance in feet); e. Loudness of the Reverberant Sound; f. Diffusion (irregularity of surfaces and ceiling); g. Balance and Blend (singer-orchestra balance); h. Ensemble.]

Fig. 3.32. Rating scales for various parameters of an opera hall

Echo, Noise, Distortion, Hall Non-Uniformity - these attributes reflect the acoustical properties that contribute negatively to the total rating. They may be measured objectively. A problem with echo appears when the reflected sounds have significant intensity and their delay times are more than 50-80 ms. In this case, the reflected sounds may mask the earlier sound. Background noise and


resulting distortions may also be heard, produced by the air-conditioning system or other sources inside or outside the concert hall. The last attribute describes the non-uniformity of the diffused field in the hall. These attributes may contribute as much as minus 50 points to the total rating [19].

As seen in Fig. 3.32, rating scales differ for individual attributes. The ranges vary from a maximum of 40 points for attributes contributing positively to minus 50 points for attributes that contribute negatively to the overall quality of rooms. In a similar way, rating points were also assigned to other types of large auditoria. Drawing on the subjective opinions about a hall established by the group of acousticians, musicians and music critics with whom he worked, Beranek formed his opinion on the importance of the attributes introduced into these analyses, with weighting factors chosen arbitrarily. Moreover, no points were assigned to the five attributes that depend on both positive and negative acoustical descriptors.

Another question that arises from Beranek's work is whether the attributes are linearly additive. Some of the parameters presented previously are linear in character, e.g. ITDG, Early RT and Ov. RT are described in terms of time, but others are nonlinear, as they are derived from the energy of the measured room impulse response and correlated logarithmically to human auditory sense characteristics. When looking at the Caet parameter, one can notice that it reflects the ratio between the early and late-coming energy of the impulse response of the given auditory hall. However, it should be remembered that the studies of Beranek now belong to the classical works on architectural acoustics.

Numerous investigations have been undertaken to determine new criteria that should be considered when describing acoustic quality [81]. Various acoustic parameters have been suggested and multi-dimensional analyses performed to correlate objective parameters to the subjective perception of acoustic quality [7][106][120]. They were then weighted according to generally accepted judgements of their relative contributions to the total merit of a concert or opera hall. Hulbert et al. introduced the so-called merits in order to correlate obtained values to their assessments [81]. Such merits are normalized over the central value of each parameter. Correspondingly, for the Definition parameter, the related Definition merit M_Def may be expressed as [81]:

M_Def = C_Def / 0.45 for C_Def ≤ 0.45

M_Def = 1 for 0.45 < C_Def ≤ 0.55

M_Def = (1 - C_Def) / 0.45 for C_Def > 0.55   (3.89)
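As an illustrative sketch, the piecewise-linear merit of Eq. (3.89) can be coded directly; the function name and the use of pure Python are our choices, only the formula itself comes from the text:

```python
def definition_merit(c_def: float) -> float:
    """Definition merit M_Def of Eq. (3.89): a piecewise-linear
    normalization of the Definition parameter C_Def around its
    preferred central range [0.45, 0.55]."""
    if c_def <= 0.45:
        return c_def / 0.45
    if c_def <= 0.55:
        return 1.0
    return (1.0 - c_def) / 0.45

# The merit peaks at 1 inside the preferred range and falls off
# linearly on both sides; the pieces join continuously at 0.45 and 0.55.
```

Note that the three branches agree at the breakpoints (0.45/0.45 = 1 and (1 − 0.55)/0.45 = 1), so the merit is a continuous function of C_Def.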

Hulbert's work is in general agreement with classical studies conducted by Beranek, with some of the acoustic parameters replaced by others having more acoustic merit in the opinion of Hulbert et al. [81].

Another problem that should be mentioned is that optimum acoustical requirements vary widely depending on the purpose of the hall. However, since


the main concern of this study is the relationship between measurable data and corresponding subjective impressions, this problem is not discussed here.

3.3.2. Subjective Test Results

Object Assessment

Subjective listening tests are a part of the assessment of sound production in acoustic environments [24][33][60][69][71][75][92][212]. They are designed to reveal the presence of differences between objects being tested, or are intended to yield subjectively scaled ratings according to some chosen criteria. Overall, they consist of a test procedure, data acquisition and analysis, and interpretation of the results. The analysis and result interpretation parts are still only recommendations, not standards, in acoustical practice. The experimental procedure first aims to identify certain physical and psycho-physiological variables, then to isolate them and, finally, to control them. The objective is to minimize the biases and variations in listeners' judgments that are attributable to factors other than those under test. In acoustical practice, standard statistical processing of subjective evaluation results is usually performed [1][25][26]. However, it will be shown that methods based on soft computing can also be valuable in this domain, and that they can even outperform traditional methods in some aspects.

3.3.3. Statistical Processing of Data

Processing of Results in Non-Parametric Testing

The most popular and easy to use non-parametric method for sound quality testing is the paired comparison test. The goal of this method is to compare objects, ordered in pairs, with each other and then to assess them on the basis of a two-level (better/worse) attribute scale. Technically, signal samples are presented in A-B or (1-2) order. The experts' task is to choose the better one from a pair of sound samples that differ in acoustic features. The result is that a certain number is assigned to each compared sound sample which reflects the experts' preference (the number of cases in which the object won in such comparisons). In order to obtain reliable results, this method requires an appropriate number of tests and a statistical analysis of the results [146].

The application of statistical analysis for dealing with empirical data is aimed first of all at revealing the significance of differences among the objects being tested. The basic idea is to verify statistical hypotheses. First, the null hypothesis, to be verified later, has to be formulated. Since it may be disproved, an alternative hypothesis is formulated at the same time. The decision to accept or reject the null


hypothesis is based on the value of a statistic which has been derived analytically. In practice, critical limits for the decision making are chosen. If the value of the statistic falls in a very unlikely part of the distribution, then the conclusion is that the null hypothesis is false. The construction of a test requires making an assumption as to the significance level. This value refers to the probability of rejecting a hypothesis that is in reality true. In practice, the significance level is very often set to a value within the range from 0.01 to 0.05.

The chi-square statistic χ² is defined as follows [146]:

χ² = Σ_{i=1..r} Σ_{j=1..s} (n_ij - n_i·n_j / n)² / (n_i·n_j / n),   l = (r-1)·(s-1)   (3.90)

where: l - number of degrees of freedom, r - number of test series performed, s - number of objects under test,

n_ij - the number of events that are included in the ith series of a test and the jth object (the number denoting how many times the jth object has been chosen in the ith series of a test),

n_i = Σ_{j=1..s} n_ij - the number of events in the ith series of a test,

n_j = Σ_{i=1..r} n_ij - the number of events that occurred with the jth object,

n = Σ_{i=1..r} Σ_{j=1..s} n_ij - the total number of events.

If the null hypothesis is true, then the value of the chi-square statistic should be small, not exceeding the critical part of the distribution that is defined as:

P(χ² ≥ χ²_α) = α   (3.91)

In the above approximation (3.91), α is the significance level. If the value of the measured statistic exceeds the value of χ²_α (found in statistical tables) at the assumed significance level α and number of degrees of freedom, then the null hypothesis is no longer valid and may be rejected, with the probability of making a false decision equal to α, in favor of the alternative hypothesis.


The chi-square statistic is a useful tool for characterizing the significance of the association of objects. However, this probability depends on the total number of subjects from which the experimental data were drawn.
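The statistic of Eq. (3.90) can be sketched in a few lines of pure Python; the function name and the table-of-lists representation of the vote counts n_ij are our assumptions:

```python
def chi_square(table):
    """Chi-square statistic of Eq. (3.90) for an r x s table of vote
    counts n_ij (r test series, s objects under test); returns the
    statistic and the number of degrees of freedom l = (r-1)(s-1)."""
    r, s = len(table), len(table[0])
    n_i = [sum(row) for row in table]        # events in the ith series
    n_j = [sum(col) for col in zip(*table)]  # events for the jth object
    n = sum(n_i)                             # total number of events
    chi2 = 0.0
    for i in range(r):
        for j in range(s):
            expected = n_i[i] * n_j[j] / n   # n_i * n_j / n of Eq. (3.90)
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2, (r - 1) * (s - 1)
```

For example, two series of 30 votes split 10/20 and 20/10 between two objects give χ² ≈ 6.67 with one degree of freedom, which exceeds the tabulated critical value at α = 0.05.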

The proportions of scores related to the ith and jth objects, respectively, obtained during the experiment are equal to:

P_i = r_i / N,   P_j = r_j / N   (3.92)

where:

r_i, r_j - quantities of scores for the ith and jth objects,

N - maximum number of possible answers for one object:

N = m · (n - 1) · 2   (3.93)

where: m - number of experts in a group, n - number of objects to compare.

The probability, then, of a casual difference between answers calculated on the basis of the test results is defined as [146]:

z_ij = |P_i - P_j| / sqrt[ (P_i + P_j)·(2 - P_i - P_j) / (2·N) ]   (3.94)

If z_ij < z(α), then there is no reason to reject the hypothesis, the latter assuming a lack of significant difference between the objects under test. The value z(α) is to be found in statistical tables.
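Eqs. (3.92)-(3.94) combine into a single significance score per pair of objects. A minimal sketch (function name and argument order are ours):

```python
import math

def preference_z(r_i, r_j, m, n):
    """z_ij of Eqs. (3.92)-(3.94): significance of the difference
    between preference scores r_i and r_j collected from m experts
    comparing n objects in a paired comparison test."""
    N = m * (n - 1) * 2              # Eq. (3.93): max answers per object
    p_i, p_j = r_i / N, r_j / N      # Eq. (3.92): normalized scores
    denom = math.sqrt((p_i + p_j) * (2 - p_i - p_j) / (2 * N))
    return abs(p_i - p_j) / denom    # Eq. (3.94)
```

The returned value is then compared against the tabulated z(α); the measure is symmetric in the two objects, as the absolute difference in Eq. (3.94) requires.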

It is also necessary to check whether a change in subject answers during two series of the test alters the hypothesis assuming the stability of subjects. If a subject voted for the first presented object in the first series of a test, and for the second presented object in the second series, then in fact he voted twice for the same object. In this case, a two-level ranking scale has been applied. If the numbers for the object chosen by experts are the same for two series of the test, then such a pair is denoted by "N"; in the contrary case a pair is denoted by "R". If r_e denotes the number of "R" symbols taken from both series of an experiment for n pairs of objects, then the approximation:

r_e ≤ r(α, n)   (3.95)

where:


r(α, n) - critical value of the variable r read from the tabulated rank test distribution,

α - assumed significance level,

refers to a situation where there is no reason to reject the hypothesis assuming the stability of experts. It may also be assumed that a subject always uses the same criteria about the object during the experiment. However, such a measure might become a criterion of the stability of experts only in the case when the auditory subjects' memory factor is not taken into account.

The results might then be presented in the form of diagrams denoting the dependence characteristics of the subjective preference scores for each object. The axes of such a diagram are defined as "Object" and "N", the latter indicating the normalized index of subjective preference scores.

Sets of answers from the experts provide direct test results which are then analyzed statistically. The whole procedure may be carried out according to the steps shown in Fig. 3.33.

The procedure comprises the following consecutive steps (START to END):

1. Summing up the number of votes given by the particular experts for each of the objects.
2. Determining the stability of each expert's choices (parameter z1).
3. Determining the sum of votes given to an object by the successive experts.
4. Determining the number of votes given to an object by successive experts in both parts of the test.
5. Determining the statistic χ² by comparing the results of both parts of the test.
6. Determining the number of experts who interpret a given pair in a different way depending on the part of the test (parameter z2).
7. Examining the significance of the differences between the objects in a pair at the assumed level of significance (parameter z3).

Fig. 3.33. Consecutive steps of a statistical analysis procedure applied to a paired comparison listening test


If a test is carried out for two groups of experts, the number of results is doubled and comparison between the two groups is done through determining the χ² statistic.

Parametric Procedure as a Tool for Subjective Testing

The parametric method is conceptually the simplest one that may be applicable to the assessment of acoustical properties. Subjects are asked to listen to prepared sound excerpts and to try to assign grades to a provided list of parameters. This list of parameters must be well defined and should be made known to the subjects in advance. It should include a sufficient number of parameters so that the subjects can represent all essential features of the sound sample.

It is possible to carry out the subjective tests by using grades for sound quality evaluation. An absolute scale may be used, with possible grades ranging from "1 to 3", "1 to 5", "1 to 6", etc., depending on the experiment requirements [146]. The choice of scaling might be taken from the recommendations of the CCIR (Comité Consultatif International des Radiocommunications) and the IEC (International Electrotechnical Commission). From the experimental point of view, there is no objection to the use of any of the above listed scales; however, the "1 to 6" scale does not include a middle point of evaluation that permits the relation between the subject and the object being tested to be determined. Thus, the most recommended scale seems to be "1 to 5". Consequently, the following descriptors may be assigned to the objects under test with regard to the level of perceivable distortions in the evaluated sound:

- imperceptible (grade 5),
- perceptible but not annoying (grade 4),
- slightly annoying (grade 3),
- annoying (grade 2),
- very annoying (grade 1).

Notice that these grades may be applied to parameters having negative meanings, such as NOISE, DISTORTION, etc. For parameters having positive meanings, grades from 1 to 5 may also be assigned, but in this case 5 is understood as excellent, 4 as good, etc. Additionally, some weightings may be assigned to differentiate parameters that are more intuitive for experts and easier to define. Another weighting factor might be added in the case of differing backgrounds within the group of experts.

This kind of test may be organized in such a way that a hidden reference is presented together with the tested objects. The aim of presenting a hidden reference as one of the objects under test is to control the reliability of the subjects. If a subject assigns a grade of less than 4 to the hidden reference on the "1 to 5" grade scale, it being exactly the same signal as the known reference sample, then it is not advisable to take his/her results into account.


Processing of Results in Parametric Testing

Some statistical calculations, such as the mean value, standard deviation and 95% confidence limits for a normal distribution, are often computed with regard to the discussed parametric testing. Moreover, a usual way of conducting a subjective test is to use grade-like variables. In the case of two quantized data sets, the χ² statistic is often evaluated and observed. Let R_i be the number of events in bin i for the first data set, and S_i the number of events in the same bin i for the second set. The χ² statistic is defined as [172]:

χ² = Σ_i (R_i - S_i)² / (R_i + S_i)   (3.96)
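A direct sketch of the binned comparison of Eq. (3.96) (function name ours; bins where both counts are zero are skipped, since such terms contribute nothing):

```python
def chi_square_binned(R, S):
    """Chi-square statistic of Eq. (3.96) comparing two quantized
    (binned) data sets bin by bin: R and S are equal-length lists of
    event counts R_i and S_i per grade bin."""
    chi2 = 0.0
    for r_i, s_i in zip(R, S):
        if r_i + s_i:                 # skip empty bin pairs (0/0 terms)
            chi2 += (r_i - s_i) ** 2 / (r_i + s_i)
    return chi2
```

Identical grade histograms yield zero, while strongly differing ones (e.g. counts [8, 2] versus [2, 8]) yield a large value to be compared with the tabulated critical χ².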

In order to compare the performance of an acoustical Device Under Test (DUT) or an acoustical interior resulting from listening tests, one should analyze the subject reliability and the influence of the site in which the test is performed. At least several ways exist for checking the obtained test results. One of them is to calculate the cross-correlation between scores given by a pair of subjects (or sites). A high correlation implies similar ranking. In some cases, especially when a non-linear mapping between scores occurs, the cross-correlation of ranking lists may be calculated. For each subject, the items are sorted by their scores. A rank is attached to each item for each subject (or site). A high correlation between the ranking lists implies that they are similar. Additionally, differences (average or mean square) between subjects (or sites) are calculated. The latter measure amplifies the influence of large differences and minimizes the influence of "scoring noise" (lack of concentration, ambient noise, etc.). The mean (μ) and standard deviation (S) are used to calculate the t-test statistic, under the hypothesis that subjects of different sites belong to the same population (formula 3.97) [146]:

T = (μ_X - μ_Y) / sqrt[ (n_X - 1)·S_X² + (n_Y - 1)·S_Y² ] · sqrt[ n_X·n_Y·(n_X + n_Y - 2) / (n_X + n_Y) ]   (3.97)

On the basis of another measure, histograms, it is checked whether different subjects (or sites) use different (non-linear) scales.
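The two-sample statistic of formula (3.97) can be sketched as follows; the function name and the use of raw score lists (rather than precomputed μ and S) are our assumptions:

```python
import math

def pooled_t(x, y):
    """Two-sample t statistic of formula (3.97) for score lists of two
    subject groups (or test sites), under the hypothesis that both
    belong to the same population."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)   # (n_X - 1) * S_X^2
    ssy = sum((v - my) ** 2 for v in y)   # (n_Y - 1) * S_Y^2
    return (mx - my) / math.sqrt(ssx + ssy) * math.sqrt(
        nx * ny * (nx + ny - 2) / (nx + ny))
```

The result is compared against Student's t tables with n_X + n_Y − 2 degrees of freedom; a large |T| indicates that the two groups (or sites) score differently.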

In the case of multidimensional subjective tests, when the subjects' task is to vote for several parameters at the same time, a technique called multidimensional scaling (MDS) is highly recommended [59][94][137]. It allows for the exploration of the perceptual bases of listener's decisions. This technique aims at a specific arrangement of tested objects in multidimensional Euclidean space, based on the fact that distances between objects correspond monotonically to the perceived degree of dissimilarity between them. The space derived from a listener's ratings is interpreted as reflecting the listener's actual perceptual space [137].


In this technique a so-called stress measure (S) is defined for a configuration of points x_1, ..., x_n in l-dimensional space, with interpoint distances d_ij:

S = sqrt[ Σ_{i<j} (d_ij - d̂_ij)² / Σ_{i<j} d_ij² ]   (3.98)

where the values of d̂_ij are those numbers which minimize S subject to the constraint that the d̂_ij have the same rank order as the δ_ij, where δ_ij are experimental values of dissimilarity between the n objects. The constraints are that d̂_ij ≤ d̂_i'j' whenever δ_ij < δ_i'j'.

The stress measure indicates how well the given configuration represents the data. Smaller stress means a better fit; zero stress means a "perfect" fit [137].
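Given a configuration's interpoint distances and the monotone-regressed disparities, the stress of Eq. (3.98) reduces to one normalized sum; in this sketch both are assumed to be supplied as flat lists over the point pairs (names ours):

```python
import math

def stress(d, d_hat):
    """Stress measure of Eq. (3.98): d holds the interpoint distances
    d_ij of the MDS configuration, d_hat the disparities d̂_ij obtained
    by monotone regression on the dissimilarities δ_ij."""
    num = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    den = sum(a ** 2 for a in d)
    return math.sqrt(num / den)
```

When the configuration's distances already have the same rank order as the dissimilarities, d̂_ij = d_ij and the stress is exactly zero (the "perfect" fit).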

Measurable Data Analysis

In the case of measurable data, in the first step of the statistical analysis, such elementary measures as the mean and variance of distributions are calculated. Next, in order to check whether two selected parameters are dependent on one another, the degree of correlation is calculated for pairs of quantities (x_i, y_i), i = 1, ..., n. The most widely used method is the linear correlation coefficient r (Pearson's), calculated according to the formula:

r = Σ_{i=1..n} (x_i - x̄)·(y_i - ȳ) / sqrt[ Σ_{i=1..n} (x_i - x̄)² · Σ_{i=1..n} (y_i - ȳ)² ]   (3.99)

where:

x̄ = (1/n)·Σ_{i=1..n} x_i   (3.100)

ȳ = (1/n)·Σ_{i=1..n} y_i   (3.101)
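Eqs. (3.99)-(3.101) translate line by line into a short routine (function name ours, pure Python for self-containment):

```python
import math

def pearson_r(x, y):
    """Linear correlation coefficient r of Eq. (3.99), with the means
    of Eqs. (3.100)-(3.101) computed inline."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n          # Eqs. (3.100), (3.101)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den
```

Perfectly proportional parameter pairs give r = 1, perfectly anti-proportional ones r = −1, which is the redundancy signal discussed at the end of this section.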

In the case of binomial or two-dimensional Gaussian distributions, some additional statistical tests can be utilized. It is convenient to assume the null hypothesis, meaning that if two variables x and y have no association, then the


unknown correlation coefficient is equal to 0. In order to verify whether the assumed null hypothesis H_0 is valid, another expression is calculated, namely Student's test t with the number of degrees of freedom equal to n-2:

t = r·sqrt(n - 2) / sqrt(1 - r²)   (3.102)

Then, for the assumed significance level α (α = 0.01 or α = 0.05), the value of Student's t = t_α such that P(|t| > t_α) = α when H_0 is valid is read from the statistical tables for n-2 degrees of freedom. Afterwards, if the value t_0 calculated from Eq. 3.102 fulfills the inequality |t_0| > t_α, then the null hypothesis is disproved, indicating that the unknown correlation coefficient has a value near one (or minus one) and is significant. Conversely, if |t_0| ≤ t_α, it is unlikely that the unknown correlation coefficient is significant. Additionally, one can check whether a change in some variables significantly alters an existing correlation between two other variables. For this purpose, Fisher's z-transformation is used:

z = (1/2)·ln[ (1 + r) / (1 - r) ]   (3.103)
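The significance test of Eq. (3.102) and the transformation of Eq. (3.103) are both one-liners; the function names are ours:

```python
import math

def correlation_t(r, n):
    """Student's t of Eq. (3.102) for testing the significance of a
    correlation coefficient r computed from n pairs (n-2 degrees of
    freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def fisher_z(r):
    """Fisher's z-transformation of Eq. (3.103)."""
    return 0.5 * math.log((1 + r) / (1 - r))
```

For instance, r = 0.6 measured on n = 11 pairs gives t_0 = 2.25, which is then compared against the tabulated t_α for 9 degrees of freedom.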

It is worth noticing that such standard statistical analyses provide information on parameter similarity and may therefore be treated as redundancy measures.

In the next chapters, the analyses given above will be applied to selected case studies.

3.4. Data Discretization

The automatic classification of data needs some preprocessing stages, such as parametrization and discretization. The first of these was already described in previous paragraphs. As was already pointed out, this procedure is aimed at reducing the amount of data associated with digital sound samples, and it results in feature vectors. Parameters obtained as a result of this process can directly feed the inputs of classification systems, such as artificial neural nets, even if they consist of real values. However, other classification algorithms exist (e.g. rough set-based classifiers) which consist of learning and testing phases. During the first phase, a number of rules are produced, on the basis of which the testing phase is then performed. The generated rules are of the following form:


(param_1)=(value_1) and ... and (param_k)=(value_k) => (class_i)   (3.104)

Since the produced rules contain parameter values, their number should be limited to a few values. Otherwise, the number of rules generated on the basis of real parameters will be very large and will contain very specific values. For this reason, the discretization process is needed. After the discretization process is finished, parameters no longer consist of real values.

Generally speaking, the discretization process can be performed in two ways, using quantization and clusterization methods:

- the parameter domain can be divided into subintervals, and each parameter value belonging to the same subinterval will take the same value (quantization process);

- parameter values can be clustered together into a few groups, forming intervals, and each group of values will be considered as one value (clusterization process).

Various attempts have been made in soft computing practice to process real-value data with several methods classified as global and local, but they have not yet led to definite conclusions [36][167][190]. The discretization methods are considered global if they are applied to the entire set of parameter values. On the other hand, some methods are limited to one parameter domain. Several discretization schemes were reviewed by Chmielewski and Grzymala-Busse [36], among them: the Equal Interval Width Method, the Equal Frequency per Interval Method, and the Minimal Class Entropy Method. They also proposed a method which uses a hierarchical cluster analysis, called the Cluster Analysis Method [36]. In their approach, they discussed both local and global approaches to discretization problems. This last method should be classified as global, thus producing partitions over the whole universe of attributes [190]. Also, some methods of real-value attribute quantization were proposed by Lenarcik and Piasta [143]. More recently, hybrid procedures (containing both rough set and Boolean reasoning of real-value attribute quantization) were proposed by Skowron and Nguyen, explaining the nature of the quantization problems with respect to the computational complexity. Using this approach, further development seems promising when using the proposed methods as evaluation tools for unseen object classification [190].

Recently, discretization methods based on fuzzy reasoning also appeared [22][78][96][131][192].

In the literature, one may find methods that substitute crisp discretization subintervals with fuzzy subintervals defined over the attribute domains [192]. These fuzzy subintervals have overlapping boundaries which are characterized by decreasing membership functions. Following the proposal made by Slowinski et al., first some discretization methods such as minimal entropy per interval, median cluster analysis and the discrimination-based method were used in the experiments. Next, for each cut point c on the attribute domain, two consecutive subintervals are defined, while a heuristic approach which minimizes the information entropy measure is applied. The applied measure was used in order to check


whether consecutive crisp subintervals may be substituted with adequate fuzzy subintervals [192].

The class entropy of the subset S_l, representing one of two consecutive subintervals, is defined as:

Ent(S_l) = - Σ_{i=1..k} P(C_i, S_l)·log P(C_i, S_l)   (3.105)

where: C_i - ith decision class, P(C_i, S_l) - proportion of objects in S_l (l = 1, 2) that belong to class C_i.

The class information entropy of the partition induced by cut point c is expressed as follows:

Ent(A, c; S) = (|S_1| / |S|)·Ent(S_1) + (|S_2| / |S|)·Ent(S_2)   (3.106)

where: A - attribute domain, S - set of objects x_i.
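Eqs. (3.105)-(3.106) can be sketched as follows; the function names and the representation of S as parallel lists of attribute values and class labels are our assumptions:

```python
import math

def class_entropy(labels):
    """Class entropy Ent(S_l) of Eq. (3.105) for the list of decision
    class labels of the objects falling into one subinterval."""
    if not labels:
        return 0.0
    n = len(labels)
    ent = 0.0
    for c in set(labels):
        p = labels.count(c) / n          # P(C_i, S_l)
        ent -= p * math.log(p)
    return ent

def cut_entropy(values, labels, c):
    """Class information entropy Ent(A, c; S) of Eq. (3.106) for the
    partition of the attribute domain induced by cut point c."""
    s1 = [l for v, l in zip(values, labels) if v <= c]
    s2 = [l for v, l in zip(values, labels) if v > c]
    n = len(labels)
    return (len(s1) / n) * class_entropy(s1) + (len(s2) / n) * class_entropy(s2)
```

A cut that separates the classes perfectly yields zero class information entropy, so minimizing Eq. (3.106) over candidate cut points recovers the heuristic described above.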

Then, the crisp discretization is substituted with fuzzy trapezoidal subintervals. The class information of the fuzzy partition is defined analogously to that of the crisp partition [192].

Another approach to the automatic acquisition of membership functions was introduced by Hong and Chen [78]. Their proposal consists of two phases: searching for relevant attributes, and building membership functions. The search for relevant attributes is also executed on the basis of the entropy concept. For this purpose, the fitness degree f_i of attribute A_i is calculated as follows:

f_i = 1 - (1/q_i)·Σ_{j=1..q_i} [ - Σ_{k=1..p} (D_ijk / Σ_{k=1..p} D_ijk)·log(D_ijk / Σ_{k=1..p} D_ijk) ]   (3.107)

Relevant attributes are selected according to a procedure in which a threshold of error tolerance is assumed. If (1 - f_i) of a chosen attribute is less than the assumed error, then this attribute is selected as relevant. In the next step of computations, the range of each attribute A_i is calculated and then divided into subintervals. An initial membership function is built for each such subinterval, defined by calculating the average value of instances falling within this interval [78].

Many algorithms employing neural networks have also been developed for the purpose of organizing patterns into clusters [95][179]. These algorithms aim at discovering precise clusters of overlapping input. Each input vector is assigned to


one of the clusters, assuming well-defined boundaries between the clusters [96]. However, hybrid methods that use both the FCM (Fuzzy Cognitive Maps) method and other learning schemes are more successful in the recognition domain. FCMs are dynamic systems which relate fuzzy sets and rules. The simplest FCMs act as asymmetrical threshold networks or continuous neurons and converge to limit cycles [96].

Two methods based on fuzzy reasoning were also introduced by the author, namely the Fuzzy Quantization Method (FQM) and the Fuzzy Perceptual Quantization Method (FPQM) [53][131]. The former method uses a fuzzy logic-based division of the parameter domain, and the latter is based on subjective testing results.

A collection of the above-cited discretization methods will be presented in the following section.

3.4.1. Quantization Algorithms

The quantization process can be performed using various algorithmic approaches. The division of the parameter domain into subintervals is defined as follows:

Let A be a real-value parameter and let the interval [a, b] be its domain. The division Π on [a, b] is defined as the set of k subintervals:

Π = { [a_0, a_1), [a_1, a_2), ..., [a_{k-1}, a_k] }   (3.108)

where: a_0 = a, a_{i-1} < a_i, i = 1, ..., k, a_k = b.

This approach to quantization is based on calculating the division points a_i. After quantization, the parameter value is transformed into the number of the subinterval to which this value belongs.
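The mapping from a value to its subinterval number in Eq. (3.108) is a search over the sorted division points; a minimal sketch using the standard library (function name ours):

```python
import bisect

def quantize(value, cuts):
    """Maps a real parameter value to the number of the subinterval of
    Eq. (3.108) it belongs to; `cuts` holds the interior division
    points a_1 < ... < a_{k-1} of the domain [a, b]."""
    return bisect.bisect_right(cuts, value)
```

Binary quantization (|Π| = 2) corresponds to a single cut, and the Equal Interval Width Method is recovered by spacing the cuts evenly over [a, b].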

The simplest division is binary quantization, where |Π| = 2. Unary quantization, where |Π| = 1, is excluded because it results in total information loss. A method called adaptive discretization may be performed using a binary scheme. In this method, the parameter domain is first partitioned into two equal-width subintervals. Then, a learning system is run to induce rules which are subsequently tested. If the performance measure falls below a threshold value that is optimal in the sense of computational power, one of the obtained subintervals is further divided. This process is repeated until the final performance level is reached [190].

One of the proposals is a method of global discretization (quantization of all parameters at a time) which is also based on the binary approach [143]. By a set of intermediate values for R^m, different from attribute values, an ordered family A = {A^(1), ..., A^(m)} of sets A^(q) = {a_1^(q) < ... < a_n(q)^(q)}, q = 1, ..., m, is meant. For every a_i^(q), the binary attribute x_q,i is defined as:

x_q,i(u) = 1 if the value of the qth attribute for object u exceeds a_i^(q), and 0 otherwise   (3.109)

Thus, X_A = { x_1,1, ..., x_1,n(1), ..., x_m,1, ..., x_m,n(m) } is the set of binary attributes corresponding to the intermediate values from A.

Another method of parameter domain division is the Equal Interval Width Method (EIWM), where the parameter domain is partitioned into equal width intervals [36]. More sophisticated methods are based on the calculation of entropy. One of them is based on maximum marginal entropy being used as a criterion of division. This process involves partitioning the domain in such a way that the sample frequency in each interval is approximately equal, and is called the Equal Frequency per Interval Method [36]. The number of intervals is provided by the experimenter. In the Minimal Class Entropy Method, a list of "best" breakpoints is evaluated. The class information entropy of the partition induced by a break point q is defined as:

E(A,q;U) = (|S1|/|U|)·Ent(S1) + (|S2|/|U|)·Ent(S2)

(3.110)

where: S1, S2 - the results of the division of U; U - the set of all examples of the data set.

The point at which E(A,q;U) is at its minimum is chosen as a division point.
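Equation (3.110) and the choice of the minimizing breakpoint can be illustrated with a short Python sketch (candidate breakpoints are taken as midpoints between consecutive distinct attribute values, a common convention assumed here):

```python
import math
from collections import Counter

def ent(labels):
    """Class entropy Ent(S) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_breakpoint(pairs):
    """Minimize E(A,q;U) = |S1|/|U| Ent(S1) + |S2|/|U| Ent(S2) over
    candidate breakpoints; `pairs` is a list of (value, class) tuples."""
    pairs = sorted(pairs)
    n = len(pairs)
    best_q, best_e = None, float("inf")
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue                      # no breakpoint between equal values
        q = (pairs[i][0] + pairs[i - 1][0]) / 2
        s1 = [c for v, c in pairs if v <= q]
        s2 = [c for v, c in pairs if v > q]
        e = len(s1) / n * ent(s1) + len(s2) / n * ent(s2)
        if e < best_e:
            best_q, best_e = q, e
    return best_q, best_e
```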

This determines the binary discretization for attribute A. In order to obtain k intervals, the procedure described above is applied recursively k−1 times. Having computed the division of U into U1 and U2, further discretization is performed after calculating E(A,q1;U1) and E(A,q2;U2), where q_i is the best point of division for U_i, i=1,2. If:

E(A,q1;U1) > E(A,q2;U2)     (3.111)

then U1 is partitioned; otherwise U2, the "worse" of the sets U1 and U2, is [36].

Skowron and Nguyen proposed a method for the division of all parameter domains based on the Boolean reasoning approach [190]. The Boolean function p(a,k) is related to every parameter value, where k is the number of the parameter value v (v(1) < v(2) < ... < v(k)), and a is the parameter number/name. At the beginning, every parameter value is considered as a division point. Function p(a,k) = true if there is a division point p such that v(k) ≤ p < v(k+1), and false in the contrary case. For every pair of objects belonging to different classes, the values of function p are calculated. These values are placed in a so-called decision table as rows, where the columns are associated with division points. Next, the columns with the largest number of 'true' values are chosen from the table. After every choice, the selected column is removed from the table. This process ends when the decision table is empty. Finally, the division points which were chosen from the table while executing the above procedure are moved from v(k) to (v(k)+v(k+1))/2.
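The greedy column selection can be sketched in Python; this is an illustration of the selection step only, assuming the decision table has already been built as, for each pair of objects from different classes, the set of candidate division points that discerns it:

```python
def choose_division_points(rows, candidates):
    """Greedy column choice from the decision table: `rows` holds, per
    object pair, the set of candidate division points discerning it;
    repeatedly pick the point that covers the most remaining rows."""
    rows = [set(r) for r in rows]
    chosen = []
    while rows:
        best = max(candidates, key=lambda p: sum(p in r for r in rows))
        if not any(best in r for r in rows):
            break                       # remaining pairs cannot be discerned
        chosen.append(best)
        rows = [r for r in rows if best not in r]   # drop covered rows
    return chosen
```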

Further methods of parameter domain division are based on a statistical approach. Two of them, based on calculating the Behrens-Fisher statistic V for every parameter and for two classes X and Y, were introduced at the Sound Engineering Department of the Technical University of Gdansk [48]:

V = (X̄ − Ȳ) / sqrt(S1^2/n + S2^2/m)     (3.112)

where:

X̄, Ȳ are the mean parameter values:

X̄ = (1/n)·Σ_{i=1}^{n} X_i ,   Ȳ = (1/m)·Σ_{i=1}^{m} Y_i     (3.113)

S1^2, S2^2 are the variance estimators of the respective random variables:

S1^2 = (1/(n−1))·Σ_{i=1}^{n} (X_i − X̄)^2 ,   S2^2 = (1/(m−1))·Σ_{i=1}^{m} (Y_i − Ȳ)^2     (3.114)

and n, m are the cardinalities of populations X and Y, respectively.
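Eqs. (3.112)-(3.114) translate directly into code; the following Python sketch computes the V statistic for two samples of possibly different cardinalities:

```python
import math

def behrens_fisher_v(xs, ys):
    """V statistic of Eqs. (3.112)-(3.114): difference of the sample
    means over the combined standard error with unequal variances."""
    n, m = len(xs), len(ys)
    mx, my = sum(xs) / n, sum(ys) / m
    s1 = sum((x - mx) ** 2 for x in xs) / (n - 1)   # S1^2, Eq. (3.114)
    s2 = sum((y - my) ** 2 for y in ys) / (m - 1)   # S2^2, Eq. (3.114)
    return (mx - my) / math.sqrt(s1 / n + s2 / m)
```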

Provided the estimated distributions have the same dispersions, the discriminator value may be calculated on the basis of the following equation:

d_xy = (X̄ + Ȳ)/2     (3.115)

This value serves as the discriminator between parameter domain subintervals. The described situation is illustrated in Fig. 3.34.

For the case of unequal dispersions, the discriminator should be closer to the mean value of the distribution with the lower dispersion; thus the following condition is to be fulfilled:

P(x > d_xy) = P(y < d_xy)     (3.116)


where: P(x > d_xy) - the probability that the random variable x fulfils the condition x > d_xy,

P(y < d_xy) - the probability that the random variable y fulfils the condition y < d_xy.

Fig. 3.34. Probability density plots of two Gaussian distributions with the same dispersions and different mean values.

Assuming that the a priori probabilities of the random events x and y are equal to each other, the above condition (3.116) guarantees the lowest probability of making a wrong decision. This situation, for the case of unequal dispersions, is illustrated in Fig. 3.35. The need to fulfil condition (3.116) demands the estimation of the value d_xy:

(3.117)

Fig. 3.35. Probability density plots of two Gaussian distributions with different dispersions and different mean values.

For a database containing k classes, the number of possible pairs to be compared is equal to:

p = k·(k−1)/2     (3.118)


Subsequently, the values calculated for the above pairs are used for the quantization of feature vector parameters. The values are described by the corresponding statistics, providing a measure of significance for such comparisons.

The number of ranges resulting from the quantization procedure may be limited arbitrarily, due to the need to observe the computational costs. The need to reduce the number of ranges occurs when the number of generated pairs exceeds the assumed quantization order. This reduction may be realized on the basis of one of the following procedures:

1. Constant quantization - imposing the same number of subintervals on each parameter domain. The division values are selected from the calculated discriminators, using those for which the absolute values of the calculated statistics |V| are the highest.

2. Variable quantization - practically leading to a different quantization of each feature vector parameter. The division values are selected for absolute values of the statistic |V| exceeding a selected threshold.
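Both reduction procedures can be sketched in Python (an illustration; the list of candidate division points with their V statistics is assumed to have been computed beforehand):

```python
def select_discriminators(cands, k=None, threshold=None):
    """Reduce candidate division points, given as (point, V) pairs:
    constant quantization keeps the k largest |V|; variable
    quantization keeps those with |V| above a threshold."""
    if k is not None:                   # constant quantization
        top = sorted(cands, key=lambda c: abs(c[1]), reverse=True)[:k]
        return sorted(p for p, _ in top)
    return sorted(p for p, v in cands if abs(v) > threshold)
```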

There are also other approaches of interest to be found in the literature. Some of them are currently under development, so it is still difficult to evaluate their effectiveness. The above review of discretization methods was limited to methods that were tested within this study [120][128][129]. Two further methods introduced by the author, a fuzzy logic-based division of the parameter domain (FQM) and the so-called Fuzzy Perceptual Quantization Method (FPQM), will be described in the next chapters.

3.4.2. Clusterization Algorithms

Another way of discretizing parameter domains is clusterization. In this case, parameter values are joined into intervals and then, as in the previous methods, the real values are transformed into the numbers of the subintervals to which they belong.

In the Cluster Analysis Method, proposed by Chmielewski and Grzymala-Busse, a hierarchical cluster analysis is used [36]. Clusterization is performed using a class entropy measure as a criterion of value agglomeration.

Let: m = |U|, where U - the set of all examples of the data set, {A1, ..., Ai, Ai+1, ..., An} - the set of all attributes (parameters),

where: A1, ..., Ai - continuous attributes, i.e. real-valued ones, and Ai+1, ..., An - discrete attributes.

Each element e ∈ U can be divided into a continuous component: e_continuous = (x_1^e, ..., x_i^e), and a discrete component: e_discrete = (x_{i+1}^e, ..., x_n^e).

Since continuous attribute values may not be of the same scale (i.e. feet, pounds, meters, kilograms, etc.), they are normalized to zero mean and unit variance in order for clustering to be successful.
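The normalization is the usual z-score transformation; a minimal Python sketch:

```python
def zscore(column):
    """Normalize one continuous attribute to zero mean and unit
    variance so that mixed scales do not dominate the distances."""
    n = len(column)
    mean = sum(column) / n
    sd = (sum((x - mean) ** 2 for x in column) / n) ** 0.5
    return [(x - mean) / sd for x in column]
```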


Clustering starts with computing an m × m distance matrix between every pair of continuous components, ∀ e ∈ U. The entries in this matrix correspond to squared Euclidean distances between data points (parameter vectors) in i-dimensional space. At first, m clusters are introduced (all i-dimensional), since each i-dimensional data point is allowed to be a cluster of cardinality one. New clusters are introduced by agglomerating two existing ones, choosing those for which the distance between them is smallest. Clusters b and c form a new cluster bc, and the distance from bc to another cluster a is computed as follows:

D_{bc,a} = α_b·D_{b,a} + α_c·D_{c,a} + β·D_{b,c} + γ·|D_{b,a} − D_{c,a}|     (3.119)

where, for example, α_b = α_c = 0.5, β = −0.25 and γ = 0 for the Median Cluster Analysis Method [36].
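Eq. (3.119) is an instance of the Lance-Williams distance update; the following Python sketch uses the Median method coefficients quoted above as defaults:

```python
def lw_update(d_ab, d_ac, d_bc,
              alpha_b=0.5, alpha_c=0.5, beta=-0.25, gamma=0.0):
    """Lance-Williams recurrence for the distance from the merged
    cluster bc to another cluster a; defaults are the Median method."""
    return (alpha_b * d_ab + alpha_c * d_ac + beta * d_bc
            + gamma * abs(d_ab - d_ac))
```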

At any stage of the clustering procedure, the already-obtained clusters induce a partition of U, because objects belonging to the same cluster are indiscernible by the subset of continuous attributes. Therefore, the criterion for finishing the cluster formation can be as follows: L'_c < L_c, where: L_c - the level of consistency of the original data, L'_c - the level of consistency of the discretized data.

The next stage of this method is the joining of intervals for every single attribute, i.e. in 1-dimensional space. Let r denote the number of clusters obtained and K a cluster. For the attribute A_j and the cluster K, the obtained interval is:

I_{K,A_j} = [L_{K,A_j}, R_{K,A_j}] = [min_K(x_j^e), max_K(x_j^e)]     (3.120)

For a given A_j, the cluster K domain can turn out to be a subdomain of another cluster K', i.e.:

L_{K,A_j} > L_{K',A_j} and R_{K,A_j} < R_{K',A_j}

(3.121)

In this case, subinterval I_{K,A_j} can be eliminated. After eliminating subintervals, sets of left and right boundary points are constructed, L_i and R_i respectively. Hence, the partition π_i for the attribute A_i is equal to:

π_i = {[min_1(L_i), min_2(L_i)), [min_2(L_i), min_3(L_i)), ..., [min_r(L_i), max(R_i)]}

(3.122)

where: min_n(L_i) - the n-th smallest element of L_i.

The final stage of this method is the joining of existing intervals. Let:

(3.123)


If the class entropy is equal to zero, the two neighboring intervals [a_{j−1}, a_j) and [a_j, a_{j+1}) can be fused into [a_{j−1}, a_{j+1}) without diminishing the consistency of the set. Zero-valued entropy means that [a_{j−1}, a_{j+1}) describes only one concept, in part or in full. Merging can be continued, but this involves resolving two questions, namely: which attribute intervals to combine first, and which adjacent intervals to combine first.

In order to determine the merging priorities, the class entropy function is applied. This function is calculated for each pair of intervals for each continuous attribute. The pair with the smallest entropy is chosen. Before merging is performed, the accuracy of the new data set is checked. If the accuracy falls below a given threshold, then this pair is marked as non-mergeable. The process stops when every pair of neighboring intervals is marked as non-mergeable.

A simple method of clusterization, based on a statistical approach and called STATCLUST, was introduced by the author [120]. In this method, parameter values are gathered together to form intervals on the basis of the following algorithm [120].

First, the following statistical parameters are calculated (n - the total number of attribute values):

- the mean value of the interval between two neighboring values of an attribute:

E0(O) = 1/(n−1) · Σ_{i=1}^{n−1} (p_{i+1} − p_i)     (3.124)

- the variance of the interval between two neighboring values:

D0^2(O) = 1/(n−1) · Σ_{i=1}^{n−1} [(p_{i+1} − p_i) − E0(O)]^2     (3.125)

- the minimum distance between neighboring values:

Min(O) = min_i (p_{i+1} − p_i)

- the maximum distance between neighboring values:

Max(O) = max_i (p_{i+1} − p_i)

Such a choice of statistical parameters allows a flexible approach to the aggregation of attribute values into ranges. This operation is possible due to the introduction of five selectable parameters, a, b, c, d, e ∈ R, defined by the experimenter. These variables allow the assignment of the threshold distance (O_g) between consecutive attribute values. Values that differ by less than the threshold will be aggregated into the same range. The value O_g is calculated from Eq. (3.126):

O_g = a·E0(O) + b·D0^2(O) + c·Min(O) + d·Max(O) + e·1     (3.126)

Notice that any component of Eq. (3.126) may be eliminated by substituting a value of 0 for the corresponding variable from a to e. The most characteristic settings are as follows:

- a=1, b=0, c=0, d=0, e=0: then O_g = E0(O). Here, only about 50% of the values are aggregated into ranges, and about 50% of the attribute values remain isolated (assuming a Gaussian distribution of the intervals between attribute values).

- a=1, b≠0, c=0, d=0, e=0: then O_g = E0(O) + b·D0^2(O). In this case, the value of b influences the percentage of non-aggregated attribute values. If b equals 1, then about 32% of the values remain non-aggregated (assuming a Gaussian distribution of the intervals between attribute values).

- a=0, b=0, c=0.9, d=0, e=0: then O_g < Min(O). In this case, all values remain separated.

- a=0, b=0, c=0, d=1.1, e=0: then O_g > Max(O). This case is related to the situation where all values are aggregated into one range.

- a=0, b=0, c=0, d=0, e≠0: then O_g = e. In this case, the threshold distance equals a chosen real number.

The mechanism for aggregating attribute values into ranges is illustrated in Fig. 3.36. Points p_k, p_{k+1}, ..., p_{k+6} represent a segment of sorted attribute values. The calculated threshold distance O_g is also shown in Fig. 3.36. Points p_{k+3}, p_{k+4}, p_{k+5} are already aggregated into one range. All other points in the upper part of Fig. 3.36 remain isolated. In the next iteration, the threshold distance may be extended to the point p_{k+2}, and consequently points p_k and p_{k+1} may form a new range.


Fig. 3.36. Mechanism for aggregating attribute values into ranges
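The threshold computation of Eq. (3.126) and the aggregation mechanism of Fig. 3.36 can be sketched in Python (an illustration, not the author's STATCLUST implementation; a single aggregation pass is shown):

```python
def statclust_threshold(points, a=1, b=0, c=0, d=0, e=0):
    """O_g of Eq. (3.126), from the gap statistics of sorted values."""
    ps = sorted(points)
    gaps = [hi - lo for lo, hi in zip(ps, ps[1:])]
    k = len(gaps)                               # n - 1 neighboring pairs
    e0 = sum(gaps) / k                          # E0(O)
    d0 = sum((g - e0) ** 2 for g in gaps) / k   # D0^2(O)
    return a * e0 + b * d0 + c * min(gaps) + d * max(gaps) + e * 1

def aggregate(points, og):
    """Merge neighboring values closer than O_g into common ranges."""
    ps = sorted(points)
    ranges = [[ps[0]]]
    for p in ps[1:]:
        if p - ranges[-1][-1] < og:     # closer than O_g: same range
            ranges[-1].append(p)
        else:
            ranges.append([p])
    return ranges
```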


The STATCLUST algorithm is shown in Fig. 3.37.


Fig. 3.37. Algorithm of the STATCLUST method

After the clusterization process is finished, some parts of the parameter domain may remain unassigned to any interval. In this case, some new objects may not be classified during recognition, but, on the other hand, the experimenter can notice that an object representing a new class has appeared.

Another method of clusterization, called the Maximum Gap Clusterization Method (MGCM), was engineered and applied at the Sound Eng. Dept., TU Gdansk [130]. The aim of this method is to divide a parameter value domain according to the size and location of gaps between objects in the domain. The algorithm of the MGCM method is shown in Fig. 3.38. The discriminators are placed inside the Q largest gaps between objects in the parameter value domain, but each cluster (the space between discriminators) must contain at least R objects. The values Q and R are assigned by the user.


Fig. 3.38. Algorithm of the MGCM method
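One possible reading of the MGCM procedure in Python (a sketch; the order in which gaps are examined and the way the minimum cluster size R is enforced are assumptions made here):

```python
def mgcm(points, q, r):
    """Place discriminators inside the Q largest gaps, skipping cuts
    that would leave any cluster with fewer than R objects."""
    ps = sorted(points)
    gap_order = sorted(range(len(ps) - 1),
                       key=lambda i: ps[i + 1] - ps[i], reverse=True)
    cuts = []
    for i in gap_order:
        if len(cuts) == q:
            break
        cand = sorted(cuts + [i])
        edges = [-1] + cand + [len(ps) - 1]    # cluster boundaries
        sizes = [edges[j + 1] - edges[j] for j in range(len(edges) - 1)]
        if min(sizes) >= r:                    # every cluster keeps >= R
            cuts = cand
    # discriminators are placed in the middle of the accepted gaps
    return [(ps[i] + ps[i + 1]) / 2 for i in sorted(cuts)]
```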

3.4.3. Practical Implementation

For the purpose of the experiments, selected discretization methods were implemented by the author under the MATHEMATICA system, namely the EIWM, STATCLUST, and MGCM methods. Additionally, the already mentioned methods based on fuzzy reasoning were implemented in the MathCAD system. Below, an exemplary MATHEMATICA script including the main modules of the EIWM quantization method is shown:

(* EQUAL INTERVAL WIDTH METHOD *)

<<Statistics`DescriptiveStatistics`

(* ReadList Module *)
(* Database Division into Instrument Classes *)


DatabaseInstrument[list_, ClassNumber_, ParNumber_] := Module[{i, j, k, l, list1, tab},
  list1 = list;
  tab = Table[{}, {ClassNumber}, {ParNumber}];
  For[i = 1, i <= ClassNumber, i++,
    j = First[list1];
    list1 = Rest[list1];
    For[k = 1, k <= j*ParNumber, k++,
      l = Mod[k, ParNumber];
      If[l == 0, l = ParNumber];
      tab[[i, l]] = Append[tab[[i, l]], First[list1]];
      list1 = Rest[list1]
    ]
  ];
  tab
]

(* Parameter Sorting Module *)
ParSorting[list_, NoPar_] := Module[{i, n, tabsort},
  n = First[Dimensions[list]];
  tabsort = {};
  For[i = 1, i <= n, i++,
    tabsort = Append[tabsort, list[[i, NoPar]]]
  ];
  tabsort = Flatten[tabsort];
  tabsort = Union[tabsort];
  tabsort
]

(* Database Sorting Module *)
DatabaseSorting[list_] := Module[{i, p, q, datasort},
  q = Dimensions[list];
  p = q[[2]];
  datasort = Table[{}, {p}];
  For[i = 1, i <= p, i++,
    datasort[[i]] = Append[datasort[[i]], ParSorting[list, i]];
    datasort[[i]] = Flatten[datasort[[i]]]
  ];
  datasort
]

(* Parameter Value Domain Division *)
ParDivisionPoints[SortList_, NoInterval_] := Module[{i, n, x, l, p, t, left, right, r, limit},
  x = SortList;
  n = Length[x];
  limit = {}; left = {}; right = {};
  l = x[[1]]; p = x[[n]];
  left = Append[left, l];
  t = p - l;
  For[i = 1, i <= NoInterval - 1, i++,
    left = Append[left, l + i (t/NoInterval)];
    right = Append[right, l + i (t/NoInterval)]
  ];
  right = Append[right, p];
  For[i = 1, i <= NoInterval, i++,
    r = {};
    r = Append[r, left[[i]]];
    r = Append[r, right[[i]]];
    limit = Append[limit, r]
  ];
  limit
]

(* Database Quantization *)
DatabaseDivisionPoints[SortDatabase_, NoInterval_] := Module[{i, n, c, full, databaseintervals},
  n = Length[SortDatabase];
  c = SortDatabase;
  databaseintervals = {};
  For[i = 1, i <= n, i++,
    full = ParDivisionPoints[c[[i]], NoInterval];
    databaseintervals = Append[databaseintervals, full]
  ];
  databaseintervals
]

In the literature, it is possible to find many other discretization methods in which metric values are used as criteria of parameter domain clusterization. These belong to so-called cluster analysis, but as this domain of interest is a large one, the above review was limited to some exemplary methods that were used in practical experiments carried out by the author.

4. AUTOMATIC CLASSIFICATION OF MUSICAL INSTRUMENT SOUNDS

4.1. Uncertainty of Musical Instrument Sound Representation

Generally speaking, no real instrument produces an accurate repetition of the same basic pattern. It turns out that these changes on a microscopic scale are significant in indicating the nature of an instrument [58]. Both the frequency- and time-domain representations of the typical steady state of an instrument are difficult to obtain, and are moreover not as informative as one might expect. Thus, creating a database that consists of musical sounds requires a great deal of care, remembering that both the way of playing and the recording method may cause differences in an analyzed sound.

One of the main analysis difficulties is concerned with the correct detection of the beginning of the attack phase. Theoretically, the attack phase may be defined as the phase between silence and the sound steady state. In practice, it is very rare that some kind of noise is not present at the beginning of the sound recording. So, the main task is to detect whether the attack transient has already begun or whether it is still only background noise. This can be done on the basis of assumed threshold values, as defined in Section 3.1.

Additionally, not all instruments may be represented by an ADSR model. For example, the sounds of some string instruments do not have a steady-state phase. Moreover, the starting transient duration differs for sounds from the same instrument and, obviously, for sounds from various instruments. The shortest durations of attack transients range from 15 to 35 ms and are characteristic of the staccato way of playing on wind (double-reed) instruments. On the other hand, the longest duration of this phase is obtained for such instruments as the flute (wind group) or the contrabass (string group) while playing legato.

Another problem that causes uncertainties in musical signal analysis is related to the accuracy of the pitch tracking algorithm [6][30][34][45]. There are many techniques in the signal processing domain which have been developed for this purpose, but, again, theoretical assumptions do very poorly in cases where the recording contains musical articulation features or features related to differentiated performance technique.


Problems related to the choice of the analysis method are also of importance. The Fourier transform, and more precisely the Short-Time Fourier Transform (STFT), is performed by applying a time window (which restricts the time domain to a specified interval). The choice of the time window may significantly influence the spectral analysis. The length and the shape of the window determine the bandwidth that can be discriminated by the analysis. The longer the window, the better the pitch detection; conversely, the detection of transient events is then much worse. In the literature, one can find references regarding the compromises between the lengths and shapes of the time window. Most often, a 1024-point FFT with a Hamming window is used in sound analyses [58], [111][116].
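The windowing step can be illustrated in Python (a sketch computing the magnitude spectrum of one Hamming-windowed frame with a plain DFT; in practice a 1024-point FFT would replace the inner sum):

```python
import cmath
import math

def hamming(n):
    """Hamming window w[k] = 0.54 - 0.46 cos(2*pi*k/(N-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1))
            for k in range(n)]

def windowed_spectrum(frame):
    """Magnitude spectrum of one windowed analysis frame."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hamming(n))]
    return [abs(sum(x[k] * cmath.exp(-2j * math.pi * j * k / n)
                    for k in range(n))) for j in range(n // 2)]
```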

Since parameters derived from the Fourier transform are better adapted for characterizing the harmonic structure of a signal, this transform was selected for use in further analyses. In order to extract the attributes of the feature vector, sounds belonging to different musical groups were analyzed. Examples of such analyses were studied in order to identify a characteristic description that would remain typical throughout the musical scale, and would therefore be useful for the identification of an instrument group.

For the purpose of such analyses, programs written by the author under the MATHEMATICA system on the Unix workstation platform were engineered. In Fig. 4.1 and Fig. 4.2, examples of such MATHEMATICA scripts, realizing the calculation of cepstrum coefficients (MCC) and Brightness (B) for a bassoon, are shown.

(* MCC Calculations *)
ReadList["/Music/Baza/bassoon/bsn_a#1.spect", Number, RecordLists -> True];
x = Map[#[[1]]&, %];
n = Length[x];
Table[x[[i]] Cos[3.14 k (i - 0.5)/n], {i, 1, n}, {k, 1, 8}];
t = Apply[Plus, %/n]
ListPlot[t, PlotJoined -> True]

Fig. 4.1. MATHEMATICA script realizing calculations of cepstrum coefficients (MCC).

(* Brightness Module *)
Brightness[listAmpl_] := Module[{i, n, numerat, denumerat, spectrum, B},
  n = Length[listAmpl];
  spectrum = listAmpl;
  denumerat = Apply[Plus, spectrum];
  numerat = 0;
  For[i = 1, i <= n, i++,
    numerat = numerat + (i spectrum[[i]])
  ];
  B = numerat/denumerat
]


(* Brightness Calculations *)
ReadList["/Music/Baza/bassoon/bsn_a#1.spect", Number, RecordLists -> True];
x = Map[#[[1]]&, %];
AntilogAmplit = 10^(x/20);
bsn_a#1 = Brightness[AntilogAmplit]

Fig. 4.2. MATHEMATICA script realizing calculations of Brightness (B).

In Fig. 4.3-4.6, results of such analyses are shown. However, these exemplary analyses demonstrate obvious problems related to the extraction of musical instrument parameters and, in consequence, to the classification of the group to which an instrument belongs. In Fig. 4.3a and Fig. 4.3b, spectra of two violin notes are presented, namely the notes g4 and a4 (musical notation according to the Acoustical Society of America standards). The visible difference in spectra is due to the interference of harmonics and formants while recording. Scanning the entire violin scale reveals other spectra that are similar to both examples. So, this is a case where the spectra of sounds from the same instrument may be totally different. Therefore, a frequency-domain representation does not provide a sufficient representation on which the recognition of an instrument may be based.

Fig. 4.3. Spectra: violin g4 note (a), violin a4 (b)

Another example of difficulties related to the interpretation of the analyses is that two sounds, one of a violin and the other of a viola (Fig. 4.4a, 4.4b), may be qualified as belonging to the same instrument group on the basis of their spectra, but they are not discernible from one another on this basis alone. The same situation was confirmed for another class of instruments, the brass group. The spectra of a trombone and a trumpet, shown in Fig. 4.5a and Fig. 4.5b, are very similar to one another. On the other hand, a trombone and a trumpet note of the same pitch may sound different (Fig. 4.5a and 4.6a). The spectra of the brass class of instruments, being very characteristic, may provide significant decision attributes, especially when parameters are derived on the basis of statistical processing. There are other interesting properties of brass instruments, namely: when the tone gets louder, the spectrum gets richer in high-order harmonic components; and notes of lower pitch have richer spectra than those of higher pitch. The latter property is shown in Fig. 4.6b.

Fig. 4.4. Spectra: violin b3 note (a), viola b3 (b)

Fig. 4.5. Spectra: trombone b3 (a), trumpet d#4 (b)

Fig. 4.6. Spectra: trumpet b3 (a), trumpet d#6 (b)

When looking at spectral parameters, it can be seen that they are not stable through the chromatic ranges characteristic of the chosen instruments. Therefore, another parameter, commonly used in the field of speech processing, was tested in the experiments, namely the cepstrum coefficient calculated in the mel scale (MCC) [111].

On the basis of Eq. (3.58), mel-cepstrum coefficients (MCC) were calculated for the steady-state sounds of various musical instruments. For this presentation, a trombone (brass) and a violin (string) were chosen. In Fig. 4.7(a, b, c), the first three coefficients for a trombone (shown for the whole chromatic scale) are presented. In Fig. 4.8(a, b, c), the first three MCC coefficients for a violin are presented.

Fig. 4.7. First three MCC coefficients for a trombone

Fig. 4.8. First three MCC coefficients for a violin

On the basis of these analyses, we can see that the investigated parameter may be more useful than any of those which were previously shown. Moreover, in statistical experiments where MCC coefficients were tested, it was shown that these parameters are easier to distinguish from one another and to classify into separate groups [111].

In Fig. 4.9(a-h), spectral centroids are presented for exemplary instruments within their frequency ranges. When examining Fig. 4.9, it can be seen that for a particular instrument the calculated parameter has different values. Brightness is not stable within the frequency range of a given instrument. Clearly, the greatest values were reached for low-pitched trombone sounds (Fig. 4.9c). On the other hand, the smallest values occurred for the whole chromatic scale of the flute (Fig. 4.9h). Although this parameter is sensitive both to the type of instrument and to the pitch, it is also in a way characteristic of an individual group of instruments. However, it should be remembered that since amplitudes of harmonics (along with sound envelopes) vary for different sounds of various instruments, irregularity among parameter values is unavoidable.

Fig. 4.9. Spectral centroids (Brightness) of exemplary musical instruments: a. violin, b. viola, c. trombone, d. trumpet, e. bassoon, f. oboe, g. clarinet, h. flute.


To determine whether the computed parameters can be treated as distinctive features, a statistical tool was used in the study in the form of the Behrens-Fisher statistic (V) (see Eq. 3.112) [109][129]. Testing of this kind, employing the Behrens-Fisher statistic, has been implemented in previous studies and presented in various publications [111][129]. The choice of the Behrens-Fisher statistic for musical sound analysis was determined by the fact that the compared sets may consist of a different number of elements, as in the case of comparing the musical scales of particular instruments. The basic assumption is that of mean equality in two normally distributed populations.

The value V can then be compared to a respective boundary value from statistical tables under an assumed significance level. This requires the computation of the statistical parameter c from the formula:

(4.1)

which reflects the distribution of the V statistic and the determination of the boundary value.

The V statistic, computed for each parameter separately, can also be treated as a measure of the distance between the compared classes in the analyzed space of parameters. It is this interpretation that has been applied in the described experiments. For assumed and fixed cardinalities n and m, it is possible to compare the values computed for various parameters related to the investigated populations.

In Fig. 4.10(a,b), two plots are shown. They allow one to observe the distribution of four polynomial parameters (coefficients a_i, see Section 3.1.3), namely a_1 expressed in [dB/octave] vs. [dB/octave^2], and [dB/octave^3] vs. [dB/octave^4], for the sounds of four pipe organ voices (Fig. 4.10a,b). Notice that these parameters are partially separable; however, their statistical discernibility is not satisfactory.

Fig. 4.10. Distribution of four polynomial parameters: [dB/octave] vs. [dB/octave^2] (a), and [dB/octave^3] vs. [dB/octave^4] (b). Legend: Principal 8', Quintadena 8', Subbas 16', Trumpet 8'.

Page 115: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

104 CHAP1ER4

Most parameters were tested in the way described and shown above. It should be pointed out that it is not possible to formulate decisive conclusions as to the significance of any one particular parameter. Therefore, the feature vector that is described in the next section is multidimensional.

4.2. Feature Vector Extraction

As was already mentioned, the rationale for musical sound parametrization is the huge amount of data associated with digital musical signal samples. This process considerably reduces the amount of data and results in a set of parameters. The parameters that were extracted from musical sounds for the purpose of this study can be divided into two main groups: those derived from time-domain characteristics and those based on the spectral domain. When the vectors of parameters for one instrument are grouped together as one class, the matrix-like organization is easy to use when dealing with learning algorithm-based systems.

The starting point of this work was the creation of a database of musical sounds. Initially, the data used in the constructed musical signal database were obtained on the basis of sounds recorded on CDs which were edited at McGill University [111][129]. Complete chromatic scales from the standard playing range of essentially all non-percussive instruments of the modern orchestra are included in these recordings. Additionally, FFT-based data, containing information about each note from the chromatic scale which is characteristic of a chosen instrument, was used in parameter extraction [111][181][182]. This database, also originating from the McGill University CDs and named SHARC by its author [181], contains information about the spectrum of 24 orchestral instruments, some of them using different articulations, thus giving a total of 39 instrument examples. The data about the instruments contain such information as the pitch of a note (in accordance with the Acoustical Society of America standards), the note number, the maximum amplitude value of the samples used in the analysis, the nominal fundamental frequency with reference to equal-tempered tuning, the frequency measured for the signal sample, the total duration of a performed note (in seconds), the starting point from which the analysis was taken (relative to the onset of the note), and the centroid of the spectrum. Amplitudes and phases of subsequent harmonics are given in reference to the fundamental component. More details about this database may be found in the literature [181].
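The note-level fields enumerated above can be summarized as a record structure. The Python sketch below is only a paraphrase of that description: the field names and example values are illustrative and do not reproduce the actual SHARC file layout.

```python
from dataclasses import dataclass

# Hypothetical record mirroring the fields listed in the text;
# not the actual SHARC file format.
@dataclass
class AnalyzedNote:
    pitch: str                 # ASA pitch notation, e.g. "c4"
    note_number: int
    max_amplitude: float       # maximum sample amplitude used in analysis
    nominal_f0_hz: float       # equal-tempered reference frequency
    measured_f0_hz: float      # frequency measured for the signal sample
    duration_s: float          # total duration of the performed note
    analysis_start_s: float    # start of analysis, relative to note onset
    spectral_centroid_hz: float
    harmonics: list            # (amplitude, phase) pairs re the fundamental

note = AnalyzedNote("c4", 40, 0.91, 261.63, 262.1, 2.4, 0.5, 1350.0,
                    [(1.0, 0.0), (0.42, 1.1)])
print(note.pitch, len(note.harmonics))
```

A record of this kind maps directly onto one row of the parameter tables used later in the chapter.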

As the CDs recorded at McGill University contained only single examples of musical instrument sounds, the already created databases were modified and completed with sounds recorded at the Sound Engineering Department of the Technical University of Gdansk [132].


AUTOMATIC CLASSIFICATION OF MUSICAL INSTRUMENT SOUNDS 105

4.2.1. Multimedia Database

For the purpose of the experiments which are presented within the frame of this study, a multimedia database of musical instruments was engineered [132].

The process of creating a multimedia system may be realized according to the following phases: problem analysis, specification of user requirements, construction, implementation, testing, exploitation and modifications [132]. All the above-mentioned phases resulted in engineering a multimedia database that encompasses both the woodwind and string groups of instruments [132]. The database was constructed using the DELPHI system. It was also assumed that some links to HTML pages would be needed. Also, additional functions were provided, such as:

- possibility to create charts, to save them to a file and to read them from the file using OLE 2.0 (Object Linking and Embedding) technology;

- possibility to save and to read a created report;
- possibility to create SQL text;
- possibility to save and to read the SQL help text.

The constructed database is of the relational type. The main key in the system is the instrument identifier. There are n charts related to a given instrument. Parameters are identified by a composite key containing the instrument identifier and the sound name. There is a 1-to-n relationship between INSTRUMENTS and PARAMETERS, i.e. every instrument is related to n series of parameter values (n depending on the number of sounds associated with a musical instrument). Within PARAMETERS, every entity may be illustrated by a sound sample and/or sound time- and frequency-domain charts [132].
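The 1-to-n INSTRUMENTS/PARAMETERS layout with a composite key, as described above, can be sketched in SQL. The snippet below uses Python's sqlite3 purely for illustration; the table and column names are assumptions, and the original database was built in DELPHI, not SQLite.

```python
import sqlite3

# Minimal relational sketch of the schema described in the text.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE instruments (
    instrument_id TEXT PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE parameters (
    instrument_id TEXT NOT NULL REFERENCES instruments(instrument_id),
    sound_name    TEXT NOT NULL,            -- one row per recorded sound
    brightness    REAL,
    PRIMARY KEY (instrument_id, sound_name) -- composite key, as in the text
);
""")
con.execute("INSERT INTO instruments VALUES ('ob', 'oboe')")
con.execute("INSERT INTO parameters VALUES ('ob', 'c4', 2.94)")
con.execute("INSERT INTO parameters VALUES ('ob', 'd4', 3.01)")
n = con.execute(
    "SELECT COUNT(*) FROM parameters WHERE instrument_id = 'ob'"
).fetchone()[0]
print(n)  # every instrument relates to n parameter rows (here n = 2)
```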

The engineered multimedia database encompasses: basic information on musical instruments, including playing techniques (differentiated articulation); description of parameters; images; sound samples within the instrument musical scale and exemplary musical phrases played by a given instrument; time-frequency-domain representation of sounds for the whole instrument musical scale; tables containing the sound parameter values (also for the whole instrument musical scale); program help; the possibility to create various kinds of charts and printed reports for all parameters; SQL-based queries; descriptive information concerning the SQL-based help; and selected HTML link pages.

In Fig. 4.11 the presentation of a chosen musical instrument is shown. Additionally, in Fig. 4.12 some exemplary screen shots showing musical sound time-frequency-domain representation, separability charts, chosen parameter values, SQL-based query and created report are given.


Fig. 4.11. Presentation of a musical instrument in the engineered multimedia musical database






Fig. 4.12. Exemplary screen shots of the engineered database: musical sound time-frequency-domain representation (a, b), separability charts (c, d), chosen parameter values (e), SQL-based query (f), report creator (g), created report (h)


4.2.2. Parameter Extraction

The starting point in the analysis phase was the selection of a short fragment corresponding to the starting transient and the sound steady-state portion for each note. Next, editing and analyses using the Short Time Fourier Transform were performed. The STFT analyses were done for 1024-sample frames with 700 samples of overlap. Digital 16-bit stereo recordings at 44.1 kHz sampling frequency were used, and the Hamming window was applied to the analyses, as pointed out in the previous chapter. Subsequently, the calculation of parameters was initiated.
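The analysis setup just described (1024-sample frames, 700-sample overlap, Hamming window, 44.1 kHz input) can be sketched as follows. This is a NumPy illustration with a synthetic test tone, not the original analysis code.

```python
import numpy as np

FS = 44100          # sampling frequency [Hz]
FRAME = 1024        # STFT frame length [samples]
HOP = FRAME - 700   # 700-sample overlap -> hop of 324 samples

def stft_frames(signal):
    """Magnitude spectra of Hamming-windowed, overlapping frames."""
    window = np.hamming(FRAME)
    frames = []
    for start in range(0, len(signal) - FRAME + 1, HOP):
        seg = signal[start:start + FRAME] * window
        frames.append(np.abs(np.fft.rfft(seg)))
    return np.array(frames)

t = np.arange(FS) / FS                       # 1 s test tone at 440 Hz
spec = stft_frames(np.sin(2 * np.pi * 440 * t))
peak_bin = int(spec[0].argmax())
print(spec.shape, peak_bin * FS / FRAME)     # spectral peak near 440 Hz
```

With a one-second signal this yields 133 frames of 513 frequency bins each (bin width about 43 Hz at these settings).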

The feature vector, therefore, consists of 14 parameters: spectral parameters calculated on the basis of the FFT representations contained in the database (as defined in Section 3.1), and time-related parameters (P1 to P8) which are extracted on the basis of the edited sound attack and steady-state phases. For the purpose of an automatic derivation of parameters, programs written by the author in the MATHEMATICA system were engineered, examples of which were presented in Section 4.1.

As is shown below, parameters included in the feature vector may be divided into groups associated with the fundamental, mid and high frequency components and their relationships, as well as those associated with other spectral properties, such as odd or even harmonic content, or a parameter that is related to the subjective perception of a sound, namely brightness:

Parameters with regard to the fundamental:

- rising time of the first harmonic, expressed in periods, denoted as P1;
- energy of the first harmonic, calculated for the steady state (T1);
- T1 at the end of the attack divided by T1 for the steady state, denoted as P2;

Parameters with regard to the mid frequency partials:

- rising time of the II, III and IV harmonics, expressed in periods (P3);
- energy of the II, III and IV harmonics, calculated for the steady state (T2);
- T2 at the end of the attack divided by T2 for the steady state (P4);

Parameters connected to high frequency partials:

- rising time of the remaining harmonics, expressed in periods (P5);
- energy of the remaining harmonics, calculated for the steady state (T3);
- T3 at the end of the attack divided by T3 for the steady state (P6);


Parameters describing relationships between fundamental, mid and high frequency partials in terms of time delays:

- delay of the II, III and IV harmonics with relation to the fundamental during the attack (P7);
- delay of the remaining harmonics with relation to the fundamental during the attack (P8);

Parameters connected with even/odd spectral properties:

- content of even harmonics in the spectrum (hev);
- content of odd harmonics (hodd);

Brightness of the sound (B);

Normalized frequency of the sound (Pt):

Pt = i / I

where: I - number of notes (sounds) available for a parametrized instrument, i - number of the parametrized sound; sounds are numbered from 1 to I.
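A worked example of this normalization, with illustrative values:

```python
# Normalized frequency Pt = i/I for an instrument with I = 32 recorded
# sounds, numbered i = 1..32 (values are illustrative).
def p_t(i, total_sounds):
    return i / total_sounds

print(p_t(1, 32), p_t(16, 32), p_t(32, 32))  # 0.03125 0.5 1.0
```

The middle sound of the instrument's range thus always maps to Pt = 0.5, regardless of how many sounds the instrument provides.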

Parameter Pt, depending on the sound pitch, does not allow distinction between instruments, but it does show the position of a sound within the musical range of a parametrized instrument. This is important because timbre changes within the chromatic scale of different instruments. Parameters connected to high frequency partials (especially B) depend on pitch, because the higher the sound, the smaller the number of its harmonics in the spectrum. This is obvious since the analysis range is always the same, whereas the fundamental frequency is increasing and the higher frequency partials start to exceed the chosen analysis range.

The consecutive parameters are grouped into a feature vector as shown in Tab. 4.1.

Tab. 4.1. Format of the feature vectors

4.3. Statistical Properties of Musical Data

Since correlation is usually understood as a measure of data similarity, this criterion may be used in parameter redundancy testing (see Section 3.3.3). For this purpose, scripts written for the MATHEMATICA system under UNIX were prepared. The correlation procedure is shown in Fig. 4.13.


(* Correlation Calculations *)

cross[list1_, list2_] := Module[
  {i, n, crossnumerat, crossdenumerat1, crossdenumerat2,
   crossdenumerat, crosscor, set1, set2},
  n = Length[list1]; set1 = list1; set2 = list2;
  crossnumerat = 0;
  For[i = 1, i <= n, i++,
    crossnumerat = crossnumerat +
      (set1[[i]] - Mean[set1])*(set2[[i]] - Mean[set2])];
  crossdenumerat1 = 0;
  For[i = 1, i <= n, i++,
    crossdenumerat1 = crossdenumerat1 + ((set1[[i]] - Mean[set1])^2)];
  crossdenumerat2 = 0;
  For[i = 1, i <= n, i++,
    crossdenumerat2 = crossdenumerat2 + ((set2[[i]] - Mean[set2])^2)];
  crossdenumerat = Sqrt[crossdenumerat1*crossdenumerat2];
  crosscor = crossnumerat/crossdenumerat
]

Fig. 4.13. Procedure realizing correlation calculations under MATHEMATICA system
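The procedure of Fig. 4.13 computes the ordinary Pearson correlation coefficient. An equivalent vectorized cross-check in Python (not part of the original study) is:

```python
import numpy as np

def cross(list1, list2):
    """Pearson correlation, mirroring the Mathematica procedure."""
    x = np.asarray(list1, dtype=float) - np.mean(list1)
    y = np.asarray(list2, dtype=float) - np.mean(list2)
    return np.sum(x * y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))

a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]
print(round(cross(a, b), 6))  # 1.0: perfectly correlated series
```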

Values of the calculated correlation r (Eq. 3.99) and corresponding values of Student's t (Eq. 3.102) are shown for two selected instruments: contrabass clarinet (Tab. 4.2-4.3) and bass trombone (Tab. 4.4-4.5).

Tab. 4.2. Correlation coefficients r calculated for a contrabass clarinet

r       Pt       T2       T3       P1       ...   B       hodd    hev
Pt      1
T2      -0.274   1
T3      -0.715   -0.407   1
P1      0.365    -0.147   -0.168   1
...
B       -0.945   0.316    0.637    -0.371   ...   1
hodd    0.778    0.115    0.656    -0.331   ...   0.825   1
hev     -0.318   0.102    0.295    0.012    ...   0.259   0.161   1

Tab. 4.3. Student's statistics t calculated for a contrabass clarinet

r       Pt        T2       T3       P1       ...   B       hodd    hev
Pt      ---
T2      -1.303    ---
T3      -4.691    -2.041   ---


P1      1.799     -0.681   -0.779   ---
...
B       -13.176   1.524    3.782    -1.830   ---
hodd    -5.683    0.531    3.982    -1.606   6.702   ---
hev     -1.540    0.468    1.415    0.055    1.228   0.747   ---

Tab. 4.4. Correlation coefficients r calculated for a bass trombone

r       Pt       T2       T3       P1       B        hodd     hev
Pt      1
T2      0.911    1
T3      -0.908   -0.999   1
P1      -0.372   -0.423   0.415    1
...
B       -0.958   -0.856   0.849    0.329    1
hodd    -0.247   -0.240   0.231    0.485    0.256    1
hev     -0.021   -0.059   0.072    -0.387   -0.016   -0.943   1

Tab. 4.5. Student's statistics t calculated for a bass trombone

r       Pt        T2        T3      P1       B        hodd    hev
Pt      ---
T2      10.592    ---
T3      -10.375   139.713   ---
P1      -1.921    2.241     2.189   ---
...
B       -15.938   -7.928    7.708   1.668    ---
hodd    -1.224    -1.184    1.138   2.662    1.271    ---
hev     -0.103    -0.284    0.347   -2.011   -0.076   13.60   ---

The significance of correlation between pairs of parameter values was checked according to expression (3.103). It was found that the pairs of parameters Pt-T3, Pt-B, Pt-hodd, T3-B, T3-hodd and B-hodd (values of correlation r and statistics t highlighted in a bold font in Tab. 4.2-4.3) are strongly correlated for the contrabass clarinet (at both significance levels, 0.01 and 0.05). On the other hand, in the case of the bass trombone there also exist strong correlations between the pairs of parameters Pt-T2, Pt-T3, Pt-B, T2-T3, T2-B, T3-B and hev-hodd (Tab. 4.4-4.5). As is seen, in some cases the parameter dependency differs for these two selected musical instruments. On the basis of similar analyses performed for other instruments from the database, it may be said that the existing parameter dependencies express the individual character of an instrument. For example, in the case of the bass trombone, there is a strong correlation between parameters T2 and T3, or between B (Brightness) and the parameters expressing higher order harmonics. This is because the spectrum shape of this instrument is bell-like, and it is characterized by a high value of brightness (spectral centroid).


4.3.1. Separability of Original Data Values

In this paragraph, statistical measures such as the mean and variance values of musical data will be shown. Additionally, the separability of parameter values will be checked using the Behrens-Fisher statistics. As was pointed out in the previous chapter, the Behrens-Fisher statistics is a useful tool for checking the distinctiveness of two classes. It was therefore applied to parameters included in the feature vector. The greater the absolute value of this statistics |V| for the selected parameter for a chosen pair of instruments, the easier it is to distinguish between these instruments on the basis of this parameter. As |V| depends on the mean values, variances and number of examples for each instrument, this implies that instruments will be discernible on the basis of the selected parameter if their mean values are definitely different, the variances are small and the examples are numerous. Exemplary comparisons of mean values, dispersions, and the Behrens-Fisher statistics absolute values for the selected instruments are shown in Tab. 4.6-4.9.
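Eq. (3.112) is not reproduced in this chunk, so the sketch below uses the standard Welch/Behrens-Fisher form, which matches the qualitative description above (|V| grows when means differ, variances are small and samples are numerous); treating this as the exact formula of the study is an assumption, as are the sample sizes of 25 and 23 sounds.

```python
import math

def behrens_fisher(mean_x, var_x, n_x, mean_y, var_y, n_y):
    """Welch-type statistic: mean difference over pooled standard error."""
    return (mean_x - mean_y) / math.sqrt(var_x / n_x + var_y / n_y)

# hodd means and dispersions for the bass trombone / contrabass clarinet
# pair are taken from Tab. 4.6; sample sizes are assumed for illustration.
v = behrens_fisher(0.705, 0.030 ** 2, 25,
                   0.213, 0.071 ** 2, 23)
print(round(abs(v), 3))
```

With these assumed sample sizes, |V| comes out on the order of 30, the same magnitude as the tabulated value for hodd in Tab. 4.6.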

Tab. 4.6. Comparison of mean values, dispersions and the Behrens-Fisher statistics absolute values |V| for particular steady-state parameters of the pair of instruments X and Y (bass trombone and contrabass clarinet)

Parameter                  Pt      T2      T3      B        hev     hodd
bass trombone X            0.520   0.213   0.777   12.994   0.701   0.705
contrabass clarinet Y      0.522   0.228   0.455   12.972   0.793   0.213
bass trombone S_X          0.288   0.201   0.214   6.137    0.030   0.030
contrabass clarinet S_Y    0.288   0.134   0.198   4.227    0.112   0.071
|V|                        0.020   0.305   5.315   0.014    3.311   29.034

Tab. 4.7. Comparison of mean values, dispersions and the Behrens-Fisher statistics absolute values |V| for particular steady-state parameters of the pair of instruments X and Y (oboe and bassoon)

Parameter      Pt      T2      T3      B       hev     hodd
oboe X         0.516   0.550   0.089   2.937   0.335   0.621
bassoon Y      0.516   0.643   0.265   5.037   0.540   0.718
oboe S_X       0.289   0.294   0.173   1.206   0.246   0.258
bassoon S_Y    0.289   0.311   0.327   2.850   0.276   0.157
|V|            0       1.121   2.656   3.779   2.781   1.795

Tab. 4.8. Comparison of mean values, dispersions and the Behrens-Fisher statistics absolute values |V| for particular attack parameters of the pair of instruments X and Y (bass trombone and contrabass clarinet)

Parameter                  P1      P2      P3      P4      P5      P6      P7      P8
bass trombone X


contrabass clarinet Y      0.199   1.852   0.170   1.118   0.152   0.359   0.044   0.020
bass trombone S_X          0.118   2.127   0.117   1.564   0.102   0.335   0.146   0.248
contrabass clarinet S_Y    0.049   1.844   0.072   1.033   0.082   0.199   0.139   0.132
|V|                        1.351   1.023   0.444   1.034   0.582   0.109   2.241   2.277

Tab. 4.9. Comparison of mean values, dispersions and the Behrens-Fisher statistics absolute values |V| for particular attack parameters of the pair of instruments X and Y (oboe and bassoon)

Parameter      P1      P2      P3      P4      P5      P6      P7       P8
oboe X         0.475   0.732   0.413   0.687   0.710   0.718   -0.006   0.023
bassoon Y      0.186   2.230   0.198   0.946   0.366   0.211   0.015    0.128
oboe S_X       0.253   0.190   0.233   0.225   0.923   0.428   0.142    0.204
bassoon S_Y    0.091   1.503   0.117   0.799   0.284   0.122   0.128    0.176
|V|            5.990   5.508   4.576   1.742   1.980   6.343   0.599    2.185

The first two tables contain results for the steady-state parameters, while the next tables contain results for the time-related parameters. In Tab. 4.6 and 4.8, results for the bass trombone and contrabass clarinet sounds are shown. These instruments have similar musical ranges, but belong to different groups of instruments: single-reed woodwinds (contrabass clarinet) and brass (bass trombone). In contrast, results for the oboe and bassoon sounds, instruments which belong to the same group, namely double-reed woodwinds, are shown in Tab. 4.7 and 4.9.

From this exemplary analysis it is seen that no single parameter would be sufficient to distinguish between all instruments. As is seen in the tables, the biggest value of the statistics |V| was obtained for the hodd parameter for the contrabass clarinet and bass trombone pair. On the other hand, time-related parameters are a good basis to distinguish between oboe and bassoon sounds.

4.3.2. Separability of Discretized Data

Using the same technique as previously described, the separability of data after the discretization procedure was also tested. However, in order to check whether and how the discretization process influences the parameter value separation, comparisons of the original data (before the discretization process) and the discretized data are shown in the same figures.

In Fig. 4.14 and 4.15, values of the previously defined parameters, namely Brightness (B) and hodd, are shown for the trombone before and after the discretization process has been performed. As is seen from Fig. 4.14, the character of the Brightness parameter is not changed after the discretization process. On the other hand, the spreading of the hodd parameter values looks different (Fig. 4.15). In the next figures, it is also seen that the distribution of parameter values


after the discretization process may be changed. Discretization not only transforms real-value parameters into integer ones, but may also lead to a decidedly different layout of parameter values. Fig. 4.16 illustrates the distribution of parameter values for a pair of instruments (oboe and bassoon). As is seen from the figure, the discretization process usually changes the discernibility of parameters and may even decrease it.

[Figure: Brightness values vs. note number. Legend: VSQ, threshold=3; VSQ, threshold=4; MGCM, a=d=0.1, b=c=e=0.05; before discretization]

Fig. 4.14. Parameter Brightness before and after the discretization process

[Figure: hodd values vs. note number. Legend: MGCM, a=d=0.1, b=c=e=0.05; VSQ, threshold=4]

Fig. 4.15. Parameter hodd before and after the discretization process

There are other ways to analyze the distribution of parameter values. Various techniques are used for this purpose, namely criteria based on the Euclidean distance, the squared Euclidean distance, the so-called Manhattan metrics, the Hausdorff metrics, "customized" metrics, a criterion based on Pearson's correlation coefficient r, etc.

Apart from the criterion based on Behrens-Fisher statistics, another example of such analyses will be defined below:


C = \min_{i,j} L_{i,j} / \max_i l_i    (4.2)

where: L_{i,j} - distances between classes i and j, i ≠ j,
l_i - characteristic distance for the class i, calculated as the maximum Euclidean distance between objects.

[Figure: four panels of parameter value scatter plots, a-d; (c) VSQ, threshold=0.8; (d) MGCM, a=c=0.5, b=d=0.02, e=0.1]

Fig. 4.16. Parameter values for P2 and P8 before discretization (a), after the EIWM (b), VSQ (c) and MGCM (d) discretization for an oboe and a bassoon

Distance L_{i,j} was calculated using the Hausdorff metrics, defined as below:

h(X,Y) = \max\{\sup_{x \in X} \inf_{y \in Y} d(x,y), \; \sup_{y \in Y} \inf_{x \in X} d(x,y)\}    (4.3)

where X, Y - sets of parameter values, d - Euclidean distance.


The Hausdorff metrics can still be positive even if sets X and Y overlap. If C > 1, the classes are fully separable. The described criterion of discernibility (C) is very strong and can rarely be fulfilled.
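The criteria of Eq. (4.2) and (4.3) can be sketched for one-dimensional parameter values as follows; this is a Python illustration on synthetic data, not the original implementation.

```python
import numpy as np

def hausdorff(X, Y):
    """Hausdorff distance, Eq. (4.3), for 1-D value sets."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    d = np.abs(X[:, None] - Y[None, :])   # pairwise Euclidean distances
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def characteristic(X):
    """Maximum within-class distance l_i."""
    X = np.asarray(X, float)
    return np.abs(X[:, None] - X[None, :]).max()

def discernibility(classes):
    """Criterion C of Eq. (4.2): min inter-class over max intra-class."""
    pairs = [hausdorff(a, b)
             for i, a in enumerate(classes)
             for j, b in enumerate(classes) if i < j]
    return min(pairs) / max(characteristic(c) for c in classes)

A = [0.1, 0.2, 0.3]                       # two well-separated classes
B = [0.9, 1.0, 1.1]
print(discernibility([A, B]))             # C > 1: fully separable
```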

The above defined criterion was used for the purpose of comparing the distribution of data before and after the quantization process. In Tab. 4.10, exemplary results are shown. As is seen, the discretization process slightly diminishes the separability of data in the presented case. Nevertheless, in subsequent paragraphs it will be seen that although some parameters are not statistically separable, learning algorithms may still be trained with them and, as a result, may classify objects with great accuracy.

Tab. 4.10. Criteria of discemibility for the original data and the data after quantization using various methods

Data                                                            M      C
Original data                                                   7.58   0.714316
EIWM quantization, 5 intervals                                  6.36   0.685346
MGCM clusterization, a=0.05, b=0.02, c=0.05, d=0.01, e=0.01     6.31   0.543324
MGCM clusterization, a=0.02, b=0.01, c=0.02, d=0.01, e=0.005    6.18   0.694559

where the M value is based on the Behrens-Fisher statistics V defined previously in Eq. (3.112):

M = \min_{i,j} \max_p |V(X_i, X_j, p)|    (4.4)

where V is applied to parameter p for the pair of instruments X_i and X_j.

4.4. Neural Network as a Classifier of Musical Instruments

The goal of the experiments was to study the possibility of identifying selected classes of instruments by a neural network, in order to verify the effectiveness of the extracted sound parameters. A multi-layer neural network of the feedforward type was used in the experiments. The number of neurons in the initial layer was equal to the number of elements of the feature vector. In turn, each neuron in the output layer was matched to a different class of instrument, so their number was equal to the number of instrument classes used in the experiment. The number of hidden neurons was arbitrarily adopted as 15. The error back-propagation (EBP) method based on the delta learning rule was used in the experiment [122].
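A minimal NumPy sketch of such a network follows: one hidden layer of 15 neurons, EBP with the delta rule and a momentum term, and weight initialization ranges matching those used later in the chapter. The data, class assignments, learning constants and batch-averaged gradient are illustrative choices, not the study's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hid, n_out = 14, 15, 4                 # 14-element vectors, 4 classes
V = rng.uniform(-0.2, 0.2, (n_in, n_hid))      # input -> hidden weights
W = rng.uniform(-0.2, 0.2, (n_hid, n_out))     # hidden -> output weights
dV = np.zeros_like(V)
dW = np.zeros_like(W)
eta, alpha = 0.5, 0.45                         # learning rate, momentum

X = rng.random((40, n_in))                     # synthetic feature vectors
T = np.eye(n_out)[rng.integers(0, n_out, 40)]  # one-hot class targets

for epoch in range(2000):
    H = sigmoid(X @ V)                         # forward pass
    Y = sigmoid(H @ W)
    delta_out = (T - Y) * Y * (1 - Y)          # delta rule, output layer
    delta_hid = (delta_out @ W.T) * H * (1 - H)
    dW = eta * H.T @ delta_out / len(X) + alpha * dW   # momentum updates
    dV = eta * X.T @ delta_hid / len(X) + alpha * dV
    W += dW
    V += dV

print(float(np.mean((T - Y) ** 2)))            # cumulative error after training
```

In the actual experiments the learning constants were not fixed but changed dynamically during training, as described below.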

The training of the neural network was carried out using the EBP method several times. Each time, different initial conditions were adopted, as well as different training parameters: the training process constant (η) and the momentum term (α) were changed dynamically in the course of the training. They were used later to evaluate the progress of the training process. Additionally, the number of iterations necessary to make the value of the cumulative error drop below the assumed threshold value was observed.

4.4.1. Training Procedures

Training Processes on the FOURTEEN Type Feature Vectors

The training of the network and its testing were carried out on the basis of the feature vector described previously in Sections 3.1 and 4.2 (see also Tab. 4.1). These feature vectors will be referred to in the remainder of this chapter as the FOURTEEN type (Tab. 4.11).

Tab. 4.11. Format of the feature vectors

To train the neural network, parameter vectors of four classes of instruments were selected: bass trombone, trombone, English horn, contrabassoon. In general, 2 types of sets were formed: one encompassed all parameter vectors (type ALL), while the other contained about 70% of all vectors (type 70_PC). The vectors included in the set of type 70_PC were chosen at random; however, it was attempted to maintain a uniform distribution. Below, in Tab. 4.12, the number of parameter vectors for each class of instruments in the training set of type 70_PC, relative to the size of the class, is shown. Additionally, this relation is presented in per cent, and the indexes of the vectors that were excluded from the set of type 70_PC are also shown.

Tab. 4.12. Representation ofthe trainingsettype 70 _PC

Instrument       Size of the class 70_PC   Size of the class 70_PC in [%]   Vectors excluded from the set type 70_PC
bass trombone    18/25                     72%                              2,7,10,14,18,21,23
trombone         22/32                     68.75%                           1,4,7,10,15,18,22,26,29,30
English horn     21/30                     70%                              3,5,8,12,16,19,22,27,30
contrabassoon    22/32                     68.75%                           3,6,8,11,14,19,22,25,28,31
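Forming such a 70_PC-style training set can be sketched as follows: for each class, about 70% of the 1-based vector indices are drawn at random and the rest are excluded. The class sizes follow Tab. 4.12, but the random selection itself is illustrative (the actual excluded-vector lists in the table differ).

```python
import random

def split_70pc(class_sizes, seed=1):
    """Per class: (sorted training indices, excluded indices), ~70/30."""
    rnd = random.Random(seed)
    split = {}
    for name, size in class_sizes.items():
        n_train = (size * 7 + 5) // 10          # ~70%, rounded to nearest
        idx = list(range(1, size + 1))          # 1-based, as in Tab. 4.12
        train = sorted(rnd.sample(idx, n_train))
        excluded = [i for i in idx if i not in train]
        split[name] = (train, excluded)
    return split

sizes = {"bass trombone": 25, "trombone": 32,
         "English horn": 30, "contrabassoon": 32}
for name, (train, excl) in split_70pc(sizes).items():
    print(name, len(train), len(excl))
```

The resulting per-class training sizes (18, 22, 21, 22) match the fractions reported in Tab. 4.12.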

The training proceeded up to the moment when the value of the cumulative error dropped below 0.01. This value was adopted arbitrarily in order to observe a possible case of network over-training. Three network training processes were conducted. A diagram of the training phase is presented in Fig. 4.17. The adopted descriptions have the following respective meanings: the variables range_V and range_W give information on the range of values of elements of matrices V and W. Matrices V and W are sets of synaptic weights, respectively: from the input layer toward the hidden one, and from the hidden layer toward the output one. In the first case (1), the matrices of network weights were initiated at random, with values not exceeding the range of (-0.2, 0.2), while in the second case (2) this range decreased to values within (-0.1, 0.1).

The diagram in Fig. 4.17 marked as (3) has a range of random weight initialization identical to diagram (2). However, these routines differ from one another because the values of the weights during random initialization are different each time. The purpose of training procedures (2) and (3) is to compare the process of training convergence within the same type of training set.

[Diagram: TRAINING SET of type 70_PC feeding three training routines, (1) with range_V = range_W = 0.2 and (2), (3) with range_V = range_W = 0.1]

Fig. 4.17. Diagram of the training phase

Training Process No. 1

Since a stereo sound constituted the basis for calculating the parameters of musical sounds, the same parameters were calculated separately for the left and the right channels. However, the testing phase will be presented only for the left sound channel.

For the training set (LEFT.1_70PC), the following initial conditions were adopted: unipolar activation function of the neuron, random initialization of the values of elements of matrices V and W ranging from -0.2 to 0.2, training with the momentum method applied, η = 0.05, and α = 0.45. In this training routine the network converged quickly to the error level of 0.07-0.06. Further growth of the required accuracy (decreasing the assigned threshold value of error) caused a drastic prolongation of the training period due to the small value of the training coefficient. It is worth emphasizing that in the proximity of the error values of 0.02 and 0.01 the term η was increased many times, causing the previously mentioned high error oscillations and, finally, attainment of the required accuracy.

Training Process No. 2

The following initial conditions were adopted in the case of the LEFT.2_70PC training set: unipolar activation function of the neuron, random initialization of the values of elements of the weight matrices covering the range (-0.1, 0.1), training with the momentum term, η = 0.05, α = 0.4. In Fig. 4.18 the dynamic change of training parameters is presented. Additionally, in Fig. 4.19 the convergence of the training process is shown.


Fig. 4.18. Dynamic change of training parameters


Fig. 4.19. Convergence of the training process.

The data from Fig. 4.18 and 4.19 show clearly that the training process was sharply stopped when the value of admissible error was decreased below 0.05. Initially the training proceeded very rapidly and attained the assigned boundary error of 0.1-0.07 within only several thousand iterations. As the accuracy of training was increased, the number of necessary iterations grew. This was due to the fact that the training speed η was very low (~0.005) while, at the same time, the momentum term α was reaching a high value (~0.5). Close to the error value of 0.02, the value of η was increased ten times to evoke higher error oscillations. The result was such that after about 250 iterations the accuracy of the training dropped below 0.02. On the other hand, close to the value of 0.01 the training speed was reduced twice (0.01 -> 0.005) in order to reduce the error generated and to allow going below the boundary value of 0.01. However, this did not succeed; the error generated increased, and only by evoking higher error oscillations (η was increased twenty times) was the training terminated.

Training Process No. 3

The following initial conditions were adopted in the case of the LEFT.3_70PC training set: unipolar activation function of the neuron, random initialization of the values of the weight matrices covering the range (-0.1, 0.1), training with the momentum term, η = 0.05, α = 0.5. Despite the initially fast convergence of network training, the value of parameter η was reduced to 0.003 at approximately 0.08 error. This small value prevented large error oscillations during the training, but again at the expense of the training speed. It was also tested whether this parameter could be increased, but it turned out that the training process in this case was unstable. It was only at 0.02 accuracy that η could be increased several times, which decisively sped up the final termination of the training.

Training Processes Based on ELEVEN and FOURTEEN Type Feature Vectors

The purpose of another experiment was to check the influence of the length of a feature vector on the classification effectiveness [136]. In this case, both 14- and 11-element feature vectors were used. The latter format includes the time-related parameters, excluding the following parameters: B, hodd, hev, calculated on the basis of the musical sound frequency representation. The format of the ELEVEN vector is shown in Tab. 4.13. It should be noted that the instruments used in these experiments differ mostly in musical range, which seems a more difficult task than testing neural networks on instruments belonging to different groups.

Tab. 4.13. Format of the ELEVEN type feature vectors

| P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | T1 | T2 | T3 |

The experiment consisted of two training processes. Both of them were based on training sets of the type 50_PC (50% of all ELEVEN and FOURTEEN feature vectors). The vectors included in the sets of type 50_PC were either chosen at random or every second vector was taken from the database. These processes are denoted as RANDOM and UNIFORM, respectively. Two schemata of network training processes were adopted, namely Schema_1 and Schema_2 (see Fig. 4.20). Accordingly, at the same initial conditions, networks were trained and tested four times (POSITIVE procedure). Then the training sets became testing ones, and vice versa (NEGATIVE procedure). The whole process is given in Fig. 4.20.

ELEVEN/FOURTEEN FEATIJRE VECTORS

Fig. 4.20. Schema ofthe training process

Page 133: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

122 CHAPTER4

Below, in Tab. 4.14, the munber ofparameters for NEGATIVE and POSITIVE procedures is shown.

Tab. 4.14. Representation of1rainin_g sets for NEGATIVE and POSITIVE procedures

Procedure POSITIVE NEGATIVE Size ofthe Size ofthe Size ofthe Size ofthe

INSTRUMENT class class in r%1 class class in[%] clarinet Bflat 18/37 48,65% 19/37 51,35% clarinet E .flat 16/32 50% 16/32 50% bass clarinet 12/25 48% 13/25 52% contrahass c/arinet 11/23 47,83% 12/23 52,17%

Training Phase

For the training phase, the same initial conditions were adopted as previously introduced. As there are no formal rules as to the admissible value of cumulative error, therefore 6 values were adopted arbitrarily: 0.05, 0.04, 0.03, 0.02, 0.01, 0.005, in a result 6 sets of weights were obtained. After the training process, 96 neural networks trained on the ELEVEN type vectors and 96 on FOURTEEN ones were at disposal for the testing phase.

4.4.2. Recognition Experiments

Testing Phase Basedon the FOURTEEN Type Feature Vectors

In the testing phase the first aim was to test the effectiveness re1ation between identifying new objects by the network as it relates to network training accuracy. It is worth observing that the effectiveness of the network does not determine the quality of the trained network. Good quality can be understood as a feature of the network that causes the kth neuron at the output to generate a high value in relation to the values of outputs ofthe remairring neurons (e.g. 0.8 to -0.005) for a given vector at the input.

In order to determine the recognition quality, al1 outputs of neurons were observed when the vectors from the kth class were being presented The values of outputs of neurons were treated as deviations from the expected value of 0. Variance can then be a measure of the quality of the trained network. The bigger the variance calcu1ated for particular outputs of neurons is, the stronger the classifications for particular classes are. This parameter is computed on the basis of the following formula:

Page 134: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

AliTOMATIC CLASSIFICATION OF MUSICAL INSTRUMENT SOUNDS 123

where:

Vark - value of variance for kth neuron, Nk - number of parameter vectors, members of the kth class and not present in

the training set, o;- output ofthe kth neuron for kth feature vector.

The above considerations will be shown on the basis of the test conducted on a settype LEFF_30PC. The recognition effectiveness (the number of correct and wrong responses expressed as percentages (correctlwrong [%])) for the chosen testing set is presented in Tab. 4.15. The visible change of recognition effectiveness happened upon changing the accuracy of the network from 0.1 to 0.09. Despite a further increase in accuracy, the effectiveness remained at the same Ievel - 97.22%, that is only one vector was wrongly classified. Upon the presentation of the vectors of the particular classes it can be observed that the quality of identifying new objects was slightly growing together with a reduction in the cumulative error Emax·

Tab. 4.15. Recognition effectiveness

hass trombone English contra- Total Score trombone horn bassoon

Emax correctl correct/ correct! correct! correct/ wronJ{ [%1 wronK [%1 wronK [%1 wronK [%1 wronJ{ [%]

0.1 100 70 100 20 66.44 0 30 0 80 30.56

0.09 100 100 100 90 97.22 0 0 0 10 2.78

0.07 100 100 100 90 97.22 0 0 0 10 2.78

0.05 100 100 100 90 97.22 0 0 0 10 2.78

0.03 100 100 100 90 97.22 0 0 0 10 2.78

0.01 100 100 100 90 97.22 0 0 0 10 2.78

Below, indexes of wrongly classified vectors are presented for this type of the testing set:

trombone - 1, 29, 30; Emax = 0.1, contrabassooll- 3, 8, 11, 14, 22, 25, 28, 31; Emax = 0.1 and 8 for Emax = (0.09- 0.01).

Page 135: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

124 CHAPTER4

For the purpose of presenting the values of variances, two classes of instruments were se1ected, name1y: bass trombone (Tab. 4.16) and trombone (Tab. 4.17). The first of these instruments was identified with much better effectiveness than the other one.

Tab. 4.16. Variances in outputs of neurons upon presentation of vectors of the class bass trombone

Emax bass trombone trombone English horn contrabassoon 0.1 0.3597468 0.0060959 0.0026676 0.0606067 0.09 0.9647862 0.0027156 0.0000044 0.0003433 0.07 0.9672036 0.0025964 0.0000038 0.0002916 0.05 0.9721547 0.0023566 0.0000030 0.0002217 0.03 0.9778430 0.0022333 0.0000021 0.0001452 0.01 0.9856030 0.0032118 0.0000008 0.0000564

Tab. 4.17. Variances in outputs of neurons upon presentation of vectors of the class trombone

Emax bass trombone trombone English horn contrabassoon 0.1 0.0059164 0.3968045 0.0432925 0.0353321 0.09 0.0000000 0.8179480 0.0014706 0.0222385 0.07 0.0000000 0.8193167 0.0016362 0.0204443 0.05 0.0000000 0.8198924 0.0019054 0.0180781 0.03 0.0000000 0.8206997 0.0022593 0.0148136 0.01 0.0000000 0.8223629 0.0029912 0.0090919

The best scores of recognition effectiveness for the particular training procedures of the test were compiled in Tab. 4.18. The consecutive co1umns signify: the test routine (name of the training and testing set), classification effectiveness expressed as percentages and numbers, and respective values of Emax· Tab. 4.18 shows that recognition effectiveness during the experiments was very high and was always above 90%. The number of unrecognized vectors was 1 or 2.

Tab. 4.18. Compilation of the best classifications

Classification Effectiveness Testroutine Testing set correct/ correct/ Emax

wrongl%} wronK LEFT.l _70PC LEFT_JOPC 97.22 35 (0.09- 0.01)

2.78 I LEFT.l _70PC RIGHT ALL 99.16 ill (0.09- 0.02)

0.84 I LEFT.2_70PC LEFT 30PC 97.22 35 0.02; 0.01

2.78 1 IEFT.2_70PC RIGHT_ßL 98.32 1l1 0.1; 0.09

1.68 2

Page 136: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

AUTOMATie CLASSIFICATION OF MUSICAL INSTRUMENT SOUNDS 125

Classification Effectiveness Testroutine Testing set correct/ correct/ Em""

wrong f%1 wrong LEFT.3 70PC LEFT 30PC 94.44 34 (0.1 - 0.01)

5.56 2 LEFT.3 _70PC RIGHT ALL 98.32 117 (0.1- 0.07)

1.68 2

The results obtained show that only in some experiments (with various initial parameters of the training) were certain vectors not correctly identified. Hence, the presumption that data in these very vectors may be incorrectly acquired. lt should be remernbered that parametrized signals were sounds recorded in real conditions, i.e. as a patt of a musical performance. Therefore phenomena such as musical articulation or differentiated dynamic with al1 features specific for an individual musician are inc1uded in the signal and resulted in signal modulation, amplitude overshoots, etc. That may cause in some cases a certain kind of "non-adaptation" to the engineered algorithms in which only three mode1s of the relation between Attack-Decay-Sustain phases in a sound were assumed. What becomes evident is a way of testing the correctness of parametrization. If the wrongly classified vectors are always the same, then it is these vectors that should be subjected to verification.

Camparisans af Classificatian Effectiveness Basedan ELEVEN and FOURTEEN Type Feature Vectars

The testing phase aimed at 1ooking for the influence of the 1ength of the feature vector on the recognition scores. Exemp1ary results are shown in Tab. 4.19-4.22. The c1assification effectiveness is expressed as percentages and numbers of correct and wrong answers. The average of recognition score was obtained on the basis of results coming from all 24 networks be1onging to the given schema.

Tables 4.19-4.22 show that recognition effectiveness during the conducted experiments is very high. The number of recognized objects is between 77.41% and 94.59%, with an average of 84.14%. It should be noticed, that the increase of the vector 1ength from eleven to fourteen parameters causes also the increase of the recognition score (up to 14.25%). Additionally, the type of a trainingsetalso influences the recognition scores. For the RANDOM type schema of training, the effectiveness was between 77.41% and 80.34%, on the other hand for the UNIFORM schema of training, the obtained effectiveness was much higher (80.14% to 94.59%). The latter result is quite obvious, especially as all objects in the training set are ordered according to the Pt parameter (normalized frequency).

It should be remernbered that despite the fact that musical instnunents used in these test were very similar to each other, the neural network resolves well the classification tasks.

Page 137: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

126 CHAPTER4

Tab. 4.19. Recognition scoresSchema_l, POSITIVE

Minimum score Average Maximum score Vector correct! correct/ correct/ correctl correct/ correct/

wrong wrong wrong wrong wrong wrong f%1 f%1 f%1

ELEVEN 75.00 45 77.41 46.45 78.33 47 25.00 15 22.59 13.55 21.67 13

FOURTEEN 75.00 45 80.14 48.08 86.67 52 25.00 15 19.86 11.92 13.33 8

Tab. 4.20. Recognition scoresSchema_l, NEGATIVE

Minimumscore Average Maximum score Vector correct/ correct/ correct/ correctl correct/ correct/

wrong wrong wrong wrong wrong wrong f%1 f%1 f%1

ELEVEN 77.19 44 78.07 45.00 78.95 45 22.82 13 21.93 12.50 21.05 12

FOURTEEN 89.47 21 92.18 52.54 94.74 54 10.53 6 7.82 4.46 5.26 3

Tab. 4.21. Recognition scores Schema_ 2, POSITIVE

Minimumscore Avera$?e Maximum score Vector correct/ correct/ correct/ correctl correct/ correct/

wrong wrong wrong wrong wrong wrong f%1 f%1 f%1

ELEVEN 80.00 48 81.67 49.00 83.33 50 20.00 12 18.33 11.00 16.67 10

FOURTEEN 81.67 49 88.75 53.25 90.00 54 18.33 11 12.25 6.75 10.00 6

Tab. 4.22. Recognition scores Schema_ 2, NEGATIVE

Minimum score Average Maximum score Vector correctl correctl correct/ correct/ correct/ correct!

wrong wrong wrong wrong wrong wrong [%1 [%] [%]

ELEVEN 75.44 43 80.34 45.79 82.46 47 24.56 14 19.66 11.21 17.54 10

FOURTEEN 92.98 53 94.59 53.92 96.49 54 7.02 4 5.41 3.08 3.51 3

The conducted research experiments show that the neural network performs well the task of identifying classes of musical instruments. The obvious advantage of this type of classifier is the fact that there is no need for quantization of values of parameters included in the vector which describes the musical sound. There is

Page 138: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

AUTOMATIC CLASSIFICATION OF MUSICAL INSTRUMENT SOUNDS 127

no doubt that a certain disadvantage of this type of testing is a large amount of work needed to complete the training phase.

It should be remernbered that feature vectors included in the database encompass representations of consecutive sounds in the chromatic scale. In this case a high instability of designated parameters is observed, the additional element which affects the Iack of stability of parameters is the presence of non-linearity related to differentiated articulations and dynamics of musical sounds. However, in both cases the network ability to generalize allows a correct classification of the objects beingunder the test

4.5. Rough Set Decision System as a Classifier of Musical Instruments

A decision system based on rough set theory was engineered at the Technical University of Gdansk [48][49]. It includes learning and testing algorithms. During the first phase, rules are derived which become the basis for the second phase. The genemtion of decision rules starts from rules of length 1, continuing with the genemtion of rules of length 2, etc. The maximum rule length may be determined by the user. The system induces both possible and certain rules. lt is assumed that the rough set measure for possible rules should exceed the value 0.5. Moreover, only such rules that are preceded by some shorter rule operating on the same parameters are considered.

Additionally, the so-called neutral point (p) has an influence on the strength of rule (r), with the last one defined as [48]:

r= c(m- p) (4.6)

where: c - number of cases conforming the rule, f.lrs - the rough set measure, set by the user. In the system, a linear relation between the rough set measure and rule strength is set. A rough set measure of the rule describing concept X is the mtio of the number of all examples from concept X correctly described by the rule [36]:

(4.7)

where: X- is the concept, and Y- the set of examples described by the rule.

The next step is a testing phase in which the leave-one-out procedure is performed. During the jth experiment, the jth object is removed from every class contained in the database, the learning procedure is performed on the remaining

Page 139: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

128 CHAPTER4

objects, and the result of the classi:fication of the omitted objects by the produced rules is saved.

4.5.1. Attribute Discretization

For the purpose of discretizing musical signal parameters, the following methods have so far been implemented at the Technical University of Gdansk [48][49][50][112][120][128]:

- Equal Interval Width Method (EIWM) - parameter domain is divided into intervals of the same width; the number of intervals is chosen by the experimenter;

- Variable Statistical Quantization- in the VSQ method, n discriminators dxy are assigned (where n is a Iimit of intervals specified by the user), and then discriminators with a Behrens-Fisher statistics value V smaller than the specified threshold are deleted;

- Maximum Gap Clusterization Method (MGCM) - the number of intervals n is also chosen by the experimenter; n maximal gaps between sorted parameter values are searched and the parameter domain is divided into intervals, choosing points from these gaps as division points of the parameter domain [112][130];

- Clusterization (STATCLUST) - based on the statistical parameters of distance

between pairs of neighboring parameter values; Og serves as a thresho1d

criterion, where. if the interval between neighboring parameter values is smaller

than Og then they are joined and make an interval [120];

- Method based on the Boolean reasoning approach proposed by Skowron and Nguyen [190]. The Boolean function is used as a tool to determine the best division points for each parameter domain [129].

The first mentioned method - Equal Interval Width Method (EIWM) - is the simplest one to perform However, this method neglects the distribution of parameter values., being non-linear in nature in the case of musical sound data. Both the VSQ and STATCLUST methods take the statistical properties of a set of data into account. The MGCM method allows a flexible choice of system parameter value dustering into intervals. The first four cited methods are local. The last mentioned discretization method (based on Boolean reasoning) is a global one. In this method, division points for the parameter domain are chosen in such a way that every divisionpointseparates as many classes (instruments) as possible. The quantization process does not have to divide every parameter domain. If the database contains a small number of classes and many parameters, some of them will not be quantized at all.

Page 140: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

AUTOMATIC CLASSIFICATION OF MUSICAL INSTRUMENT SOUNDS 129

In every discretization method, a process of replacing the original values of input data with the number of an interval to which a selected parameter value belongs is carried out. Consequently, the representation of parameters by properly selected ranges instead of numbers is the essence of the above procedures. Such a conversion of parameter values into ranges results in memory savings during the learning phase.

4.5.2. Training Procedures

In the experiments, 15 classes were created which contained parameters of the following 20 instruments:

- soprano (15 sounds) and alto saxophone (14 sounds), - tenor (14 sounds), baritone (13 sounds) and bass saxophone (8 sounds), - oboe (32 sounds), - English horn (30 sounds), - B flat clarinet (37 sounds), - E flat clarinet (32 sounds), - bass (25 sounds) and contrabass clarinet (23 sounds), - bassoon (32 sounds), - contrabassoon (32 sounds), - C trumpet (34 sounds), -Bach trumpet (32 sounds), - French horn (37 sounds), - alto (13 sounds) and tenor trombone (36 sounds), - bass trombone (25 sounds), - tuba (31 sounds).

Below, some exemplary rules which were obtained for the musical timbre database are presented.

Exemp1ary rules: lf [P4 = 0] 1\ [P7 = 3] 1\ [h.v = 1] then [CLASS No. 5] lf [P3 = 0] 1\ [Pa= 2] 1\ [h.v = 3] then [CLASS No. 6] lf [P.s = 0] 1\ [P6 = 1] 1\ [h,v = 0] then [CLASS No. 7]

where: P3, ... ,Pa, h.v - attributes (parameters as defined in Section 4.2), for 15 classes (instruments); the discretization method applied was EIWM, with division into 5 intervals enumerated from 0 to 4.

4.5.3. Recognition Experiments

The most important criterion of the discretization method is its accuracy rate, computed after finishing training-and-test procedures for every experiment. For the method used in these experiments, a recognition score of 81% was obtained,

Page 141: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

130 CHAPTER4

assuming the neutral point equals 0.6, the rough set measure equals 0.7 and the rule length equals 3. Such accuracy has been obtained in experiments where all of the mentioned instruments were tested (see Tab. 4.23).

Tab. 4.23. Recognition scores [%] for the training sc::t containing 15 classes (EIWM: parameter domain division into 5 intervals; VSQ method: threshold value equals 1.4, maximum rule length produced equals 3)

Quantization Method 5 intervals 6 intervals 7 intervals EIWM, neutralpoint 0.6 81.5 81.7 79.4 EIWM, neutral po_int 0.3 79.6 79.9 82.6 VSQ, neutral point 0.6 88 91 89.1 VSQ, neutral point 0.3 78.2 77.8 79.4

In the next experiments some of the tested instruments were disregarded and four instruments, bass trombone, trombone, English hom, and contrabassoon, were taken into account

I. Bass trombone- frequency range <32.6Hz- 276.5Hz>; musical scale: cl -c#4,

II. Trombone- frequency range <58. 1Hz- 350.0Hz>; musical scale: a#l - f4, 111. English hom- frequency range <65.4Hz - 392.0Hz.>; musical scale: c2 -

g4, IV. Contrabassoon- frequency range <65.4Hz - 392.0Hz.>; musical scale: c2 -

g4.

Test results are included in Tab. 4.24.

Tab. 4.24. Recognition scores obtained for various system settings

Quantization Method . Rule Rough set measure Neutralpoint Recognition length Prs Score in[%]

EIWM/quant order =5 3 0.5 0.5/0.7 75/39 EIWM/quant. order =5 3 0.7 0.5/0.7 84/85 EIWM/quant order =5 4 0.5 0.5/0.7 71/42 EIWM/quant order =5 4 0.7 0.5/0.7 78/80 EIWM/quant order =7 3 0.7 0.5/0.7 74n5 VSQ/quant. order =7 3 0.5 0.5/0.7 78m VSQ/quant order =7 3 0.7 0.5/0.7 75n4

.As may be seen in Tab. 4.24, the overall recognition accuracy is greater than 70% in almost all cases. Also, the results were improved if the rough set measure was declared as equal to 0. 7. When only certain rules were taken into account, the overall recognition score became smaller. However, there is so far no clear indication for optimum system settings. Additionally, when compared to tests based on neural networks, the rough set-based system is not as efficient in

Page 142: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

AUTOMATIC CLASSIFICATION OF MUSICAL INSlRUMENT SOUNDS 131

instrument classification tasks as NNs. However, it shou1d be remernbered tltat the recognition accuracy depends on the choice of the discretization method.

The next experiment aimed at testing various parameters. To begin with, comparisons were made between databases containing feature vectors consisting of 14 and 11 parameters (introduced in the previous section as: FOURTEEN and ELEVEN, correspondingly), with only time-related parameters included in the feature vectors in the latter case (as shown in Section 4.4). Then, polynomial- and cepstrum-based feature vectors were tested.

A set of resu1ts was obtained for different decision system setrings and different discretization methods. Exemplary recognition scores are contained in Tab. 4.25.

Tab. 4.25. Leave-one-out test results perforrned for musical databases (neutral point=0.6, rough set measure =0.6, rule length =3)

Database/Discretization EIWM VSQ STATCLUST (5 and 7 interva/s) I VI I 7 interva/s

FOURTEEN 53%/53% (IV1=4.8)/67% 67% ELEVEN 49%/56% <IVI=4.5)/72% 69% Polynomia/ 62%/68% <I VI =5.5) /79% 73% Cepstrum 48%/49% (IVI=3.8)/58% 56%

As may be seen in Tab. 4.25, the best recognition scores were obtained in the case of the VSQ discretization method, used together with the database containing parameters based on polynomial approximation. On the other hand, for the EIWM method with attribute ranges divided into 5 intervals, the results were only 53% accurate (overall), assuming the maximum rule length equals 3. Other, more general conclusions may also be drawn. Increasing the maximum rule length seems to improve the overall accuracy rate, and it also seems tllat an optimal number of range divisions exist in the case of the first two discretization methods. Additionally, it is obvious tllat rough set-based system settings might be also optimized.

The second objective of the experiment, aimed at checking the significance and number of parameters, is even more difficult to obtain. In the case of the database consisting of FOURTEEN feature vectors, the reduct contained the following six parameters: T2 and T3, describing the so-called Tristimulus parameters; P1 and P2,

describing the time and amplitude relations of the fundamental in the transient and steady-state pllases; and B (Brightness) and hoaa, depicting the energy relationships between the higher and the odd llarmonics in relation to the energy of the whole steady-state portion of a musical signal. The obtained reduct seems to be in good accordance with the acoustical point ofview. It is one ofthe basic assumptions of musical acoustics tllat sound timbre changes are perceived due to variations in the transient pllase. On the other hand, the parameter called Brightness describes the relation of the energy of the higher llarmonics to the energy of the whole steady­state portion of a signal. Obviously, this parameter is also significant from the

Page 143: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

132 CHAPTER4

acoustical point of view because some musical instmment groups, such as brass, may be discemed on this basis. The last parameter included in the reduct is also characteristic for certain instruments (e.g. clarinet).

Another experiment was aimed at automatically assigning consecutive sound objects to a musical instrument register. This kind of test was performed for two instruments betonging to two different groups: the B flat clarinet(woodwinds group), and the contrabass (strings group). In the testing phase of the experiment, only three objects were chosen from the whole musical scale of each instrument, with each object representing differentiated registers.

Clarinet B flat:

1. low register- object No. 4, f3 (f=l75Hz), 2. middle register- object No. 23, c5 (f=525[Hz]), 3. highregister -objectNo. 34, b5 (f=99l[Hz]);

Contrabass:

1. low register- object No. 3, dl (f=36.628[Hz]), 2. middle register- object No. 20, g2 (f=97.566[Hz]), 3. highregister- object No. 34, a3 (f=219.403[Hz]).

The above sound objects were used in the training phase in order to generate rules, assuming the neutral point equals 0.6, the rough set measure equals 0.7 and the rule length did not exceed 3. EIWM quantization was performed using a division of the parameter domain into 3 intervals. During the testing phase, all remairring sound objects were used.

The results ofthe performed tests are given in Tab. 4.26 and Fig. 4.21 fortheB flat clarinet, andin Tab. 4.27 and Fig. 4.22 for the contrabass.

Tab. 4.26. Clarinet B flat- classification of sound objects to appropriate registers

Object 1 2 3 5 6 7 8 9

decision I I I I I I I I 100% 100% 100% 100% 100% 100% 100% 100%

Object 10 11 12 13 14 15 16 17

decision I I I I I I I I 100% 100% 100% 100% 100% 100% 100% 100%

18 19 20 21 22 24 25 26 27

m IIII 11 1/II IIII 11 11 11 11 100% 50% 75% 50% 50% 75% 100% 100% 75% 28 29 30 31 32 33 35 36 37

11 11 m m 11 m m m m 100% 100% 100% 100% 80% 100% 100% 100% 100%

Page 144: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

AUTOMATIC CLASSIFICATION OF MUSICAL INSTRUMENT SOUNDS 133

The resu1ts shown in Tab. 4.26 are also illustrated in Fig. 4.21. As is seen in Fig. 4.21 consecutive sound objects were in most cases proper1y assigned to the particular registers.

The objects used in the learning phase are marked by the verticallines in Fig. 4.21, the tested objects are spread around these objects.

Tab. 4.27. Contrabass - classification of sound objects to appropriate registers

Object 1 2 4 5 6 7 8 9

decision I I 75% I I II II II II 75% 100% 100% 80% 80% 80% 100%

Object 10 11 12 13 14 15 16 17

decision II I ? I II I I I 100% 66.7% 66.7% 75% 66.7% 66.7% 100%

18 19 21 22 23 24 25 26 27 28

II II III III ? III II ? II ? 75% 66.7% 80% 100% 80% 100% 66.7% 29 30 31 32 33 35 36 37 38

III III IIIIII III III III ? III III 80% 66.7% 100% 50% 66.7% 100% 100% 100%

decision

"'"' I \I ... /\I V

\1\/ V ...

Jn. ,.....n.. .... "'"'

' z • • • w u « a a w n ~ u 21 30 31 34 36 _,

Cllject N:l.

Fig. 4.21. Clarinet B flat - classification of sound objects to appropriate registers

Page 145: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

134 CHAP1ER4

decisian

2 .. 6 81012141618202l24262830l2:J43638

ObjectNo.

Fig. 4.22. Contrabass- classification of sounds to appropriate registers

When compared to the results obtained for the B flat clarinet, the classificati.on of contrabass object sounds (see Tab. 4.26 and Fig. 4.27) is not as precise and accurate as in the former case. The division between registers is difficult to be discemed. Moreover, some sound objects were not properly assigned to the particular registers.

Page 146: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

5. AUTOMATIC RECOGNITION OF MUSICAL PHRASES

The automatic recognition of musical phrase patterns requires some preliminary stages, such as acquisition of musical fragments by means of a MIDI keyboard, storage ofMIDI-encoded m1,1sical phrases, MIDI data conversion, parametrization of musical phrases, and discretization of parameter values in the case of rule­based decision systems. These tasks result in the creation of musical phrase databases containing feature vectors. The recognition process includes training and classification phases. All of these phases are to be described on the basis of experiments performed by the author. The results shown here aim at underlining the most general problems which are met while analyzing a musical phrase, form or style.

5.1. Data Acquisition

The process of data acquisition involves playing a musical fragment on a MIDI keyboard while simultaneously sending data through a MIDIinterface to a UNIX workstation. For this purpose, a special interface was engineered at the Sound Engineering Department of the Technical University of Gdansk. The scheme of the acquisition process is presented in Fig. 5.1. Next, the processing of acquired MIDI data is tobe performed.

Fig. 5 .1. MIDI data acquisition process

Page 147: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

136 CHAPTER5

5.1.1. Conversion of MIDI Data

All MIDI infonnation is based on the serial transmission of eight-bit packets of data, grouped together in bytes [57][68]. A MIDI command (representing an event) is usually composed of one, two or three such bytes sent one after another (Fig. 5.2). Basically, in each "note-on" message there are three separate data words. The first is the actual status command that says whether to play or release a note, along with the channel information needed for steering the command to the correct instrument. Then, the actual note is defined, and finally, the velocity value is given. Events connected to the "note-on" types of data are the most important and most often used [57][68].

note-on status byte 1

note number byte 2

velocity byte3

Fig. 5.2. The "note-on" command

There are two possible ways to decode the MIDI code. The first is to translate the data into the hexadecimal format In the HEX notation, information based on the "note-on" event may be interpreted as in the following example:

$90 $3C $38 $90 $3C $00

Thesefirst six bytes may be read as follows: $90 isaMIDI status byte for a "note-on" command, $3C corresponds to the note number, $38 is the velocity number; this three-byte sequence is then repeated, with the exception of the velocity value which is $00, denoting that the note is off. Additionally, in such a sequence a message about the time interval between two consecutive events may be found. The second way to decode the data is to translate the HEX format into ASCll notation.

In the experiments, the data in the MIDIfilesare stored in ASCII format. This requires the MIDIfilestobe converted into text files which are readable. In order to automate the conversion process from MIDI code-based information to ASCII data, a program for a UNIX-based workstation has been engineered [110][113][130]. A musical phrase, together with MIDI code translated into ASCII notation, is shown in Fig. 5.3.

a.

4t o; ) .FJ :oEJi A I ;nl J J J 3 J , • t

Page 148: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

AliTOMATIC RECOGNITION OF MUSICAL PHRASES 137

b.

next> mftext 10 l.mid Reader format=l ntrks=2 division=I024 Trackstart Time=O Time signature=4/4 MIDI-clocks/click=24 32nd-notes/24-MIDI-clocks=8 Time=O Key signature, sharplflats=O minor=O Time=O Tempo, microseconds-per-MIDI-quarter-note=625000 Time=4096 Time signature=4/4 MIDI-clocks/click=24 32nd-notes/24-MIDI­clocks=8 Time=4096 Key signature, sharp/flats=O minor=O Time=6400 Meta event, end oftrack Trackend Trackstart Time=512 Note on, chan=l pitch=60 vol=64 Time=1024 Note off, chan=l pitch=60 vol=O Time=l024 Note on, chan=l pitch=62 vol=64 Time=1536 Note off, chan=l pitch=62 vol=O

Time=6400 Meta event, end of track Trackend

Fig. 5.3. Musicalphrase (a) and corresponding MIDI code-based file translated into ASCIInotation (b).

The conversion process in the perfonned experiments was simplified to include only a single track (monophonic music fragments). As was mentioned, each event in the MIDIcode is represented by 2 or 3 bytes. However, some events arenot important for analyzing the melodic structure, for example modulation, pitch bending, program change, etc. The next step was then the elimination of such redundant information, leaving only "note-on"/"note-off' and pitch infonnation, the latter deduced from the MIDIinstrument key number.

5.1.2. Processing of MIDI Data

For the purpose of experiments on the recognition of musical patterns, a database containing fragments of §Ome musical pieces was created using the extracted MIDI data [110][113][118].

The database described below, named TOFFEE, consists of fugues by Johann Sebastian Bach (Ihemes Of fugues from "The Well-T!:mpered Clavi!:r" (in German: "Das Woh/t!_mperierte Klavi!_r"), selected since the fonn of these pieces is stylistically homogenous. The entire composition of a fugue is based on a theme, called the "subject", which is introduced as a solo melody. The subject of a fugue may be modified by one or more of the following devices: augmentation, in

Page 149: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

138 CHAPTER 5

which the subject is presented in longer notes; diminution, in which the note values are halved or quartered; inversion, in which the subject is sounded upside down; and retrograde, in which it is sounded backwards [202]. The most typical modification of a subject within a fugue, however, is a simple key transposition. Some of the modifications presented above were observed when creating the database.

The construction of the MIDI database started from some exemplary classes [110], with 48 classes being defined in total. In all of the mentioned classes, besides the template (TEMPL) forms of fugues represented by a solo subject, 13 modified forms were included. Therefore, the main forms used in the created database were as follows:

- template pattern (TEMPL),
- fragment containing an accidental error made when playing (ERROR),
- fragment containing an additional note before the actual fragment (PRE),
- fragment with a musical ornament (ORN),
- fragment with a transposed pitch (TRANS),
- fragment played with a changed tempo - augmentation (AUG).

The remaining objects were acquired by applying two or three modifications to the template form at the same time. Objects in classes are grouped in the following way: reference pattern; form with an accidental mistake; form with an additional note before the actual pattern or a note absent at the beginning of the actual pattern; the same as previously, but also with an ornament; form with an ornament; five different transposed forms; form with a changed rhythm (augmentation or diminution); transposed form with a mistake; transposed form with both a mistake and an ornament; transposed form with a changed rhythm.

Therefore, there are 14 objects in each class, representing various modifications of the subject. Each object in the database is a single MIDI file, and all of them are placed in one directory.

In Fig. 5.4, the template pattern together with selected modification forms is presented for excerpt No. 1 from Fugue No. 1. This figure provides a display of a musical phrase representation in which each line corresponds to a consecutive note (the horizontal axis represents note duration, and the vertical axis represents note pitch in the chromatic scale).

Such modifications do not introduce changes to the perceived general melodic structure. This is due to the fact that human perception consists in recognizing intervals and melodies rather than single tone pitches. A musical melody is interpreted as a sequence of pitch transitions; the information on absolute pitch is disregarded as nonessential. Also, changes in the time structure do not alter the overall perception of the melody. On the other hand, the consecutively decoded MIDI structures differ greatly from that of the original (template) form, mainly because the MIDI code structure depends on the number of events. In all of the modified forms, the length of the MIDI code structure is greater in comparison to the template pattern.


Fig. 5.4. Graphical presentation of fugue forms (Fugue No. 1, excerpt No. 1): (a) TEMPL form (the template pattern), (b) ERROR form, (c) PRE form, (d) ORN form, (e) TRANS form, (f) AUG form

5.2. Parametrization Process

Most expert systems use databases constructed of fixed-length vectors which represent objects. However, the objects in the created database originally contained a variable number of MIDI events. After converting a MIDI file into ASCII code, the vector length is equal to the number of notes in the melody, resulting in a variable number of notes even within one class. There are at least two approaches to solving the problem of length incompatibility. The first approach is to find the maximum length of a vector and fix all other vectors in the database to this length simply by padding them with zeros. The major fault of this method is that it significantly enlarges the size of the database. It can also cause unexpected recognition results by possibly treating zeros as distinctive elements. The other approach is the data parametrization process. In this case, the musical phrase is no longer represented by consecutive MIDI-decoded values; instead, only some distinctive features of the analyzed patterns are provided.
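The first, rejected approach (fixing every vector to the maximum length by zero-padding) is trivial to express; a minimal sketch, with the function name assumed:

```python
def pad_to_max(vectors):
    # Extend every vector to the length of the longest one by appending
    # zeros -- note how the padding zeros become indistinguishable from
    # genuine zero-valued elements, which is the weakness noted above.
    max_len = max(len(v) for v in vectors)
    return [v + [0] * (max_len - len(v)) for v in vectors]
```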

Therefore, for the purpose of data size unification and reduction, the parametrization procedure was applied. Two parametrization methods were used in this work: the first was statistical, and the other was based on a trigonometric approximation. However, before the parametrization procedure could take place, a time normalization procedure had to be executed.

5.2.1. Time Normalization Methods

It should be noticed that for a melodic line, the only information included in the created database is the "note-on" data. This is written as a sequence of pairs of integers, the first representing the number of a note, the second the time of an event occurrence. Further reduction requires a time normalization process that should be applied to the musical templates [127].

The simplest normalization method involves expressing all note lengths with the same rhythmic measure. Therefore, to simplify the recognition problem with


the rhythm-related data, quantization to 32nd notes may be applied, as is seen in Fig. 5.5.

Fig. 5.5. Time normalization of a musical phrase
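The quantization described above can be sketched as follows; with division=1024 ticks per quarter note (as in the dump of Fig. 5.3), a 32nd note spans 128 ticks. The event-tuple layout is an illustrative assumption.

```python
def quantize_32nd(events, division=1024):
    """Snap event times to the nearest multiple of a 32nd note.

    events   -- list of (time_in_ticks, kind, pitch) tuples
    division -- MIDI ticks per quarter note (1024 in Fig. 5.3)
    """
    grid = division // 8  # a 32nd note is 1/8 of a quarter note
    return [(round(t / grid) * grid, kind, pitch) for t, kind, pitch in events]
```

For example, a note-on at tick 517 is snapped back to the 32nd-note grid position 512.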

Since the problem of acoustic object identification, including time uncertainty, has been thoroughly explored in the domain of speech acoustics, it seemed natural to survey this domain for adequate methods. Consequently, a method for time normalization of utterances that has proven to be a powerful tool in speech recognition was successfully implemented for the purpose of time normalization of musical phrases. This method, called time-warping, can be applied both to time normalization of patterns and to searching for event sequences within a phrase; however, the latter application will not be shown here [175].

The time-warping concept is presented schematically in Fig. 5.6 [175]. The time-warping algorithm provides a mapping between the time indices of the reference and test patterns. The mapping is a function denoted as:

m = w(n) (5.1)

which must satisfy both boundary conditions and continuity conditions. Boundary conditions concern the time alignment of the reference and test patterns, i.e.:

w(1) = 1 (5.2a)

w(N) = M (5.2b)

where N and M denote the lengths of the reference and test patterns, respectively.

Continuity conditions allow the algorithm to deal with time-domain non-linearity, and can be imposed in many different ways, e.g.:

w(n + 1) - w(n) = 0, 1, 2   if w(n) ≠ w(n - 1) (5.3a)

w(n + 1) - w(n) = 1, 2   if w(n) = w(n - 1) (5.3b)


Both specified conditions constrain the warping function (5.1) to be placed within a parallelogram, as shown in Fig. 5.7. The latter condition is needed in order to impose some limitations on the changes of the warping curve in cases where it is difficult to map the current sequence to the template pattern due to their dissimilarities. Specifically, this can be done by setting some boundaries such as in Fig. 5.7.

The classical time-warping algorithm is presented in Fig. 5.8. The first assumption made when applying such an algorithm concerns finding the relation between the reference and modified patterns. In the created database, there are two types of discontinuity between the reference and tested patterns. The first type of discontinuity is caused by the appearance of additional events (additional notes causing additional intervals) within the phrase. This type of discontinuity can be caused by the insertion of an ornament into the phrase. The second type of discontinuity is caused by the absence of events.

Fig. 5.6. Illustration of the time-warping concept (mapping m = w(n) between the time indices of the test and reference patterns)

Fig. 5.7. Region of allowable paths of warping functions


Fig. 5.8. Time-warping algorithm

The formulated time-warping algorithm makes a series of comparisons between the template pattern and the current one. The criterion of the mapping is the minimum cumulative distance D_T, calculated as the sum of all local distances [175]:

D_T = min_{w(n)} Σ_{n=1}^{N} D(R(n), T(w(n))) (5.4)

A local distance is calculated between the nth tested segment and the mth segment of the template pattern according to the expression:

D_A(n, m) = D(n, m) + min_{q ≤ m} D_A(n - 1, q) (5.5)


where D_A(n, m) is the minimum accumulated distance to the point (n, m) and is of the form [175]:

D_A(n, m) = Σ_{p=1}^{n} D(R(p), T(w(p))) (5.6)
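The matching defined by Eqs. (5.4)-(5.6) can be sketched compactly as below. The boundary conditions of (5.2) are enforced implicitly by anchoring the first and last cells; the absolute pitch difference used as the local distance D is an illustrative assumption.

```python
def time_warp_distance(ref, test):
    """Minimum cumulative distance D_T between two pitch sequences.

    Implements the recurrence D_A(n, m) = D(n, m) + min over q <= m of
    D_A(n-1, q), with D(n, m) taken here as |ref[n] - test[m]|.
    """
    n_ref, n_test = len(ref), len(test)
    INF = float("inf")
    da = [[INF] * n_test for _ in range(n_ref)]
    da[0][0] = abs(ref[0] - test[0])          # boundary condition: w(1) = 1
    for n in range(1, n_ref):
        best_prev = INF
        for m in range(n_test):
            # Running minimum realizes min over q <= m of D_A(n-1, q).
            best_prev = min(best_prev, da[n - 1][m])
            da[n][m] = abs(ref[n] - test[m]) + best_prev
    return da[-1][-1]                          # boundary condition: w(N) = M
```

Note that an inserted ornament note (an extra event in the test pattern) costs nothing here, since the warping path is free to skip test segments; this corresponds to the first type of discontinuity discussed above.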

In Fig. 5.9 an example of musical phrase mapping is shown. The axes in Fig. 5.9 represent the rhythmical measure (horizontal axis) and the note pitch expressed in MIDI-code notation (vertical axis), respectively. For a quarter note, the distance on the x-axis equals 1; for an eighth note, 0.5; for a sixteenth note, 0.25; and for a 32nd note, 0.125. Rhythmical measures smaller than a 32nd note are omitted, as they have no influence on melody perception.

Fig. 5.9. Exemplary mapping function

5.2.2. Statistical Parametrization

One of the possible ways of parametrizing a musical phrase is to represent it by a feature vector containing some statistical parameters. In Fig. 5.10, a sequence of absolute pitch values and of relative values (melodic intervals between subsequent notes) derived from a musical pattern is presented. On the basis of such a representation, statistical parameters are calculated.

absolute values: 64 66 68 69 71 69 68 73 66 71 73 71 69 68
relative values (intervals): 2 2 1 2 -2 -1 5 -7 5 2 -2 -2 -1

Fig. 5.10. Absolute and relative values of pitches representing a musical phrase


These parameters, listed and formalized below, describe some fundamental phrase features:

P1 - difference between the mean pitch of notes weighted by their durations and the pitch of the lowest note:

P1 = (1/T) Σ_{k=1}^{n} a_k t_k - min_k(a_k) (5.7)

where: T - phrase duration, a_k - kth note pitch, t_k - its duration, n - number of notes within a phrase;

P2 - pitch range of the phrase:

P2 = max_k(a_k) - min_k(a_k) (5.8)

P3 - mean absolute pitch change between consecutive notes:

P3 = (1/(n-1)) Σ_{k=1}^{n-1} |a_k - a_{k+1}| (5.9)

P4 - duration of the longest note:

P4 = max_k(t_k) (5.10)

P5 - mean note duration:

P5 = (1/n) Σ_{k=1}^{n} t_k (5.11)
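The five parameters can be computed directly from (pitch, duration) pairs; a minimal sketch, with the function name and input layout assumed:

```python
def statistical_params(notes):
    """Compute the statistical phrase parameters P1..P5.

    notes -- list of (pitch, duration) pairs, e.g. MIDI key numbers
             and durations expressed in quarter-note units.
    """
    pitches = [a for a, _ in notes]
    durations = [t for _, t in notes]
    total, n = sum(durations), len(notes)
    p1 = sum(a * t for a, t in notes) / total - min(pitches)  # (5.7)
    p2 = max(pitches) - min(pitches)                          # (5.8)
    p3 = sum(abs(pitches[k] - pitches[k + 1])                 # (5.9)
             for k in range(n - 1)) / (n - 1)
    p4 = max(durations)                                       # (5.10)
    p5 = total / n                                            # (5.11)
    return p1, p2, p3, p4, p5
```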

The format of feature vectors based on the statistical parametrization method is shown in Tab. 5.1.

Tab. 5.1. Format of feature vectors derived from statistical parametrization

The statistical parametrization method allows for the representation of overall phrase characteristics. Moreover, the simplicity of the calculations should also be mentioned. However, a feature vector consisting of statistical parameters does not describe the melody shape, and the parameters are sensitive to tempo changes.


As a result of statistical parametrization, the initial TOFFEE database was modified and renamed TOFFEESTAT.

Examples of parameter values from the TOFFEESTAT database are shown in Fig. 5.11. For presentation purposes, the number of objects in a class is reduced to 6 (the reference pattern, first from the left for every class, and five exemplary forms). As may be seen in Fig. 5.11, the differences between the values of parameter P1 for the reference pattern and its modifications are larger than between the values of parameter P3.

Fig. 5.11. Examples of statistical parameter values for four randomly chosen classes of the TOFFEESTAT database: a) parameter P1, b) parameter P3

Also, some simple statistical measures, such as the mean (X̄) and variance (S²), were calculated for the TOFFEESTAT database:

X̄ = (1/n) Σ_{i=1}^{n} X_i (5.12)

S² = (1/(n-1)) Σ_{i=1}^{n} (X_i - X̄)² (5.13)

where: X_i - ith parameter value, n - number of parameters

Additionally, maximum (M) and minimum (m) parameter values were calculated:

M = max_{i=1..n}(X_i) (5.14)

m = min_{i=1..n}(X_i) (5.15)

Mean values of the computed statistical parameters for all classes of the TOFFEESTAT database are shown in Fig. 5.12.


Fig. 5.12. Mean values of the P1 (a), P2 (b), P3 (c), P4 (d) and P5 (e) parameters for all classes from the TOFFEESTAT database


In Tab. 5.2 a comparison of mean, variance, maximum and minimum values for the TOFFEESTAT database is shown.

Tab. 5.2. Mean, variance, maximum and minimum values for the TOFFEESTAT database

       P1      P2      P3      P4      P5
X̄      10.076  11.006  2.765   5.899   2.489
S²     7.989   7.164   0.694   15.796  2.256
M      20.300  19.000  5.688   24.000  8.500
m      1.778   4.000   1.478   2.000   0.737

Similarly, the same measures were calculated for template patterns (see Tab. 5.3).

Tab. 5.3. Mean, variance, maximum and minimum values for the TOFFEESTAT database (only template patterns)

       P1      P2      P3      P4      P5
X̄      10.204  10.833  2.806   5.458   2.425
S²     7.926   7.306   0.775   12.707  2.426
M      18.110  17.000  5.636   16.000  8.333
m      4.333   4.000   1.579   2.000   1.028

As may be seen from these tables, the mean values for the parameter pairs P1, P2 and P3, P5 are very close to each other and, at the same time, very close to the values obtained for the template patterns. In contrast, the P4 parameter better ensures class separation. In Sections 5.3 and 5.4 the extracted parameters will be tested further by means of learning algorithms, and thus their quality will be evaluated.

5.2.3. Trigonometric Approximation of Musical Phrases

The trigonometric parametrization method was developed in order to reduce the disadvantages of the statistical one. It is worth noticing that a score notation may be treated as a kind of signal time-domain plot (see Fig. 5.13). Therefore, the shape of a musical phrase melody is analyzed as a series of cosine waveforms. Consequently, a trigonometric transform based on the following equation was applied to the analysis of such plots:

(5.16)

where:


T_i - ith element of the parameter vector, l - length of a parametrized phrase, e_k - kth event in a phrase.

The accuracy of the analysis performed by applying a series of cosine waveforms can be controlled by the length of the series. The length of the series, in the case of these experiments, equals the number of parameters representing a phrase. An adjustable number of parameters allows a compromise between the resolution and the computing power requirements of an expert system.

Fig. 5.13. Graphical presentation of the orthogonal approximation of a musical phrase

By analyzing this type of parametrization, it is possible to draw the following conclusions: the orthogonal parametrization represents the melody shape; it is transposition and tempo independent; and it allows for a user-definable length of the parameter vector. On the other hand, the rhythmical structure is no longer represented in the feature vector.
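Since the exact form of (5.16) is not reproduced in this excerpt, the sketch below uses a DCT-style cosine series as one plausible realization; skipping the i = 0 (constant) term is what makes the result independent of a constant pitch offset, i.e. of transposition. All names are assumptions.

```python
import math

def cosine_params(events, num_params):
    """Project a pitch sequence onto the first num_params cosine waveforms.

    events -- pitch values e_k of the consecutive events in the phrase.
    The i = 0 (mean/DC) term is skipped, so adding a constant to all
    pitches (a transposition) does not change the parameter vector.
    """
    l = len(events)
    return [
        sum(e * math.cos(math.pi * i * (k + 0.5) / l)
            for k, e in enumerate(events)) / l
        for i in range(1, num_params + 1)
    ]
```

Transposing the phrase by, say, 7 semitones leaves the resulting vector unchanged, which illustrates the transposition independence claimed above.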

Depending on the number of orthogonal parameters computed according to formula (5.16), four new databases, called TOFFEETR3, TOFFEETR5, TOFFEETR7 and TOFFEETR10, were created. Hence, the length of the feature vectors is equal to 3 in the case of the TOFFEETR3 database, to 5 in the case of the TOFFEETR5 database, etc. The format of feature vectors for the TOFFEETR3 database is shown in Tab. 5.4.

Tab. 5.4. Feature vectors for the TOFFEETR3 database

Examples of the trigonometric parametrization-based parameter values are shown in Fig. 5.14. For presentation purposes, the number of objects in a class is reduced to 6 (the reference pattern, first from the left for every class, and five modified forms).

Fig. 5.14. Examples of values of orthogonal parameters for four randomly chosen classes from the TOFFEETR databases: a) parameter T3, b) parameter T5


Additionally, some statistical measures were calculated for the whole TOFFEETR10 database and, for comparison, for the template patterns (Tab. 5.5, 5.6). Also, mean values for some chosen parameters are shown in Fig. 5.15.

Tab. 5.5. Mean, variance, maximum and minimum values for the TOFFEETR10 database

       T1        T2        T3        T4        T5
X̄      -0.18627  -0.15410  -0.01933  -0.14954  -0.01057
S²     1.00528   0.70139   0.34397   0.38014   0.23681
M      2.74555   1.76279   1.75069   2.23898   1.28807
m      -2.46986  -2.04074  -1.95094  -1.73278  -1.31598

       T6        T7        T8        T9        T10
X̄      0.00347   0.00056   -0.05728  0.06124   -0.02290
S²     0.27810   0.17270   0.16721   0.15878   0.10045
M      1.43705   1.32462   1.12755   1.26282   1.28385
m      -1.49873  -0.92379  -1.04172  -0.82553  -0.89595

Tab. 5.6. Mean, variance, maximum and minimum values for the TOFFEETR10 database (only template patterns)

       T1        T2        T3        T4        T5
X̄      -0.18867  -0.15062  -0.02146  -0.15264  -0.00906
S²     0.97568   0.69172   0.33503   0.38378   0.22658
M      1.75591   1.68764   0.96787   1.78082   1.01540
m      -1.84778  -1.96120  -1.84828  -1.56378  -1.02826

       T6        T7        T8        T9        T10
X̄      0.00894   0.00906   -0.05103  0.06109   -0.02126
S²     0.27034   0.17182   0.16061   0.16544   0.09054
M      1.16397   1.03545   0.86015   1.21012   1.07563
m      -1.12570  -0.81697  -0.86121  -0.75746  -0.72316

Fig. 5.15. Mean values of the T1 (a), T2 (b) and T10 (c) parameters from the TOFFEETR10 database

Since consecutive mean values "follow" the shape of the melody, this measure is not very useful in the case of the analysis of the trigonometric-based parameters.

Data Similarity Testing

Additionally, in data testing, such measures as correlation and corresponding Student's t values were computed for the TOFFEESTAT and TOFFEETR7 databases (classes 16 and 23).

Values of the calculated correlation r (see Eq. (3.99)) and of the corresponding Student's test t (Eq. (3.102)) are shown in Tab. 5.7-5.10 for the TOFFEESTAT database. The next tables (Tab. 5.11-5.14) contain similar computational results for the TOFFEETR7 database.

Tab. 5.7. Correlation coefficients r calculated for the TOFFEESTAT database (class No. 16)

r     T1      T2      T3      T4      T5
T1    1.000
T2    0.839   1.000
T3    -0.461  -0.025  1.000
T4    -0.598  -0.141  0.962   1.000
T5    -0.901  -0.600  0.791   0.860   1.000


Tab. 5.8. Student's statistics t calculated for the TOFFEESTAT database (class No. 16)

t     T1      T2      T3      T4      T5
T1    --
T2    5.341   --
T3    -1.798  -0.085  --
T4    -2.587  -0.493  12.240  --
T5    -7.174  -2.599  4.475   5.835   --

Tab. 5.9. Correlation coefficients r calculated for the TOFFEESTAT database (class No. 23)

r     T1      T2      T3      T4      T5
T1    1.000
T2    0.829   1.000
T3    0.772   0.978   1.000
T4    0.950   0.924   0.915   1.000
T5    0.960   0.660   0.574   0.843   1.000

Tab. 5.10. Student's statistics t calculated for the TOFFEESTAT database (class No. 23)

t     T1      T2      T3      T4      T5
T1    --
T2    5.136   --
T3    4.213   16.282  --
T4    10.548  8.360   7.835   --
T5    11.909  3.040   2.429   5.437   --

Tab. 5.11. Correlation coefficients r calculated for the TOFFEETR7 database (class No. 16)

r     T1      T2      T3      T4      T5      T6      T7
T1    1.000
T2    0.839   1.000
T3    -0.461  -0.025  1.000
T4    -0.598  -0.141  0.962   1.000
T5    -0.901  -0.600  0.791   0.860   1.000
T6    -0.981  -0.823  0.553   0.668   0.943   1.000
T7    -0.861  -0.965  -0.029  0.141   0.579   0.810   1.000

Tab. 5.12. Student's statistics t calculated for the TOFFEETR7 database (class No. 16)

t     T1      T2      T3      T4      T5      T6      T7
T1    --
T2    5.341   --
T3    -1.798  -0.085  --
T4    -2.587  -0.493  12.240  --
T5    -7.174  -2.599  4.475   5.835   --


Tab. 5.13. Correlation coefficients r calculated for the TOFFEETR7 database (class No. 23)

r     T1      T2      T3      T4      T5      T6      T7
T1    1.000
T2    0.829   1.000
T3    0.772   0.978   1.000
T4    0.950   0.924   0.915   1.000
T5    0.960   0.660   0.574   0.843   1.000
T6    0.906   0.664   0.529   0.753   0.946   1.000
T7    0.911   0.930   0.894   0.919   0.777   0.786   1.000

Tab. 5.14. Student's statistics t calculated for the TOFFEETR7 database (class No. 23)

t     T1      T2      T3      T4      T5      T6      T7
T1    --
T2    5.136   --
T3    4.213   16.282  --
T4    10.548  8.360   7.835   --
T5    11.909  3.040   2.429   5.437   --
T6    7.398   3.075   2.159   3.964   10.119  --
T7    7.665   8.786   6.902   8.072   4.281   4.400   --

It was found for the TOFFEESTAT database that pairs of parameters for which correlation values exceed 0.7 (values of r with their corresponding t statistics are highlighted in bold in Tab. 5.7-5.10) are strongly correlated (at significance levels of either 0.01 or 0.05). In the case of the TOFFEETR7 database, values of r and their corresponding t statistics are also highlighted in bold where there is a strong correlation between pairs of parameters (Tab. 5.11-5.14). As may be seen, the correlation technique is very valuable when dealing with the problem of data similarity. However, correlations between parameters do not remain stable within the whole data set. As is seen in the case of both analyzed databases (TOFFEESTAT and TOFFEETR7), the highest value of correlation r was obtained for the pair of parameters T2-T3 for class No. 23. Conversely, the same pair of parameters was characterized by the lowest value of correlation r in the case of class No. 16.
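The two measures used above can be sketched as follows; the standard Pearson correlation coefficient and its t statistic with n - 2 degrees of freedom are assumed for Eqs. (3.99) and (3.102), which are not reproduced in this chapter.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def student_t(r, n):
    """t statistic testing the significance of r over n sample pairs."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

Applied to every pair of parameter columns of a class, these two functions yield tables of the kind shown in Tab. 5.7-5.14.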

Parametrization methods that were used in the experiments carried out by the author aimed at data compression and at the same time at preserving the relationships between the melody components.


5.2.4. Separability of Parameter Values

In order to test the distinction between parameter values, the maximum absolute values of the Behrens-Fisher statistic were calculated for each pair of classes represented in the TOFFEESTAT, TOFFEETR3, TOFFEETR5, TOFFEETR7 and TOFFEETR10 databases. Then, for each database, a table containing the maximum absolute value of the Behrens-Fisher statistic for each pair of classes, along with the index of the related parameter, was created. A portion of such a table is shown in Fig. 5.16.

On the basis of the results obtained for the TOFFEESTAT database, one may say that the maximum value of |V| was most frequent for the parameter P3 (539 times) and, conversely, was least frequent for the parameter P5 (26 times).
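A Welch-type form of the Behrens-Fisher statistic is sketched below for two samples of one parameter's values; since the exact variant used in the book is not quoted here, this particular formula is an assumption.

```python
import math

def behrens_fisher(x, y):
    """Welch-type statistic V comparing one parameter across two classes."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)  # unbiased variances
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    return (mx - my) / math.sqrt(vx / nx + vy / ny)
```

A large |V| for some parameter indicates that the two classes are well separated along that parameter, which is how the per-pair tables of Fig. 5.16 are built.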

Additionally, minimum and mean absolute values of the Behrens-Fisher statistic (|V|) were calculated for the TOFFEESTAT database before and after discretization. These values are presented in Tab. 5.15 and Tab. 5.16. A comparison between the values of particular parameters for two chosen classes (classes 16 and 23 of the TOFFEESTAT database) is shown in Fig. 5.17a for non-discretized parameter values, and in Fig. 5.17(b,c,d) for parameter values after discretization. Parameter values for a particular class are not always concentrated in separable sets, as is shown in Fig. 5.17. On the other hand, better discernibility of objects belonging to a particular class of musical phrases is indicated by a larger absolute value of the Behrens-Fisher statistic after discretization than is the case with non-discretized values.

The same kind of analysis was performed for the trigonometric databases. In Tab. 5.17, the minimum, mean and maximum |V| values calculated for the TOFFEETR databases are shown. The minimum |V| value equals 1.52, and was obtained for classes 4 and 33 of the TOFFEETR3 database.

Tab. 5.15. Minimum Behrens-Fisher statistic values for the TOFFEESTAT database (parameter indexes in parentheses)

         TOFFEESTAT (before discretization)   EIWM           VSQ           MGCM
min|V|   1.52 (4, 33)                         1.44 (18, 20)  1.45 (5, 35)  1.15 (2, 20)

Tab. 5.16. Mean Behrens-Fisher statistic absolute values for the TOFFEESTAT database

          TOFFEESTAT (before discretization)   EIWM    VSQ     MGCM
mean|V|   16.72                                15.14   15.69   14.52

Tab. 5.17. The min|V|, mean|V| and max|V| values for the TOFFEETR databases

          TOFFEETR3   TOFFEETR5   TOFFEETR7   TOFFEETR10
min|V|    1.58        3.93        3.93        4.6
mean|V|   29.49       31.74       33.83       35.26
max|V|    152.86      152.86      152.86      152.86


Fig. 5.16. Presentation of the maximum Behrens-Fisher statistic values for each pair of classes, together with the parameter index associated with the obtained maximum Behrens-Fisher statistic value

Fig. 5.17. Graphical presentation of particular parameter values for two chosen classes (class No. 16 and class No. 23 of the TOFFEESTAT database) in the case of non-discretized parameter values (a), and after discretization with the EIWM (b), VSQ (c) and MGCM (d) methods. The maximum Behrens-Fisher statistic value in the case of non-discretized parameters equals 3.02


In addition, it should be noticed that the smallest values of |V| were obtained for the database containing three parameters (TOFFEETR3), and the largest ones for the database consisting of 10 parameters (TOFFEETR10). Hence, it is obvious that enlarging the length of the feature vectors in the case of the trigonometric databases increases the separability of classes. This is because the larger the number of parameters, the more detailed the description of the melody shape and, at the same time, the better the differentiation of form patterns within classes. In Fig. 5.18, pairs of classes from the whole TOFFEETR database that are characterized by the biggest |V| value are presented. It may be seen from Fig. 5.18 that the presented parameter values are contained in separable sets.

Fig. 5.18. Presentation of classes characterized by the biggest |V| values in the case of the TOFFEETR3 (a), TOFFEETR5 (b), TOFFEETR7 (c) and TOFFEETR10 (d) databases

5.3. Neural Network as a Classifier of Musical Phrases

For analysis purposes, two reduced databases (one from the statistical database TOFFEESTAT and the other from the trigonometric one, TOFFEETR5) containing objects of 12 classes were constructed. Both databases consisted of five-element feature vectors. A multi-layer neural network of the feedforward type was used in the experiments. The number of neurons in the input layer was equal to the number of elements of the input vector, hence it was equal to 5. In turn, each neuron in the output layer was matched to a different class of musical phrase, so their number was equal to 12. The number of hidden neurons was arbitrarily set at 15. The first stage of the experiments encompassed the training phase for the neural network. In the first test, the training phase was performed using all objects except the reference patterns, and the reference patterns were subsequently used for testing (scheme 1). Table 5.18 illustrates the quality of recognition versus error value for both databases. In the second experiment, training was performed on all data except the patterns representing the forms with a modified first note and with an ornament. The specified patterns were then used in the testing phase (scheme 2). Table 5.19 illustrates the quality of recognition versus error value for both databases. The third test consisted of training on the reference patterns and testing on all of the remaining forms (scheme 3). Table 5.20 illustrates the quality of recognition versus error value for both databases.

The training phase proceeded differently for the statistical and trigonometric databases, with the training proceeding quickly and without interference for all training schemes in the case of the statistical database. The network training for the trigonometric database, however, proceeded very slowly, especially in the first training scheme. Increasing the accuracy required additional tens of thousands of iterations, executed at the expense of convergence speed. It may be observed that the network trained with the trigonometric database parameters demonstrated the best results for the third training scheme.
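The 5-15-12 feedforward architecture described above can be sketched with plain backpropagation. Everything in the sketch below (the random stand-in data, the learning rate, and the epoch count) is illustrative only and is not taken from the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Architecture from the text: 5 inputs, 15 hidden neurons, 12 output classes.
N_IN, N_HID, N_OUT = 5, 15, 12

# Placeholder data standing in for the TOFFEE feature vectors (not the real database).
X = rng.normal(size=(120, N_IN))
y = rng.integers(0, N_OUT, size=120)
T = np.eye(N_OUT)[y]                      # one-hot targets

W1 = rng.normal(scale=0.5, size=(N_IN, N_HID))
W2 = rng.normal(scale=0.5, size=(N_HID, N_OUT))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for epoch in range(2000):                 # plain batch-gradient backpropagation
    H = sigmoid(X @ W1)
    O = sigmoid(H @ W2)
    err = O - T
    mse = np.mean(err ** 2)
    dO = err * O * (1 - O)                # gradient through the output sigmoid
    dH = (dO @ W2.T) * H * (1 - H)        # gradient through the hidden sigmoid
    W2 -= lr * H.T @ dO / len(X)
    W1 -= lr * X.T @ dH / len(X)

pred = np.argmax(sigmoid(sigmoid(X @ W1) @ W2), axis=1)
accuracy = np.mean(pred == y)
```

With the real TOFFEE feature vectors, training would instead be stopped once the error falls below the error values listed in Tab. 5.18-5.20.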

Tab. 5.18. Quality of recognition [%] versus error value (scheme 1)

Error value / Database    10      1      0.1    0.01
TOFFEESTAT              83.33  83.33  91.67  91.67
TOFFEETR5               91.67  100    100    100

Tab. 5.19. Quality of recognition [%] versus error value (scheme 2)

Error value / Database    10      1      0.1    0.01
TOFFEESTAT              56.3   66.67  72.5   73.6
TOFFEETR5               75     83.33  83.33  83.33

Tab. 5.20. Quality of recognition [%] versus error value (scheme 3)

Error value / Database    10      1      0.1    0.01
TOFFEESTAT              83.33  85.25  86.36  86.36
TOFFEETR5               94.67  95.24  95.24  95.24

The results of the experiments show high effectiveness in the classification of musical patterns by neural networks. The obvious advantage of the neural network type of classifier is the fact that there is no need for quantization of the parameter values included in the feature vector.

5.4. Rough Set-Based Classification of Musical Phrases

5.4.1. Parameter Discretization

Since both the discretization process and a description of the rough set-based system have already been illustrated in the preceding chapters, only an overview of the methods used is presented below.

A rough set-based decision system is more efficient when using quantized values, thus requiring discretization of real data into the integer domain. For the created databases, three methods of discretization have been used by the author: the Equal Interval Width Method (EIWM), Variable Statistical Quantization (VSQ) and the Maximum Gap Clusterization Method (MGCM) (see Section 3.4). The first two types represent the so-called quantization methods, and involve the determination of points where parameter values divide the domain into ranges. The latter type belongs to the clusterization method category, and consists of gathering parameter values together and forming intervals.
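Of the three methods, the Equal Interval Width Method is the simplest to state precisely: the parameter range is split into equally wide intervals and each real value is mapped to its interval index. The sketch below is a generic rendering of that idea, not the author's implementation, and the example values are invented.

```python
import numpy as np

def eiwm_discretize(values, n_intervals):
    """Equal Interval Width Method: split [min, max] into n_intervals
    equally wide ranges and map each real value to its interval index."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    width = (hi - lo) / n_intervals
    # np.clip keeps the maximum value inside the last interval
    return np.clip(((values - lo) / width).astype(int), 0, n_intervals - 1)

print(eiwm_discretize([0.0, 0.24, 0.5, 0.99, 1.0], 4))  # -> [0 0 2 3 3]
```

VSQ and MGCM differ only in how the interval boundaries are chosen (statistically, or by the largest gaps between clustered values); the mapping step stays the same.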

5.4.2. Recognition Experiments

Tests on the recognition of musical phrases were performed using both statistical and trigonometric parametrization-based data (see Fig. 5.19).


Fig. 5.19. TOFFEE database transformation

A series of experiments using the TOFFEESTAT, TOFFEETR3, TOFFEETR5, TOFFEETR7 and TOFFEETR10 databases and various settings of the rough set learning system was performed. During the process of training, the following system settings were applied: maximum rough set measure (μRS) equals 1 (allowing certain rules only); minimum rough set measure (μRS) equals 0.5 (using both certain and possible rules); neutral points ranging from 0.2 to 0.9, consecutively, in 0.1 increments. Four levels of discretization were applied in the experiments: 7, 10, 15 and 20. The leave-one-out procedure was used as the classification process.

This stage of experiments involves the testing of reduced databases (using the TOFFEESTAT statistical and TOFFEETR trigonometric databases) containing objects from 12 classes (see Tab. 5.21-5.24). In this case, the overall recognition accuracy was less than that obtained in the experiments using neural networks.
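The leave-one-out procedure itself is independent of the classifier. In the sketch below a 1-nearest-neighbour rule stands in for the rough-set decision system, which is too involved to reproduce here; the four toy objects are invented.

```python
import numpy as np

def leave_one_out_accuracy(X, y):
    """Each object is held out once, the classifier is built on the rest,
    and the held-out object is classified; the fraction of hits is returned."""
    hits = 0
    for i in range(len(X)):
        train = np.delete(np.arange(len(X)), i)
        # 1-NN stand-in classifier: take the label of the closest remaining object
        d = np.linalg.norm(X[train] - X[i], axis=1)
        hits += (y[train][np.argmin(d)] == y[i])
    return hits / len(X)

X = np.array([[0.0], [0.1], [1.0], [1.1]])
y = np.array([0, 0, 1, 1])
print(leave_one_out_accuracy(X, y))  # -> 1.0
```

In the experiments reported below, the classifier built on the remaining objects is the rough-set rule system with the settings given in each table caption.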

Tab. 5.21. Recognition scores [%], EIWM. Rough-set system parameters: min. μRS equals 0.5 (using possible rules); neutral point from 0.2 to 0.9 (0.1 increments); certain rules, μRS = 1. TOFFEESTAT database

Number of intervals /   0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9   Certain
Neutral point                                                                  rules
7                     75.60  76.19  77.98  78.52  79.17  78.57  30.36   8.33   77.98
10                    82.14  82.74  83.93  83.33  80.95  77.98  56.55   5.36   84.52
15                    81.55  82.74  80.95  80.95  79.76  76.79  57.74  16.67   84.52
20                    84.52  85.71  86.31  85.12  84.52  83.33  72.02  30.95   84.52

Tab. 5.22. Recognition scores [%], MGCM. Rough-set system parameters: min. μRS equals 0.5 (using possible rules); neutral point from 0.2 to 0.9 (0.1 increments); certain rules, μRS = 1. TOFFEESTAT database

Number of intervals /   0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9   Certain
Neutral point                                                                  rules
7                     75.00  75.00  75.00  73.01  69.64  66.07  43.45  11.90   72.02
10                    73.21  73.81  73.21  72.02  68.45  64.88  41.07  10.12   73.21
15                    73.81  73.81  73.21  71.43  68.45  63.69  49.40   8.93   75.00
20                    77.38  77.38  76.19  74.40  69.64  65.48  51.19  13.10   75.60

Tab. 5.23. Recognition scores [%], EIWM (7 intervals). Rough-set system parameters: min. μRS equals 0.5 (using possible rules); neutral point from 0.2 to 0.9 (0.1 increments); certain rules, μRS = 1. TOFFEETR databases

Database /      0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9   Certain
Neutral point                                                          rules
TOFFEETR3     81.55  82.14  82.14  82.74  83.33  80.95  75.00  39.88   61.69
TOFFEETR5     85.71  86.90  86.90  86.90  86.32  86.31  80.95  35.12   88.10
TOFFEETR7     84.52  84.52  84.52  85.12  85.12  84.52  75.00  28.57   86.31
TOFFEETR10    85.21  85.92  87.01  86.70  85.14  85.34  74.71  27.31   87.01


Tab. 5.24. Recognition scores [%], EIWM (10 intervals). Rough-set system parameters: min. μRS equals 0.5 (using possible rules); neutral point from 0.2 to 0.9 (0.1 increments); certain rules, μRS = 1. TOFFEETR databases

Database /      0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9   Certain
Neutral point                                                          rules
TOFFEETR3     82.74  82.74  82.74  82.14  82.74  76.79  78.57  47.62   79.17
TOFFEETR5     85.12  85.12  85.71  85.71  86.90  87.50  81.55  45.83   90.48
TOFFEETR7     89.29  89.29  89.29  89.99  89.29  88.69  82.14  46.44   92.86
TOFFEETR10    88.31  86.71  87.88  91.01  89.11  87.34  81.12  45.90   92.17

In the next set of experiments, whole databases were engaged in the recognition testing. In Tab. 5.25-5.27, the results obtained for the whole TOFFEESTAT database are shown.

Tab. 5.25. Recognition scores [%], EIWM. Rough-set system parameters: min. μRS equals 0.5 (using possible rules). TOFFEESTAT database

Number of intervals /   0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9   Certain
Neutral point                                                                  rules
7                     57.44  57.74  57.89  57.89  50.15  36.31  13.24   2.83   39.88
10                    65.77  66.07  66.22  65.48  59.67  50.00  26.79   5.80   60.42
15                    73.66  74.11  74.26  74.70  74.11  66.67  40.48  10.42   79.17
20                    77.83  77.98  78.72  78.72  77.08  73.36  52.68  13.84   81.55

Tab. 5.26. Recognition scores [%], VSQ. Rough-set system parameters: min. μRS equals 0.5 (using possible rules). TOFFEESTAT database

Number of intervals /   0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9   Certain
Neutral point                                                                  rules
7                     51.04  51.49  51.93  50.89  43.90  32.74  15.33   7.74   34.82
10                    60.42  60.71  62.05  62.05  56.55  43.90  26.19  10.57   52.83
15                    62.05  63.69  63.69  64.88  63.24  52.98  32.14  10.71   64.43
20                    62.95  64.88  65.92  65.63  62.95  53.42  34.52  14.14   65.03

Tab. 5.27. Recognition scores [%], MGCM. Rough-set system parameters: min. μRS equals 0.5 (using possible rules). TOFFEESTAT database

Number of intervals /   0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9   Certain
Neutral point                                                                  rules
7                     52.68  53.27  53.42  53.57  52.23  26.79  16.52   7.29   33.78
10                    78.20  79.21  79.51  80.31  78.67  64.82  34.36   5.41   72.28
15                    71.73  72.17  72.77  73.36  72.02  64.58  39.14   9.52   74.11
20                    72.47  73.21  73.66  74.85  72.62  67.86  42.71  10.57   76.34


Notice the influence of both the quantization method and the number of discretization intervals on the recognition scores. The best overall scores were obtained for the EIWM method with quantization into 20 intervals. The worst recognition scores came consistently from discretization into 7 intervals. This was mostly caused by the small differentiation between statistical parameter values; hence, denser discretization brought better results. In addition, the decision system settings also influenced the recognition scores, with neutral point settings of 0.8 and 0.9 in particular yielding minimum scores.

Because the best result (80.31%) was acquired using the MGCM quantization method with 10 intervals, these settings were used to test the influence of the rough set measure (μRS) on the recognition score. As may be seen from Tab. 5.28, it is possible to find optimal system settings and thereby obtain better recognition scores.

Tab. 5.28. Results of the musical phrases recognition process [%], MGCM. Rough-set system parameters: min. μRS from 0.3 to 0.8 (using possible rules); neutral point from 0.2 to 0.8. TOFFEESTAT database

μRS / Neutral point    0.2     0.4     0.5     0.6     0.8
0.3                  76.88   78.83   72.34   50.48    4.29
0.6                  79.13   79.92   86.71   81.66   58.34
0.8                  77.79   77.79   78.31   77.96   79.49

Examples of test results using the TOFFEETR databases and the rough set-based system are presented in the tables below (Tab. 5.29-5.32).

Tab. 5.29. Results of the musical phrases recognition process [%], EIWM (7 intervals). Rough-set system parameters: min. μRS equals 0.5 (using possible rules); neutral point from 0.2 to 0.9 (0.1 increments); certain rules, μRS = 1. TOFFEETR databases

Database /      0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9   Certain
Neutral point                                                          rules
TOFFEETR3     55.36  55.51  55.51  53.87  44.20  36.01  24.26  16.07   17.86
TOFFEETR5     80.51  80.51  80.65  78.57  75.30  71.28  50.30   8.63   72.77
TOFFEETR7     85.57  85.86  86.01  75.71  83.96  81.70  66.67   3.13   84.23
TOFFEETR10    84.32  85.41  86.12  77.78  84.11  80.32  61.34  12.01   83.11

As may be seen from Tab. 5.29, the recognition score depends on the system settings. The scores are greater than 80% when the neutral point is in the range 0.3 to 0.7 (except for the TOFFEETR3 database, which apparently does not ensure an appropriate description of the musical pattern). The lowest recognition score was obtained when the neutral point was equal to 0.9.


Tab. 5.30. Results of the musical phrases recognition process [%], VSQ (7 intervals). Rough-set system parameters: min. μRS equals 0.5 (using possible rules); neutral point from 0.2 to 0.9 (0.1 increments); certain rules, μRS = 1. TOFFEETR databases

Database /      0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9   Certain
Neutral point                                                          rules
TOFFEETR3     64.73  64.88  65.03  64.14  57.89  53.42  35.86  20.98   32.14
TOFFEETR5     83.33  83.33  83.33  83.63  81.99  80.06  62.20  15.48   79.32
TOFFEETR7     88.10  88.10  87.80  87.65  87.50  87.05  73.36  10.71   81.12
TOFFEETR10    87.32  89.11  89.56  87.34  87.12  84.98  71.56  12.94   82.18

Looking at Tables 5.29 and 5.30, it is possible to compare the influence of the quantization method on the recognition score while assuming the same system settings. The recognition score when using the VSQ method was better than when the EIWM method was applied. Additionally, it may be seen that in the case of the TOFFEETR databases (Tab. 5.31 and 5.32), the recognition accuracy was strongly influenced by the quantization order (number of division intervals) and the number of parameters in the feature vectors. However, increasing the number of intervals above 10 does not cause changes in the overall classification accuracy.

Tab. 5.31. Recognition scores [%], EIWM (10 intervals). Rough-set system parameters: min. μRS equals 0.5 (using possible rules). TOFFEETR databases

Database /      0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9   Certain
Neutral point                                                          rules
TOFFEETR3     75.00  75.89  76.04  75.60  73.51  66.67  49.70  22.62   52.23
TOFFEETR5     84.23  84.38  84.38  84.67  84.82  82.59  66.82  14.43   84.67
TOFFEETR7     86.76  86.76  86.76  86.76  86.76  86.90  78.87  12.06   88.24
TOFFEETR10    85.71  86.01  86.46  86.46  86.16  85.71  81.38  13.75   85.12

Tab. 5.32. Recognition scores [%], VSQ (10 intervals). Rough-set system parameters: min. μRS equals 0.5 (using possible rules). TOFFEETR databases

Database /      0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9   Certain
Neutral point                                                          rules
TOFFEETR3     82.74  82.74  82.74  82.14  82.74  76.79  78.57  47.62   79.17
TOFFEETR5     85.12  85.12  85.71  85.71  86.90  87.50  81.55  45.83   90.48
TOFFEETR7     89.29  89.29  89.29  89.99  89.29  86.90  82.14  46.44   92.86
TOFFEETR10    84.52  85.12  85.12  85.12  84.52  82.74  79.76  57.74   88.10

When comparing the results for all of the databases, better scores were obtained using parameters based on the trigonometric parametrization than on the statistical ones.

Some additional tests were performed which consisted of using all modified forms from the TOFFEESTAT database in the learning phase, and then testing on the template patterns. Results for one such test are included in Tab. 5.33. As may be seen from the table, the recognition score increased in comparison to the previously shown results and, in some cases, was higher than 90%.


Tab. 5.33. Results of the musical phrases recognition process [%], MGCM. Rough-set system parameters: min. μRS equals 0.5 (using possible rules); neutral point from 0.2 to 0.9 (0.1 increments); certain rules, μRS = 1. TOFFEESTAT database

Number of intervals /   0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9   Certain
Neutral point                                                                  rules
7                     70.83  70.83  70.83  72.92  66.67  31.25  20.83   6.25   35.42
10                    91.49  91.49  91.49  93.62  89.36  70.21  34.04   4.26   70.21
15                    93.75  93.75  93.75  93.75  89.58  83.33  41.67   6.25   85.42
20                    93.75  93.75  93.75  93.75  89.58  81.25  43.75   8.33   85.42

In order to check the importance of selected parameters, another experiment was performed. In this experiment, a different parameter was omitted in consecutive tests. Tests were performed for the whole TOFFEESTAT database using the EIWM and MGCM discretization methods. The system settings were used as before. The obtained results are seen in Fig. 5.20.

It may be seen that for the EIWM discretization method, the classification accuracy decreased in all of the tested cases, especially when parameter P1 was omitted. While the influence of parameter P3 on the recognition results seems of smaller importance, the recognition accuracy still decreases by approx. 20%. These results are in good accordance with the statistical analysis of the parameter values. Contrarily, when the MGCM discretization method was applied, omitting selected parameters (P1 and P2) caused better classification scores.

[Bar charts of recognition scores [%] versus the number of intervals (10, 15, 20) for the cases: all parameters, P1 omitted, P2 omitted, P3 omitted.]

Fig. 5.20. Recognition scores when omitting selected parameters using the EIWM (a) and MGCM (b) discretization methods

The recognition scores obtained in the tests show the high effectiveness of the rough set algorithm. This approach seems to be very valuable for testing the "quality" of parameters, additionally providing an appropriate tool for checking various sets of parameters.

However, despite the fact that the research findings show a high amount of consistency, one should refrain from more generalized conclusions because the obtained results are strongly dependent on the chosen representation of the musical patterns.


6. INTELLIGENT PROCESSING OF TEST RESULTS

6.1. Inconsistency of Subjective Assessment Results

There are several problems related to the object evaluation process. First of all, experts should be trained and experienced in critical listening. An additional procedure based on a blind listening test provides an evaluation of the listener's self-consistency. If a listener always ranks the same system in the same way, his reliability is thus undoubted. It is desirable for experts to be of the same background, i.e. acousticians, musicians, etc., but in practice it is not easy to find such a group that is willing to participate in a series of experiments. It should be remembered that listening sessions are quite tiring and time consuming. Moreover, it is not convenient to walk this group of experts from one acoustic interior to another, with the financial aspect also being of importance in this case. On the other hand, there is a practical way to carry out this experiment. The evaluation procedure can be based on simulations of hall characteristics. In this case, sound excerpts recorded in an anechoic chamber, thus without any reverberation, are used. These sound samples are then processed by adding, for example, some reverberation. An expert, while listening, is then asked to rate the performance using grades such as low, medium and high. A number of experts should take part in such evaluations, resulting in the relation of semantic descriptors to the particular parameter quantities. Furthermore, since the number of experts is rarely sufficient, this interrelation should also be validated statistically.

In the experiment, one should make some assumptions, such as:
• number of experts taking part in the subjective evaluation,
• number of parameters to be tested,
• number of music types,
• number of points to be tested within the parameter range.

As was already mentioned, the first assumption is connected to the amount of time consumed and the costs related to the organization of listening sessions. The bigger the number of experts, the better the test reliability. However, by employing a statistical validation of the results, it is possible to decrease the number of experts. Techniques related to the statistical validation of test results are the subject of the next section. It is obvious that, considering the large number of acoustical parameters, only those available in a signal processor can be tested.


However, additional acoustical verification of the results may be provided by testing similar parameters in some chosen real interiors. A variety of music types should be used in this subjective evaluation because they provide differentiated timbre-related and spatial qualities. The assumption regarding the parameter resolution to be tested is also important. Keeping in mind that such a listening session is fatiguing, the parameter step changes should thus be limited.

6.2. Application of Fuzzy Logic to the Processing of Test Results

The rationale for using fuzzy mathematics with regard to subjective measurement methods arises from the need to reduce the complexity underlying these methods. This complexity mostly results from the fact that sound quality evaluation is a largely undetermined procedure involving the brain processing of a perceived sound. Both the non-linear nature of subjective scales and their corresponding interpretation can lead to a decrease in the rating reliability related to the criteria under test, especially when a listener is confused by the numerical aspects of the subjective scale labels. For these reasons, a fuzzy set-based approach to the interpretation of subjective testing results seems to be appropriate, and is likely to be more effective in comparison to a statistical approach [120].

The idea for introducing the soft computing approach to this task was first conceived in China, where the basic concepts related to this approach were proposed by Z. Bao in 1988 [9]. In this study, typical grades from excellent to bad were proposed without associating any score values to them.

In the fuzzy set approach to the processing of subjective test results two universal sets can be introduced in the matrix notation [9]:

U = {Subjective impression of sound}
Q = {Quality grade scale}

and two subsets:

U_i = {x_1, x_2, ..., x_n}

and   Q_j = {EX, VG, G, F, B}

denoting respectively: subjective descriptors, such as clarity, diffusion, brilliance, fullness, etc.; and grades, as presented above. For example, given a certain sound sample, the number of listeners who assign the grade "EX" to the x_i-th element is summed, and then the same operation is performed for all of the

other elements of the subset U_i. In this way, a series of fuzzy subsets may be obtained:

Ũ_j = δ_j1/x_1 + δ_j2/x_2 + ... + δ_jn/x_n,   j = 1, ..., 5     (6.1)

where: j represents the grades from Excellent to Bad, respectively, and δ_ji is the degree of membership, i.e. the degree to which the x_i-th element of U_i is of the j-th quality grade. Equation (6.1) may be rewritten in the following simplified form:

Ũ_j = (δ_j1, δ_j2, ..., δ_jn)     (6.2)

In order to get a comprehensive rating result, a matrix of the degrees of membership should be presented as follows:

        | δ_11  δ_12  ...  δ_1n |
        | δ_21  δ_22  ...  δ_2n |
R =     | δ_31  δ_32  ...  δ_3n |     (6.3)
        | δ_41  δ_42  ...  δ_4n |
        | δ_51  δ_52  ...  δ_5n |

In some cases where it is known that a certain parameter contributes to the overall subjective impression more than other parameters, the capability for assigning a weighting factor is needed. It is also convenient to use vector and matrix notations when dealing with weights. Weighting coefficients form a matrix as follows:

        | w_1 |
W =     | w_2 |     (6.4)
        |  ⋮  |
        | w_n |

The final result of the rating is the product of the matrices R and W:

              | δ_11  δ_12  ...  δ_1n |   | w_1 |   | s_1 |
              | δ_21  δ_22  ...  δ_2n |   | w_2 |   | s_2 |
S = R ∘ W =   | δ_31  δ_32  ...  δ_3n | ∘ |  ⋮  | = | s_3 |     (6.5)
              | δ_41  δ_42  ...  δ_4n |   | w_n |   | s_4 |
              | δ_51  δ_52  ...  δ_5n |             | s_5 |


The fuzzy operator "∘" is a basic operation involving fuzzy sets and may be defined, according to the needs of an expert, e.g. as a union. There are two more operations to be done on the matrix elements before implementing this analysis in the experiments. First, the matrix S should be transposed in order to obtain a more convenient form:

                                                    | δ_11  ...  δ_51 |
(S)^T = (R ∘ W)^T = (W)^T ∘ (R)^T = (w_1 ... w_n) ∘ |  ⋮          ⋮  |     (6.6)
                                                    | δ_1n  ...  δ_5n |

and second, the (S)^T values should be normalized in order to enable the comparison of different sample sets. The elements of the matrix (S)^T are to be normalized over their algebraic sum; the normalized matrix (S')^T is therefore given in the form:

(S')^T = (s'_1, ..., s'_5),   where s'_j = s_j / (s_1 + s_2 + s_3 + s_4 + s_5)     (6.7)

The matrix (S')^T will give an answer as to which of the parameters contributes more to the overall quality. Correspondingly, if a total score T is required, then the following formula is proposed:

T = s'_1 · 100 + s'_2 · 80 + s'_3 · 60 + s'_4 · 40 + s'_5 · 20     (6.8)

where s'_1, ..., s'_5 are the elements of the matrix (S'). Some definite values are assigned to the descriptive terms EX, VG, G, F, B (Excellent, Very Good, Good, Fair, Bad). Conventionally, these values are assigned as follows: EX = 100, VG = 80, G = 60, F = 40, B = 20.
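The whole chain of Eqs. (6.1)-(6.8) can be sketched numerically as below. The vote counts and weights are invented for illustration, and max-min composition is used for the "∘" operator, one common choice that the text leaves to the expert.

```python
import numpy as np

SCORES = np.array([100, 80, 60, 40, 20])   # EX, VG, G, F, B

# Invented example: 10 listeners grade 3 descriptors (e.g. clarity,
# diffusion, fullness); votes[j, i] = listeners giving grade j to x_i.
votes = np.array([[4, 2, 1],
                  [3, 5, 2],
                  [2, 2, 4],
                  [1, 1, 2],
                  [0, 0, 1]])
R = votes / votes.sum(axis=0)              # membership degrees, Eqs. (6.1)-(6.3)

W = np.array([0.5, 0.3, 0.2])              # descriptor weights, Eq. (6.4)

# Max-min composition for the "o" operator, Eq. (6.5):
S = np.array([np.max(np.minimum(R[j], W)) for j in range(5)])

S_norm = S / S.sum()                       # normalization, Eq. (6.7)
T = float(S_norm @ SCORES)                 # total score, Eq. (6.8)
print(round(T, 2))                         # -> 71.67
```

Swapping max-min composition for an ordinary matrix product changes S but leaves the rest of the pipeline (normalization and scoring) unchanged.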

6.2.1. Evaluation of Reverberator Features

Problem Statement

The reverberator is defined as an electronic device which is used for the synthesis of artificial reverberation. A sound processed by this device may acquire a new character, as though it was produced in the acoustical conditions of a concert hall. Most of the modern programmable reverberators are based on


modeling of the reverberation phenomena using delay network algorithms. These delay networks, originally proposed by Schroeder, are generally composed of comb- and all-pass filters [184].
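A minimal Schroeder-style network of that kind (parallel feedback comb filters feeding an all-pass section) can be sketched as below. The delay lengths and gains are illustrative only, not those of any reverberation program tested here.

```python
import numpy as np

def comb(x, delay, g):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
    return y

def allpass(x, delay, g):
    """Schroeder all-pass: y[n] = -g*x[n] + x[n-delay] + g*y[n-delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + xd + g * yd
    return y

def reverberate(x):
    # Four parallel combs (mutually prime delays) summed, then one all-pass.
    wet = sum(comb(x, d, 0.7) for d in (1116, 1188, 1277, 1356))
    return allpass(wet, 225, 0.5)

impulse = np.zeros(8000)
impulse[0] = 1.0
tail = reverberate(impulse)   # decaying impulse response (reverberation tail)
```

Programmable reverberators vary the delays, gains and number of such sections per program; each setting corresponds to one of the "reverberation programs" compared in the listening tests.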

The background of the present study is related to an optimization method for the assessment of reverberation programs proposed by Czyzewski some years ago [46][47]. In his study, both multidimensional scaling procedures and subjective test techniques were taken into account, but the results obtained in the listening tests were processed using statistical analysis [46][47]. In the approach proposed by the author, techniques which are more comprehensive than statistical methods alone are introduced for processing the listening test results. In these techniques, both fuzzy mathematics operations and rough set-based algorithms are engaged in the processing of test results. The listening tests carried out by the author, which are presented further on in this chapter, are based on computer simulations of reverberation programs.

Paired test results

Below, a short description of the experiments carried out by the author is presented. Two subjective test sessions were organized: a paired comparison test and a parametric test. The aim of these experiments was to choose the best reverberation program for fulfilling demands in the overall subjective preference domain. All of the tested programs modeled the reverberation parameters of a medium-size listening room. The number of subjects involved in both the paired comparison test and the parametric test was equal to 8. The tests were carried out for two groups of experts with experience in subjective testing methods: the first group consisted of sound engineers professionally active in the domain of music and sound recording (5 people), and the second included students of the sound engineering specialization.

The experiments were performed employing anechoically recorded music excerpts. Three music fragments were chosen for the listening sessions, two classical and one popular. The results presented below concern two of these music motifs, namely a part of Mozart's Symphony No. 41 ("Jupiter") (Test No. 1) and a pop music excerpt (Test No. 3). Additionally, in order to study the influence of the music excerpts on the subjects' preferences, a comparison between the subjects' preference diagrams was made. The subjects were listening to sound samples obtained in the course of processing the above music patterns played with 10 different reverberation programs. All examples had a duration of 15 seconds, and the interval between the two examples of a pair was 2 seconds. Consecutive pairs were separated by 5 seconds (see Fig. 6.1), which was also the time designated for a subject's response.

|<------------- 1st pair ------------->|<------------- 2nd pair ------------->|
| 15 [s] | 2 [s] | 15 [s] |   5 [s]    | 15 [s] | 2 [s] | 15 [s] |   5 [s]    |
  i-th     break   j-th     response     fragm.   break   fragm.    response
  fragm.           fragm.    break                                   break

Fig. 6.1. Paired test time structure


According to the above-given assumptions and according to Eq. (3.93), the maximum number of possible answers for one object is equal to:

N = 8 · (10 − 1) · 2  =>  N = 144     (6.9)

The number of pairs in one series of a test calculated using the above given assumptions is equal to:

N_1 = C(10, 2) = 10! / (8! · 2!) = 45     (6.10)

and for both series of the test the number of pairs equals 2 · 45 = 90. Hence, the total number of all resulting answers in the paired comparison test is equal to 8 · 90 = 720 for one music excerpt, and the minimum time needed to perform the test, calculated according to expression (6.11), in the discussed case is equal to 2 · 1660 [s] = 3320 [s].

Tc = q · (2 · T1 + T2) + (q − 1) · T3     (6.11)

where: Tc – minimum time to perform one series of a test,
T1 – duration of a pattern object,
T2 – break between two objects,
T3 – break between consecutive pairs,
q – number of pairs in one series,
k – number of objects.

As may be seen from the results of these short calculations, this kind of test can be very time consuming. The task of preparing and afterwards listening to 90 pairs of signal samples is quite arduous.
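The bookkeeping above reduces to a few lines. The per-series time is computed here as q·(2·T1 + T2) + (q − 1)·T3, i.e. two fragments plus an inner break per pair and a response break between consecutive pairs, which reproduces the quoted 1660 s per series.

```python
from math import comb

k = 10                 # reverberation programs (objects)
experts = 8
series = 2
T1, T2, T3 = 15, 2, 5  # fragment length, inner break, response break [s]

pairs_per_series = comb(k, 2)                        # Eq. (6.10): C(10, 2) = 45
answers_per_object = experts * (k - 1) * series      # Eq. (6.9): 144
answers_total = experts * series * pairs_per_series  # 720 per music excerpt

# Two fragments and an inner break per pair, response breaks between pairs:
series_time = pairs_per_series * (2 * T1 + T2) + (pairs_per_series - 1) * T3
total_time = series * series_time

print(pairs_per_series, answers_per_object, answers_total, series_time, total_time)
# -> 45 144 720 1660 3320
```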

Test 1 results for the two groups of experts are given correspondingly in Tab. 6.1 (group No. 1) and Tab. 6.2 (group No. 2). In the tables, the consecutive objects from A to J are 10 reverberation programs. The upper parts of the tables contain the number of votes for each object. Also, a column for parameter z1 is included: the number and percentage of errors made by each expert. In the tables, the total numbers of votes, both for the objects and for the parts of the tests, are also shown. The lower parts of the tables show the values of parameters z2 and z3 for all pairs of compared objects. In order to check whether subjects used the same criteria in the evaluation process, the z2 parameter was taken from statistical tables. For eight people twice evaluating a given pair in a test, the number of different answers should not exceed 1.96 at the 5% significance level. This condition was not fulfilled by pairs B-H and C-J.

Parameter z2 shows the total number of errors for a given pair, whereas for parameter z3 the mark "+" was given when the significance criterion was not met; that is, the threshold value of 1.96 was exceeded. It may be said that the answers of an expert who always votes for the worse of the pair of sounds influence the parameter z3, as this is in conflict with the other experts' voting. On the other hand, the value of parameter z2 does not depend on that expert's voting, even if it is wrong, so long as the expressed opinions are consistent. Additionally, both groups' results for test No. 3 are collected in Tab. 6.3 and 6.4.

To verify the assumption with regard to the effect of auditory memory on perception, a comparison of the results from both parts of the test had to be carried out. A similar comparison of the results from the two groups of experts provided the answer to the question of conformity of the observed tendencies. Both of these

comparisons were performed using Pearson's χ² test. A comparison of the results of the auditory monitoring tests between the groups of experts, which shows a conformity of interpretation of the tested excerpts, is presented in Tab. 6.5. Additionally, a comparison between the results of both parts of the tests shows a slight auditory memory effect on the ability to differentiate between the tested excerpts (Tab. 6.6).
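The Pearson χ² comparison can be sketched as a test of homogeneity on the two groups' vote totals. The function below is a generic textbook version applied to the vote sums from Tab. 6.1 and 6.2; it is not guaranteed to reproduce the exact values in Tab. 6.5, since the details of the author's computation are not given.

```python
import numpy as np

def chi_square_homogeneity(counts_a, counts_b):
    """Pearson chi-square statistic comparing two vote distributions
    (test of homogeneity on a 2 x k contingency table)."""
    obs = np.array([counts_a, counts_b], dtype=float)
    row = obs.sum(axis=1, keepdims=True)
    col = obs.sum(axis=0, keepdims=True)
    expected = row @ col / obs.sum()           # counts expected under homogeneity
    return float(((obs - expected) ** 2 / expected).sum())

# Vote sums for objects A..J from the two expert groups (Tab. 6.1 and 6.2):
group1 = [77, 53, 56, 82, 59, 37, 13, 10, 27, 36]
group2 = [37, 38, 38, 43, 24, 25, 7, 10, 25, 23]
stat = chi_square_homogeneity(group1, group2)
```

The statistic would then be compared against the χ² distribution with k − 1 = 9 degrees of freedom at the chosen significance level.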

Tab. 6.1. Results of test No. 1, GROUP No. 1

OBJECT     A   B   C   D   E   F   G   H   I   J       z1  [%]
Expert 1  16  11   9  16  12   9   3   4   5   5        6  93.3
Expert 2  16  12  11  17  10   5   4   3   6   5        7  92.2
Expert 3  15  10  11  17  13   7   2   1   5   9        7  92.2
Expert 4  15  11  12  15  13   7   3   0   6   8        –   –
Expert 5  15   9  13  17  11   9   1   2   5   8        6  93.3
sum       77  53  56  82  59  37  13  10  27  36           450
part 1    39  27  25  43  31  19   5   4  17  15           225
part 2    38  26  31  39  28  18   8   6  10  21           225

PAIR      AB  AC  BC  CD  ...  HJ  IJ
z2         0   2   2   –   ...  –   –
z3         –   –   –   –   ...  +   +

Tab. 6.2. Results of test No. 1, GROUP No. 2

OBJECT     A   B   C   D   E   F   G   H   I   J       z1  [%]
Expert 1  14  13  11  15   9  10   3   2   7   6        9  90
Expert 2  13  11  13  14   7   9   2   3   7  11       12  86.7
Expert 3  10  14  14  14   8   6   2   5  11   6        8  91.1
sum       37  38  38  43  24  25   7  10  25  23           270
part 1    20  18  19  20  12  13   4   4  11  14           135
part 2    17  20  19  23  12  12   3   6  14   9           135

PAIR      AB  AC  ...  BC  ...  CD  ...  HJ  IJ


Tab. 6.3. Results of test No. 3, GROUP No. 1

OBJECT    A   B   C   D   E   F   G   H   I   J   z1  z1 [%]
Expert 1  15  11   8  16   7  10   3   1   9  10    5  94.4
Expert 2  13  11  10  17   8   5   5   2   6  13    6  93.3
Expert 3  14  10   9  16  15   7   3   2   9   5    5  94.4
Expert 4  12  14  11  15  10   6   2   2   9   9    3  96.7
Expert 5  14  13  10  16   9   8   2   2  13   3    4  95.6
sum       68  59  48  80  49  36  15   9  46  40          450
part 1    34  30  23  42  26  17   7   4  24  18          225
part 2    34  29  25  38  23  19   8   5  22  22          225

Tab. 6.4. Results of test No. 3, GROUP No. 2

OBJECT    A   B   C   D   E   F   G   H   I   J   z1  z1 [%]
Expert 1  13  10   9  16  11   7   5   1  13   5    5  96.7
Expert 2  13  11  10  12  12  11   3   4   4  10   10  94.4
Expert 3  14  11  11  14  14   7   3   3   8   4    4  95.6
sum       40  32  30  42  37  25  11   9  25  19          270
part 1    21  17  15  22  18  12   6   5  13   6          135
part 2    19  15  15  20  19  13   5   4  13  13          135

Tab. 6.5. Comparison of the results of the auditory tests between groups of experts

          Test 1   Test 2   Test 3
Test 1    -        13.066   7.186
Test 2    13.066   -        6.587
Test 3    7.186    6.587    -

Tab. 6.6. Comparison between the results of both parts of the tests

          Part 1   Part 2
Test 1    4.957    2.588
Test 2    1.42     2.262
Test 3    1.260    3.208


On the basis of the experts' answers, preferential diagrams were obtained (Fig. 6.2a,b). Both plots have some maxima for all the tests, showing a preference for the first and fourth reverberation programs. These plots indirectly provide information about the criteria which are the basis for obtaining a good reproduction of the investigated sound excerpt. Comparing both plots, it may be seen that in the case of Test No. 3 (pop music), a higher preference was assigned to the ninth reverberation program. One possible explanation is that the experts, when listening to the pop music, had slightly different preferences than when listening to the classical music.

[Fig. 6.2: two preference diagrams (a: test No. 1; b: test No. 3). Vertical axis: number of votes (0-50); horizontal axis: objects A-J; four curves per diagram: Gr. No. 1 p. 1, Gr. No. 1 p. 2, Gr. No. 2 p. 1, Gr. No. 2 p. 2]

Fig. 6.2. Examples of preference diagrams for assessment tests: test No. 1 (a); test No. 3 (b). Gr. No. refers to the expert group number; p. 1 and p. 2 reflect the part of the test

Parametric test results

Furthermore, the results of the parametric test, carried out in order to find the dependence of the subjective preference on individual reverberation programs,


may serve as data to be processed by both fuzzy logic-based reasoning and the rough set-based algorithm.

The process of evaluating the properties of artificial reverberation requires the following parameters to be taken into consideration:

• clarity (high ratings in this parameter require a wide frequency range, flat frequency response, low non-linear distortion) - CLAR,
• diffusion (ability of sound to diffuse in rooms) - DIFFUSION,
• spaciousness (spreading of the auditory events; it is positive as long as it creates a realistic impression of space) - SPACE,
• reverberation density (reflecting the threshold of unnoticeable spectral coloration) - REV_DENS,
• comb filtering effects (auditory effect of parasitic interference of direct and reflected sounds) - COMB_FILT,
• flutter distortion (distortion caused by too few reflections per second) - FLUTTER.

At first, a list of terms commonly used in acoustic evaluation practice, allowing precise descriptions of parameters while at the same time being able to be correlated with grades, was collected. Then, these terms were introduced to the experts and, using their decisions, only some of the suggested terms were finally included in Tab. 6.7. Next, the experimental phase consisted in filling in the questionnaire form (Tab. 6.7) by simply marking the appropriate place in the table after listening to the sound sample. The above-cited descriptors were used, but the parameters that contribute negatively (COMB_FILT and FLUTTER) to the overall quality were linked to only one parameter, namely Naturalness (NAT).

Tab. 6.7. Comprehensive rating table for testing reverberation programs

Parameter\Grade   Excellent        Very Good       Good          Fair                Bad
CLAR              very clear,      clear           passable      slightly blurred    blurred
                  distinct
DIFFUSION         full             very good       good          fair                poor,
                                                                                     insufficient
SPACE             distinctive      moderate        passable      slightly            blurred
                  image            image                         distorted           image
REV_DENS          high, well       moderate        passable      slightly short      low
                  balanced
NAT               full, faithful,  moderate,       passable,     detectable          present, strong,
                  imperceptible    incidentally    little        distortion,         substantial
                  distortion       distorted       distortion    slightly annoying   distortion


Then, all experts' grades of individual parameters were summed up for consecutive reverberation programs. Below, the results of the tests for the first (I) and the ninth (IX) exemplary reverberation programs are presented (Tab. 6.8 and 6.9).

Tab. 6.8. Questionnaire form for the tested reverberation program (I)

Parameter\Grade  Excellent  Very Good  Good  Fair  Bad
CLAR             2          4          1     1     0
DIFFUSION        1          3          3     1     0
SPACE            1          5          2     0     1
REV_DENS         5          2          1     0     0
NAT              5          3          0     0     0

Tab. 6.9. Questionnaire form for the tested reverberation program (IX)

Parameter\Grade  Excellent  Very Good  Good  Fair  Bad
CLAR             0          2          5     1     0
DIFFUSION        0          1          4     2     1
SPACE            0          0          2     5     1
REV_DENS         0          0          0     6     2
NAT              0          0          0     2     6

In order to obtain subsets Q_sf of the form presented below for the individual parameters from Tab. 6.7 - Clarity (CLAR), Diffusion (DIFFUSION), Spaciousness (SPACE), Reverberation Density (REV_DENS), and Naturalness (NAT) - the results from Tab. 6.8 and 6.9, respectively, were divided by the number of experts (equal to 8):

Reverberation program No. I:
CLAR = (0.25, 0.5, 0.125, 0.125, 0)
DIFFUSION = (0.125, 0.375, 0.375, 0.125, 0)
SPACE = (0.125, 0.625, 0.25, 0, 0)
REV_DENS = (0.625, 0.25, 0.125, 0, 0)
NAT = (0.625, 0.375, 0, 0, 0)

Reverberation program No. IX:
CLAR = (0, 0.25, 0.625, 0.125, 0)
DIFFUSION = (0, 0.125, 0.5, 0.25, 0.125)
SPACE = (0, 0, 0.25, 0.625, 0.125)
REV_DENS = (0, 0, 0, 0.75, 0.25)
NAT = (0, 0, 0, 0.25, 0.75).
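The vote-to-membership step can be sketched directly (a minimal sketch using the vote counts for program I as printed in Tab. 6.8 and the stated number of experts, 8):

```python
# Turning raw vote counts into membership subsets Q_sf by dividing each
# grade's vote count by the number of experts.
N_EXPERTS = 8

votes_program_I = {                  # grade order: Excellent .. Bad
    "CLAR":      [2, 4, 1, 1, 0],
    "DIFFUSION": [1, 3, 3, 1, 0],
    "SPACE":     [1, 5, 2, 0, 1],
    "REV_DENS":  [5, 2, 1, 0, 0],
    "NAT":       [5, 3, 0, 0, 0],
}

memberships = {
    name: [v / N_EXPERTS for v in counts]
    for name, counts in votes_program_I.items()
}
print(memberships["CLAR"])   # -> [0.25, 0.5, 0.125, 0.125, 0.0]
```

The printed vector reproduces the CLAR subset listed above for program I.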


The values of the elements in the above subsets reflect the degree of membership of each x_i to Q_sf.

The resulting rating matrices for samples I (1st reverberation program) and IX (9th reverberation program) are presented in Eq. (6.12) and (6.13).

      [ .25   .5    .125  .125  0 ]
      [ .125  .375  .375  .125  0 ]
R_I = [ .125  .625  .25   0     0 ]                            (6.12)
      [ .625  .25   .125  0     0 ]
      [ .625  .375  0     0     0 ]

       [ 0  .25   .625  .125  0    ]
       [ 0  .125  .5    .25   .125 ]
R_IX = [ 0  0     .25   .625  .125 ]                           (6.13)
       [ 0  0     0     .75   .25  ]
       [ 0  0     0     .25   .75  ]

Provided that the weighting matrix is an elementary one, the comprehensive rating matrices (S_I)^T = (R_I)^T and (S_IX)^T = (R_IX)^T will consist of elements as shown in Eq. (6.14) and (6.15). As is seen from the matrices, the fuzzy set union operator was used, selecting the maximum value from each column of R_I and R_IX.

(S_I)^T = (R_I)^T = (.625  .625  .375  .125  0)                (6.14)

(S_IX)^T = (R_IX)^T = (0  .25  .625  .75  .75)                 (6.15)

The normalized matrices (S'_I)^T and (S'_IX)^T are presented below:

(S'_I)^T = (.36  .36  .21  .07  0)                             (6.16)

(S'_IX)^T = (0  .10  .26  .32  .32)                            (6.17)

Total scores for both examples are calculated from Eq. (6.8): T_I = 80.2 for reverberation program I, and T_IX = 42.8 for reverberation program IX. Since the value of T_I equals 80.2, it may be concluded that the quality of reverberation program I is a little higher than the grade VERY GOOD, with the greatest


influence coming from the Spaciousness (SPACE) parameter. On the other hand, T_IX equal to 42.8 indicates that the second sample was evaluated with the grade FAIR, with the Reverberation Density (REV_DENS) parameter contributing most to that evaluation result.

As is shown in the above example, fuzzy mathematics can be easily applied to the processing of subjective test results. The obtained results are logical and are proven to be in good accordance with empirical data.
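The whole scoring chain of this section (column-wise fuzzy union, normalization, weighted total) can be sketched in a few lines. The grade weights (100, 80, 60, 40, 20) are an inference, not stated in this excerpt: they reproduce the quoted totals T_I = 80.2 and T_MUSICAM128 = 57.2 when the normalized values are rounded to two decimals, as in Eq. (6.16):

```python
GRADE_WEIGHTS = [100, 80, 60, 40, 20]   # Excellent .. Bad (inferred, see lead-in)

def total_score(rating_matrix):
    # fuzzy union: maximum value from each column (one column per grade)
    union = [max(col) for col in zip(*rating_matrix)]
    # normalization, rounded to two decimals as in Eq. (6.16)
    norm = sum(union)
    normalized = [round(u / norm, 2) for u in union]
    # weighted total score (the role played by Eq. (6.8))
    return sum(w * u for w, u in zip(GRADE_WEIGHTS, normalized))

R_I = [                                  # rating matrix of Eq. (6.12)
    [0.25,  0.5,   0.125, 0.125, 0],
    [0.125, 0.375, 0.375, 0.125, 0],
    [0.125, 0.625, 0.25,  0,     0],
    [0.625, 0.25,  0.125, 0,     0],
    [0.625, 0.375, 0,     0,     0],
]
print(round(total_score(R_I), 1))        # -> 80.2
```

The same function applied to the MUSICAM 128 matrix of Eq. (6.18) yields the 57.2 quoted in Section 6.2.2.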

6.2.2. Evaluation of Audio CODEC Features

Problem Statement

Great progress has recently been made in the field of audio bit-rate reduction. The best known approaches are the standards used in broadcasting, namely ISO/MPEG Layers I, II, III, and, recently, IV [40][77][159][197]. These methods are based on perceptual coding and provide high-quality audio compression [40][63][225]. They have become standards in the last few years, driven by the introduction of digital broadcasting (Digital Audio Broadcasting) and by the storage media market; in the latter case, to reduce the space that a high-quality audio signal occupies. The basis for the perceptual compression of audio is the subjective characteristics of the human hearing sense. The fundamental task of the mentioned compression methods is to remove undesirable redundancy, while at the same time maintaining the high quality of the audio signal.

A number of methods aimed at making objective perceptual measurements of the audio quality of CODECs (coder/decoder pairs) have been introduced [8][41][64][195]. Unfortunately, the choice of evaluation methods is quite large [15][16][32][187], and at the same time their ultimate significance to the chosen problem is undetermined.

Testing Procedure

The aim of the experiments was to choose the best low bit-rate algorithm which fulfilled demands in the overall subjective preference domain. This exemplary case is now typical, and should be considered a vital application within the realm of subjective acoustic testing.

The algorithms tested were as follows:
1. MUSICAM 256 kbit/s,
2. MUSICAM 192 kbit/s,
3. MUSICAM 128 kbit/s,
4. PASC 256 kbit/s,
5. Original music fragment (PCM format).


The experiments were performed using recorded music samples. The below-mentioned results concern three music motifs, namely:

1. A fragment of "The Late String Quartets," with the music of Beethoven;
2. Little Feat, "Hangin' On to the Good Times", Brüel & Kjær Test Disc fragment;
3. C. Orff, "Carmina Burana", Track 64, EBU SQAM Test Disc.

Subjects listened to sound samples processed by the above-mentioned algorithms. The number of subjects engaged to evaluate these algorithms was equal to 5. The subjects taking part in the experiments were students and sound engineers active in the domain of music recording.

The parameters for the evaluation of low bit-rate algorithms were chosen as follows:

• presence of perceivable noise (NOISE),
• presence of perceivable harmonic distortions (DIST),
• clarity (high ratings in this parameter require a wide frequency range, flat frequency response, low non-linear distortions) (CLAR),
• spaciousness (the spread of the auditory sources; it is positive as long as it creates a realistic impression of space) (SPACE),
• stability of sound source localization in the stereophonic plane (STAB),
• overall quality (QUALITY).

The results of the subjective ratings were collected in separate tables for each subject and for each motif.

Fuzzy Processing

The obtained test results were analyzed using fuzzy set-based reasoning. For analysis purposes, experts were asked to fill in a questionnaire form (Tab. 6.10) similar to the one shown before, by simply marking the appropriate place after listening to the sound sample (in the same way as for the experiment shown in the previous section). As may be seen from Tab. 6.10, a list of attributes appropriate for testing codec features was created.

Tab. 6.10. Questionnaire form for the MUSICAM 128 algorithm (motif No. 1)

Parameter\Grade  Excellent  Very Good  Good  Fair  Bad
NOISE            0          1          2     2     0
DIST             0          1          3     1     0
CLAR             0          0          2     2     1
SPACE            1          1          2     1     0
STAB             0          1          4     0     0


Summing up the number of votes for grades of individual parameters and dividing them by the number of experts (equal to 5), the subsets Q_sf are obtained for musical fragment No. 1 in the form presented below:

MUSICAM 128:
NOISE = {0, 0.2, 0.4, 0.4, 0}
DIST = {0, 0.2, 0.6, 0.2, 0}
CLAR = {0, 0, 0.4, 0.4, 0.2}
SPACE = {0.2, 0.2, 0.4, 0.2, 0}
STAB = {0, 0.2, 0.8, 0, 0}

The values of the elements in the above subsets reflect the degree of membership of each x_i to Q_sf. The resulting rating matrix is presented in Eq. (6.18):

               [ 0   .2  .4  .4  0  ]
               [ 0   .2  .6  .2  0  ]
R_MUSICAM128 = [ 0   0   .4  .4  .2 ]                          (6.18)
               [ .2  .2  .4  .2  0  ]
               [ 0   .2  .8  0   0  ]

The comprehensive rating matrix (S_MUSICAM128)^T = (R_MUSICAM128)^T will consist of elements as shown in Eq. (6.19) (no weights were assigned). As may be seen from the matrix, the fuzzy set union operator was used, selecting the maximum value from each column of R_MUSICAM128.

(S_MUSICAM128)^T = (R_MUSICAM128)^T = (.2  .2  .8  .4  .2)     (6.19)

After applying the normalization, the matrix (S'_MUSICAM128)^T takes the form:

(S'_MUSICAM128)^T = (.11  .11  .44  .22  .11)                  (6.20)

The total score is calculated according to Eq. (6.8): T_MUSICAM128 = 57.2. Since the value of T_MUSICAM128 equals 57.2, it may be concluded that the quality of the MUSICAM 128 algorithm is a little lower than the grade GOOD, with the most influence coming from the STAB parameter. It should, however, be remembered that in the experts' common opinion, the quality of the MUSICAM 128 algorithm is sufficient for audio transmission tasks.


6.3. Application of Rough Sets to the Processing of Test Results

Rough set-based processing of subjective test results was first proposed by the author in some recent publications [106][120][130]. This approach was applied in order to find the tendencies underlying experts' votes in subjective listening procedures. In this chapter, rough set-based processing will be used in a series of experiments based on acoustical data obtained in subjective listening tests.

A standard decision table was used in the rough set-based processing (see Tab. 2.1). Objects t_1 to t_n from Tab. 2.1 represent various acoustical objects, and attributes A_1 to A_m denote the tested parameters, used as conditional attributes. The expert's scoring is defined by the grades a_11 to a_nm (the quantized values are labeled descriptively or quantitatively). The decision D is understood as a value assigned to the overall quality of sound (QUALITY).

6.3.1. Evaluation of Reverberator Features

The experiment described in Section 6.2.1 was slightly modified in order to prepare the parametric test results for processing by the rough set-based learning algorithm. While voting for specific parameters, the overall quality was also taken into account by experts. The overall quality rating scale was from 5 (Excellent) to 1 (Bad). The form of the obtained results is shown in Tab. 6.11. As is seen from the table, subjective ratings were given in descriptive form: absent, present, low, medium, and high.

Tab. 6.11. Scores obtained for a selected expert (expert No. 1)

Rev. COMB_ IFWTTER CLAR DIFFUSION REV_DENS SPACE QUAL.

prog. FILT

nwnber/ Param. 1st absent absent med. med. high med. 4 2nd absent absent med. med. high med. 3 3rd ipresent absent med. low low med. 3 4th absent absent hi_gll med. hi~ hi_gh_ 5 5th absent absent [high high_ high. hi_gh_ 4 6th absent lpresent high low high low 2 7th jpresent lpresent low high low low 1 8th [present lpresent low low low low 1

9th [present IPresent high med. low low 2 10th absent absent low low low low 2

Looking at Tab. 6.11, it is possible to see which parameters most influence the quality decision. However, the simultaneous analysis of 8 such tables for consecutive subjects and for different signal samples would be a complex task.


The traditional approach to this problem exploits the principles of statistical data analysis. As will be shown, it is possible to employ an expert system to find dependencies among subjective ratings and to derive some rules underlying the process of decision making by the experts.

Rough Set-Based Analysis of Parametrie Test Results

In the rough set-based analysis applied by the author to the listening test results, the QUALITY parameter was defined as the decision attribute, with all other parameters contained in Tab. 6.11 being used as condition attributes. Since experts' ratings were given in descriptive form, it was not necessary to quantize the data.

The number of subjects involved in the parametric test was equal to 8; however, expert No. 7 and his answers were eliminated due to reliability test failure. As a result of processing the data with the rough set-based algorithm, reducts were obtained, some of which are listed below:

Reducts:
CLAR, DIFFUSION, SPACE
COMB_FILT, DIFFUSION, SPACE
COMB_FILT, CLAR, SPACE

The next step of data processing was the calculation of rules. The strongest of these are presented below:

Rules:
(CLAR medium) & (SPACE high) then (QUALITY 4),
(SPACE medium) then (QUALITY 3),
(COMB_FILT absent) & (SPACE low) then (QUALITY 2),
(COMB_FILT present) & (SPACE low) then (QUALITY 1),
(CLAR high) & (SPACE high) then (QUALITY 5).

It is very interesting that these rules derived from the rough set-based data processing confirm the main principles recognized in the domain of acoustics. They also permit conclusions to be drawn on the choice of assessed attributes.

The presented results are limited to one example, namely a single music motif used with this procedure. Other studies show that the reverberation program parameters should be correlated to the character of the music motifs to be processed by the reverberator [46][47]. Consequently, the derived rules should be applied to the results of many similar test sessions. After processing the results of two further sessions employing two more music motifs, the following global rules remained in effect:

Global rules:
(CLAR medium) & (SPACE high) then (QUALITY 4),
(SPACE medium) then (QUALITY 3),
(COMB_FILT present) & (SPACE low) then (QUALITY 1),
(CLAR high) & (SPACE high) then (QUALITY 5).

These rules may therefore be used for automatic decision-making on the basis of a collection of subjective attribute ratings.
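As a sketch, these four global rules can be wrapped in a plain decision function (the top-down rule order and the None fallback are assumptions; the text does not specify a conflict-resolution strategy):

```python
# Hypothetical decision function built from the four global rules above.
# Ratings use the descriptive labels of Tab. 6.11.
def quality(ratings):
    clar, space = ratings.get("CLAR"), ratings.get("SPACE")
    comb = ratings.get("COMB_FILT")
    if clar == "high" and space == "high":
        return 5
    if clar == "medium" and space == "high":
        return 4
    if space == "medium":
        return 3
    if comb == "present" and space == "low":
        return 1
    return None   # no global rule fires

# e.g. the 4th program in Tab. 6.11 (CLAR high, SPACE high) was rated 5
print(quality({"CLAR": "high", "SPACE": "high", "COMB_FILT": "absent"}))   # -> 5
```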

6.3.2. Evaluation of Audio CODEC Features

The rough set-based algorithm has been applied to the analysis of results obtained in listening tests, as shown in Section 6.2.2.

An exemplary set of data presenting answers for expert No. 1 and musical fragment No. 1 is collected in Tab. 6.12. The quality grades for parameters such as NOISE and DIST, being negative characteristics, were as mentioned above: imperceptible (grade 5), perceptible but not annoying (grade 4), etc. Grades from 5 to 1 were also assigned to parameters having positive meanings; however, in this case 5 was understood as excellent, 4 as very good, etc. Looking at Tab. 6.12, it is possible to see which parameters most influence the quality decision. However, a simultaneous analysis of many such tables for consecutive subjects and for all signal samples becomes arduous. That is why soft computing methods have been applied to the task.

Tab. 6.12. Scores and ratings obtained for a selected expert (for the 1st motif)

Algorithm\Parameter  NOISE  DIST  CLAR  SPACE  STAB  QUALITY
1st                  4      4     3     5      5     4
2nd                  4      4     3     5      4     4
3rd                  4      3     2     4      4     2
4th                  4      4     4     5      4     4
5th                  5      5     4     5      5     5

The results obtained from the performed parametric tests were processed using the rough set-based algorithm formulated at the Sound Engineering Department of the Technical University of Gdansk [48][49]. The main task was to find the tendencies underlying the quality evaluation by individual subjects. The QUALITY parameter was defined as the decision attribute, with all other parameters contained in Tab. 6.12 being used as condition attributes. As a result of processing the data with the rough set-based algorithm, reducts were obtained, some of which are listed below:

Reducts:
CLAR, SPACE, STAB
NOISE, CLAR, SPACE
CLAR, STAB
DIST, CLAR


The next step of data processing was the calculation of rules. The strongest of these are presented below:

Rules:

(CLAR 5) then (QUALITY 5),
(NOISE 3) & (STAB 5) then (QUALITY 4),
(STAB 2) then (QUALITY 2),
(STAB 4) then (QUALITY 4),
(CLAR 2) then (QUALITY 2).

The presented results are limited to a single music motif. After processing the results of two further sessions employing a total of three music motifs, the following global rules remained in effect:

Global rules:

(CLAR 5) & (NOISE 5) then (QUALITY 5),
(NOISE 3) & (STAB 5) & (DIST 4) then (QUALITY 4),
(STAB 2) then (QUALITY 2),
(STAB 4) then (QUALITY 4),
(CLAR 2) then (QUALITY 2).

Also, a core was found: the CLAR parameter. The results show that listeners generally did not perceive noise or harmonic distortions, independently of the compression rate. Rather, they based their assessments on the two parameters related to the spatial properties of sound: CLARITY (CLAR) and STABILITY (STAB) of localization. This result is in good accordance with the common opinion concerning the influence of perceptual compression on the subjective perception of sound.

It seems that such a method of listening test result processing is very useful, particularly because the obtained results are easily interpreted. The result obtained through the fuzzy set approach (the STAB parameter supporting the decision, see Section 6.2.2) also seems to be in good accordance with the above-given results.

6.4. Rough-Fuzzy Method of Test Result Processing

In this section, a new method for the automatic assessment of acoustical quality, proposed and engineered by the author, is introduced. This method uses a combination of the rough set learning algorithm and fuzzy logic inference. The proposed system is tested and investigated in a series of experiments using architectural acoustic data obtained on the basis of both subjective listening tests and objective measurements.


6.4.1. Evaluation of the Acoustical Features of Concert Halls

Relationships between the objectively measured parameters of acoustical objects (concert halls, sound processing programs, loudspeakers, etc.) and their subjective quality as assessed by listeners (preferably experts) cannot in most cases be crisply defined, leaving a wide margin of uncertainty which depends on individual subjects' preferences and the unknown influences of individual parameter values on the overall acoustic quality of the tested object. Consequently, results of subjective tests have to be processed statistically (the hitherto used approach) in order to find links between preference results and concrete values of parameters representing the objective features of tested objects.

In this section, a new extended proposal of the procedure for analyzing subjective testing results is formulated. Having collected the assessments of the overall acoustical quality of the tested objects from all of the experts, it is possible to create a decision table and then to process this table using the rough set method. In this way, a set of rules may be created which may subsequently be verified by experts. The next step is to analyze the objective parameters step-by-step and to try to obtain subjective ratings for each of them as assessed separately from the others. The mapping of objective parameter values to their subjective assessments by many experts creates some fuzzy dependencies which can be represented by fuzzy membership functions. Then, with rules determined from the rough set decision table and membership functions determined empirically for the studied parameters, one can create an expert system which provides automatic decisions on acoustical quality each time a concrete set of parameters is presented to its inputs. This system, engineered by the author, uses fuzzy logic principles for the automatic determination of the acoustical quality. Consequently, it provides a complete expert system for automatic assessment of objectively measured acoustic features (see Fig. 6.3).

a.

[Fig. 6.3a: ROUGH-FUZZY SYSTEM - KNOWLEDGE ACQUISITION PHASE: block diagram leading to the KNOWLEDGE BASE]


b.

I ROUGH.fUZZY SYSTEM- AUTOMAT1C ASSESSMENT I ·'*' I New Set of Data I + I KNOWLEDGE BASE II Parameter Fuzzlfic.tlon I + I Rules I J Applytng Rules Derived - -..t

.1using Rough Set S}IStem 1: (I ROUGH-FUZZY

1 Jlls wed u ftljflt 1 y INFERENCE

J Clllculation of Rulea -!' -1 strength - -•

+ I Parameter Defuzzltlcatlon I

+ L OVERALL QU.ALITY ASSESSMENT I

Fig. 6.3. Engineered rough-fuzzy expert system: knowledge acquisition phase (a), automatic assessment (b)

Concise Description of the Engineered Expert System

Knowledge acquisition phase:

- Selection of acoustical objects to be tested;
- Choice of subjective parameters describing the acoustical quality of these objects;
- Subjective listening tests carried out with regard to the object quality to be assessed (various acoustic interiors, either existing or simulated). The tests should use subjectively defined parameters which can be expressed in terms of objective measures. Parameter values should be expressed in ranges labeled descriptively as low, medium, and high;
- Collecting all experts' answers related to the overall quality into tables together with the descriptively labeled values of parameters;
- Creating a rough set decision table from the collected data;
- Rough set processing of the above decision table (derivation of reducts and rules);
- Measuring the objective characteristics of the investigated acoustical objects;
- Calculating histograms from the experts' votes for separate parameters;
- Defining the universe and domain of objectively measured parameters, labeling membership functions representing subdomains (ranges, scopes) of objectively measured parameters (the most typical labels are: low, medium, and high);
- Defining fuzzy sets on the basis of subjective voting results in such a way that each assessed parameter value is mapped to the number of votes assigned to it by experts;


- Estimation of membership function shapes based on the probability density approach;
- Statistical validation of the obtained membership functions by means of Pearson's χ² test (especially important in the case of a statistically small number of tested objects).

Automatic quality assessment phase:

- Collecting new parameter values related to an object that was not previously measured;
- Calculating the degree of membership for each parameter and for each predefined membership function;
- Applying the rules stored in the knowledge base (derived using the rough set method and validated by experts during the learning phase);
- Calculating the value assigned to each rule (using the fuzzy AND function in the conditional part of the rules);
- Finding the rule which was assigned the maximum value (the winning rule);
- Applying the α-cut to the output membership function associated with the winning rule (one of the membership functions describing the overall preference);
- Calculating the centroid value on the basis of the α-cut, as above;
- Mapping the centroid onto the 100-point subjective grade scale. The crisp value which is thus obtained provides a measure of the automatically assessed acoustical quality of the tested object.
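The assessment phase above can be sketched end to end. Every membership function, parameter range, and rule below is an illustrative placeholder (hypothetical reverberation-time ranges and a two-rule base), not a function estimated in the book's experiments:

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# fuzzification: hypothetical low/medium/high ranges for reverberation time [s]
RT_SETS = {"low": (0.0, 0.8, 1.6), "medium": (1.0, 1.8, 2.6), "high": (2.0, 2.8, 3.6)}
memberships = {"RT": {lbl: tri(1.9, *abc) for lbl, abc in RT_SETS.items()}}

# rule strength: fuzzy AND = minimum over the conditional part of each rule
rules = [((("RT", "medium"),), "excellent"), ((("RT", "high"),), "fair")]
strengths = [(min(memberships[p][l] for p, l in cond), out) for cond, out in rules]
alpha, winner = max(strengths)           # the winning rule and its alpha level

# defuzzification: centroid of the winning output set clipped at alpha (alpha-cut)
QUALITY_SETS = {"fair": (20.0, 40.0, 60.0), "excellent": (60.0, 80.0, 100.0)}
num = den = 0.0
for i in range(1001):                    # numeric integration over the 100-point scale
    x = 100.0 * i / 1000
    mu = min(tri(x, *QUALITY_SETS[winner]), alpha)
    num += x * mu
    den += mu
score = num / den                        # crisp quality on the 100-point scale
print(winner, round(alpha, 3), round(score, 1))
```

With these placeholder sets, an input RT of 1.9 s fires the "medium" rule at level 0.875 and the centroid lands at the peak of the symmetric output set.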

Exemplary Problem Statement

Measurable Data Analysis

When considering the evaluation of an acoustical hall, both listening tests and measurement procedures are carried out, resulting in a set of data. Usually, some statistical tests are employed in order to check the reliability of the obtained results.

In Tab. 6.13, a set of acoustical data is presented. It represents parameter values measured in various acoustical halls. For this exemplary set of data, some basic statistical measures were calculated (mean value and dispersion). They are presented in Tab. 6.14. Additionally, correlation coefficient values (Eq. (3.99)) and the corresponding Student's t values (Eq. (3.102)) are shown, respectively, in Tab. 6.15 and 6.16. The significance of the r coefficient was checked according to expression (3.99), with regard to the inequality |t_0| > t_α. It was found that correlation coefficients between the pairs of parameters 1-2, 1-4, 1-7, 2-4, 4-5, and 6-7 (values of the statistic t highlighted in bold in Tab. 6.16) are quite large.


Significance testing has shown that these parameters are strongly correlated (at significance levels of both 0.01 and 0.05), while other significant correlations were not found.

Tab. 6.13. Exemplary acoustical data

Hall No.  Definition  Diffusion  Intimacy  EDT    RT     Loudn.  Spatial Imp.
          C_def                  ITDG                            C_SI
1         0.5076      0.4321     7.1       1.950  2.14   0.2837  0.3583
2         0.5256      0.2868     14.9      1.83   2.25   0.3429  0.3565
3         0.5620      0.4218     2.7       1.66   1.75   0.2815  0.4070
...       ...         ...        ...       ...    ...    ...     ...
n         0.6695      0.2719     27.7      1.413  1.617  0.2058  0.1981

Distributions of the values of the parameter pairs Definition-Diffusion and EDT-RT, respectively, are shown in Fig. 6.4. As may be seen from the figure, the data are strongly correlated in both cases (a: negative correlation, b: positive correlation).

[Fig. 6.4: linear regression plots: (a) Definition (X) vs. Diffusion (Y); (b) EDT (X) vs. RT (Y)]

Fig. 6.4. Scattering of data from Tab. 6.13 (a. Definition vs. Diffusion, b. EDT vs. RT)

Tab. 6.14. Mean values and dispersions of the data set presented in Tab. 6.13

Par./Stat.  C_def  Diffusion  ITDG    EDT    RT     Loudn.  C_SI
Mean        0.628  0.305      19.061  1.643  1.888  0.278   0.290
Dispersion  0.105  0.106      8.630   0.369  0.374  0.053   0.091

Tab. 6.15. Correlation coefficients r of the data set presented in Tab. 6.13

r          C_def   Diffusion  ITDG    EDT    RT     Loudn.  C_SI
C_def      1
Diffusion  -0.817  1
ITDG       0.368   -0.295     1
EDT        -0.748  0.746      0.099   1
RT         -0.413  0.493      0.131   0.828  1
Loudn.     -0.581  0.322      -0.251  0.298  0.051  1
C_SI       -0.769  0.638      -0.544  0.418  0.126  0.744   1

Tab. 6.16. Student's statistic t ofthe data set presented in Tab. 6.13

t          cdef    Diffusion  ITDG    EDT    RT     Loudn.  Cs1
cdef       --
Diffusion  -5.294  --
ITDG       1.479   -1.154     --
EDT        -4.217  4.191      0.372   --
RT         -1.695  2.122      0.496   5.534  --
Loudn.     -2.673  1.274      -0.968  1.167  0.192  --
Cs1        -4.508  3.098      -2.427  1.724  0.476  4.163   --

According to the above statistical considerations, it may be assumed that the number of measured parameters may be reduced to only 2 or 3 parameters for a given hall. It should be emphasized, however, that the performed calculations were limited to the available data. Therefore, definite conclusions on the number of parameters needed to describe a hall cannot yet be drawn.
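The correlation coefficients of Tab. 6.15 and the Student's t statistics of Tab. 6.16 can be reproduced with a short routine. The sketch below uses a small synthetic data series (not the book's full hall data set, of which only three rows are reproduced in Tab. 6.13); the helper names `pearson_r` and `t_statistic` are ours.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """Student's t for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Illustrative anticorrelated readings for five halls (synthetic numbers).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [9.0, 8.0, 6.0, 4.0, 2.0]
r = pearson_r(x, y)          # strongly negative, as for cdef vs. Diffusion
t = t_statistic(r, len(x))
```

A computed |t| larger than the critical value from Student's tables (for n - 2 degrees of freedom and the chosen significance level) indicates a significant correlation, which is how the entries of Tab. 6.16 are interpreted.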

Rough Set Processing of Acoustical Data

In the next steps of the analysis, the results of the listening test sessions should be collected into tables, separately for each expert and for each of the various music excerpts. Then, these tables should be transformed into the format of the decision tables used in the rough set decision systems (Tab. 2.11). As was mentioned previously, a practical way exists to carry out an evaluation procedure in laboratory conditions. Such an experiment can be based on computer simulations of hall acoustics. In this case, sound excerpts recorded in an anechoic chamber, thus without any reverberation, are used. Therefore, objects t1 to tn from Tab. 2.1 represent various simulated acoustical interiors, and attributes A1 to Am are denoted as tested parameters, introduced previously, and are used as conditional attributes. The expert's scoring is defined by the grades a11 to anm (the quantized values are labeled descriptively as low, medium, and high). The decision D is understood as a value assigned to the overall quality of sound (QUALITY).

The result of the rough set-based processing is a set of rules that will be later used to assess the quality of an object unseen by the system.

The questionnaire form used in listening tests was as presented in Tab. 6.17. The same descriptors (attributes) as previously shown in Tab. 6.13 were used in the subjective assessments. Subjects were asked to fill in the questionnaire. The expert decision set was limited to 3 grades (1 - low, 2 - medium, 3 - high). Having the results of several simulated halls and at the same time having collected the data from several subjects, these data are then processed by the rough set algorithm.

Tab. 6.17. Listening test results for hall No. i

Subject 1
Grades/Descriptors  cdef  Diff.  ITDG  EDT  RT   Loudn.  Cs1  QUALITY
1                   1     3      1     2    3    2       2    2
...                 ...   ...    ...   ...  ...  ...     ...  ...
n                   2     1      3     3    2    3       2    3

The first step now is the elimination of duplicated rows in the decision tables (superfluous data elimination). The second step for processing the data is the calculation of rules. In the discussed example, the following strongest rules were obtained:

RULES:

if (Cs1 med) then (QUALITY good), μRS = 1
if (EDT low) & (Cs1 low) then (QUALITY fair), μRS = 0.9
if (Cdef med) & (Cs1 high) & (Loudn. high) then (QUALITY good), μRS = 0.8
if (Loudn. high) & (RT med) then (QUALITY very good), μRS = 0.8
if (Loudn. med) & (EDT med) then (QUALITY good), μRS = 0.7
if (Cdef med) & (Cs1 high) then (QUALITY very good), μRS = 0.7
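The two processing steps named above - superfluous data elimination and rule calculation with a rough set measure - can be sketched as follows. The decision table is a hypothetical miniature, and μRS is computed here simply as the fraction of premise-matching rows that support the rule's decision, one common way of defining a rough measure, assumed here for illustration.

```python
# Decision table rows: (Cs1, EDT, QUALITY); values are quantized labels.
# Hypothetical mini-table for illustration, not the book's data.
rows = [
    ("med", "low", "good"),
    ("med", "low", "good"),   # duplicate -> removed as superfluous
    ("med", "med", "good"),
    ("med", "high", "fair"),
    ("low", "low", "fair"),
]

unique_rows = list(dict.fromkeys(rows))  # superfluous (duplicated) data elimination

def rough_measure(table, premise, decision):
    """mu_RS: fraction of premise-matching rows that also match the decision."""
    matching = [r for r in table if all(r[i] == v for i, v in premise.items())]
    if not matching:
        return 0.0
    supporting = [r for r in matching if r[2] == decision]
    return len(supporting) / len(matching)

# Rule: if (Cs1 med) then (QUALITY good)
mu = rough_measure(unique_rows, {0: "med"}, "good")
```

In this miniature, two of the three unique Cs1 = med rows support QUALITY = good, so the rule receives μRS = 2/3; only rules whose measure exceeds a chosen threshold are retained.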

Mapping Test Results to Fuzzy Membership Functions

In this step of the experiment, acoustical simulations were used instead of real hall measurements in order to minimize the costs of the experiment. The sound samples recorded in an anechoic chamber are then processed by adding some portions of artificially generated reverberation. Experts, while listening, are instructed to rate their judgements of the performances using such descriptions as low, medium, and high. This procedure introduces the concept of the Fuzzy Quantization Method (FQM) applied to acoustical parameters. This results in the relation of semantic descriptors to the particular parameter quantities.

Some exemplary data are graphed in Fig. 6.5. As may be seen from the figure, the distribution of the observed instances suggests the trapezoidal shape of a membership function. In the next step of the analysis, such membership functions will be defined by the use of some statistical methods.

In any case, it is also necessary to define the set of output membership functions representing the overall subjective preference grades. It was assumed that this preference is expressed on a 100-point linear scale subdivided into 3 ranges mapped non-exclusively to 3 typical membership functions.


[Fig. 6.5: three histograms of expert votes over RT values from 0.4 to 3.6 s, one panel per label: low, medium, high]

Fig. 6.5. Experts' votes for the parameter RT; N - number of experts voting for particular values of RT

Analysis Procedure

One of the main tasks of subjective test result analysis is to approximate the tested parameter distribution. This can be done by several techniques. The most common technique is linear approximation, where the original data range is transformed to the interval [0,1]. Thus, triangular or trapezoidal membership functions may be used in this case. In the linear regression method, one assigns minimum and maximum attribute values. Assuming that the distribution of parameters provides a triangular membership function for the estimated parameter, the maximum value may thus be assigned as the average value of the obtained results. This may, however, cause a loss of information and bad convergence. The second technique uses bell-shaped functions. The initial values of parameters can be derived from the statistics of the input data. Further, the polynomial approximation of data, either ordinary or Chebyshev, may be used. The polynomial approximation of order k approximates a given set of parameter values using k+1 coefficients, assuming the least-square error. This technique is justified by a sufficiently large number of results or by increasing the order of polynomials; however, the latter direction may lead to a weak generalization of results.


Coefficients of a linear combination of Chebyshev polynomials of the degree 0, 1, ... ,k may also be used for data representation purposes. As is seen from the presented considerations, there are some advantages and disadvantages concerning the mentioned methodologies. Another approach to defining the shape of the membership function involves the use of the probability density function. The last mentioned technique will be discussed more thoroughly.

An approximation of the obtained results, calculated on the basis of the least-square criterion, is shown in Fig. 6.6 (data from Fig. 6.5 - "medium" membership function). Data in Fig. 6.6a are presented as obtained in the experiment, while the y axis in Fig. 6.6b is transformed logarithmically. As may be seen from Fig. 6.6, in the latter case the least-squares fit is much better than when applied to the raw data.

[Fig. 6.6: two panels of X-Y data points with least-squares fit curves]

Fig. 6.6. Approximation of obtained results on the basis of the least-square criterion (quadratic fit): a. linear scales, b. logarithmic scales

Intuitively, it seems appropriate to build the initial membership function by using the probability density function and by assuming that the parameter distribution is trapezoidal or triangular. The estimation of the observed relationships is given by the functions shown in Fig. 6.7.

[Fig. 6.7: a. trapezoidal functions f1(x,A,b,c), f2(x,A,b,c,d,e) and f3(x,A,d,e) with break-points b, c, d, e and height A; b. triangular function f4(x,A,b,c,e)]

Fig. 6.7. Trapezoidal (a) and triangular (b) membership functions estimated by the probability density function


The subsequent f1...f3 membership functions from Fig. 6.7a are defined by a set of parameters: A, b, c, d, and e, and are determined as follows:

f1(x,A,b,c) = { A                 if x < b
              { A·(c-x)/(c-b)     if b ≤ x ≤ c          (6.21)
              { 0                 if x > c

f3(x,A,d,e) = { 0                 if x < d
              { A·(x-d)/(e-d)     if d ≤ x ≤ e          (6.22)
              { A                 if x > e

f2(x,A,b,c,d,e) = { 0               if x < b or x > e
                  { A·(x-b)/(c-b)   if b ≤ x ≤ c
                  { A               if c < x < d        (6.23)
                  { A·(e-x)/(e-d)   if d ≤ x ≤ e

Additionally, in the case of a triangular membership function, symmetry of the sides of the triangle is assumed.

f4(x,A,b,c,e) = { 0               if x < b or x > e
                { A·(x-b)/(c-b)   if b ≤ x ≤ c          (6.24)
                { A·(e-x)/(c-b)   if c ≤ x ≤ e
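The four membership functions of Eqs. (6.21)-(6.24) translate directly into code; the sketch below is a plain transcription of the piecewise definitions.

```python
def f1(x, A, b, c):
    """Left shoulder (Eq. 6.21): constant A below b, falling to 0 at c."""
    if x < b:
        return A
    if x <= c:
        return A * (c - x) / (c - b)
    return 0.0

def f3(x, A, d, e):
    """Right shoulder (Eq. 6.22): 0 below d, rising to A at e."""
    if x < d:
        return 0.0
    if x <= e:
        return A * (x - d) / (e - d)
    return A

def f2(x, A, b, c, d, e):
    """Trapezoid (Eq. 6.23): rises on [b,c], flat at A on (c,d), falls on [d,e]."""
    if x < b or x > e:
        return 0.0
    if x <= c:
        return A * (x - b) / (c - b)
    if x < d:
        return A
    return A * (e - x) / (e - d)

def f4(x, A, b, c, e):
    """Symmetric triangle (Eq. 6.24): peak A at c, zero outside [b,e]."""
    if x < b or x > e:
        return 0.0
    if x <= c:
        return A * (x - b) / (c - b)
    return A * (e - x) / (c - b)
```

For example, f2(x, 1, 0, 1, 2, 3) evaluates to 1 on the plateau between 1 and 2, and to 0.5 halfway up either slope.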

The trapezoidal function f2(x,A,b,c,d,e) and the triangular function f4(x,A,b,c,e) both represent the estimated "medium" membership functions. The equation describing the mth moment of the probability density for the function f2(x,A,b,c,d,e) is calculated as follows:

m_m = ∫[-∞,+∞] x^m · f2(x,A,b,c,d,e) dx       (6.25)

The estimate of the mth moment of the probability density function from the test (assuming that all observation instances fall into the interval j, where j = 1, 2, ..., k) is calculated according to the formula:

m_n = Σ (j = 1..k) x_j^n · P(x = x_j)       (6.26)

where: P(x = x_j) represents the probability that the attribute value of instance x falls into the interval j.


Below, the subsequent moments of order from 0 to 4 for this function are calculated on the basis of the formula:

then:

m0 = ∫[b,e] f2(x,A,b,c,d,e) dx       (6.27)

m0 = (A/2)·(d - c + e - b)       (6.28)

Respectively:

m1 = ∫[b,e] x · f2(x,A,b,c,d,e) dx       (6.29)

and:

m1 = (A/6)·[(c-b)·(b+2c) + 3·(d² - c²) + (e-d)·(2d+e)]       (6.30)

Furthermore:

m2 = ∫[b,e] x² · f2(x,A,b,c,d,e) dx       (6.31)

and:

m3 = ∫[b,e] x³ · f2(x,A,b,c,d,e) dx       (6.33)

Finally:

m4 = ∫[b,e] x⁴ · f2(x,A,b,c,d,e) dx       (6.35)


Next, by substituting the observed values into Eq. (6.26), the consecutive values of mn are calculated. From this, the set of 5 equations with the 5 unknown variables A, b, c, d, e is determined. After numerically solving this set of equations, the second task of the analysis is to validate the observed results using Pearson's χ² test with k-1 degrees of freedom.

χ² = Σ (j = 1..k) (n_j - n·p_j)² / (n·p_j)       (6.37)

where: n_j - number of observed instances within the interval j, p_j - probability of an instance falling within the interval j (estimated by the probability density function), n - total number of observed instances (j = 1,2,...,k). Additionally, the product n·p_j should have the same value for all observation intervals.

Furthermore, it is assumed that the significance level should be set at 5%. After calculating the value of χ² from Eq. (6.37), this value should be compared with the critical one given in the statistical tables. If the computed value is smaller than the one from the statistical tables, then the null hypothesis is valid. In the contrary case, the hypothesis should be rejected and the assumed trapezoidal membership function is not a valid model of the measured phenomenon.
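The test of Eq. (6.37) can be sketched as follows. The observed counts and model probabilities are illustrative, and the critical value 9.488 (χ² for 4 degrees of freedom at the 5% significance level) is taken from standard statistical tables.

```python
# Pearson chi-square goodness-of-fit check, as in Eq. (6.37).
# Observed vote counts and model probabilities are illustrative only.
observed = [9, 21, 42, 19, 9]          # n_j: instances falling into interval j
probs = [0.1, 0.2, 0.4, 0.2, 0.1]      # p_j: from the fitted membership model
n = sum(observed)                      # total number of observations

chi2 = sum((nj - n * pj) ** 2 / (n * pj) for nj, pj in zip(observed, probs))

# Critical chi-square value for k-1 = 4 degrees of freedom at the 5%
# significance level (from standard statistical tables).
CHI2_CRIT_4_005 = 9.488
null_hypothesis_valid = chi2 < CHI2_CRIT_4_005
```

With these illustrative counts the statistic stays far below the critical value, so the assumed membership function would be accepted as a model of the votes.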

Using the above statistical method, an approximation of the fuzzy membership function for the studied parameter can be made. The membership functions reflect the number of subjective votes given by experts to individual values of the assessed parameters (Reverberation Time (RT) in the example discussed previously).

On the other hand, in some cases it seems that a sufficiently good approximation of measurement data can be obtained by using the triangular membership function, for which the estimating function is also of a simpler form. The subsequent moments of order from 0 to 2 for the triangular membership function ("medium") are calculated as in Eqs. (6.38-6.44). Additionally, the symmetry of this function was assumed, therefore:

e = 2·c - b       (6.38)

and:

m0 = ∫[b,e] f4(x,A,b,c,e) dx       (6.39)

m0 = A·(c - b)       (6.40)

Respectively:

m1 = ∫[b,e] x · f4(x,A,b,c,e) dx       (6.41)

and:

m1 = A·c·(c - b)       (6.42)

Furthermore:

m2 = ∫[b,e] x² · f4(x,A,b,c,e) dx       (6.43)

then:

m2 = (A·(c - b)/6) · (b² - 2·b·c + 7·c²)       (6.44)

After substituting the observed values, the set of equations described in Eq. (6.38-6.44) is solved. The consistent solution according to the measurement data is given below:

A = 5.9, b = 0.8, c = 1.7, e = 2.5

In the case where condition (6.38) (i.e. function symmetry) is not fulfilled, the third order moment should be inserted into the equation set to be solved. In this case, one of the consistent solutions is given by the set of parameters:

A = 6.2, b = 1, c = 1.4, e = 2.7

According to Pearson's χ² test, the null hypothesis is valid and thus the obtained results may be used in further analysis.
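Under the symmetry assumption of Eq. (6.38), the moment equations (6.40), (6.42) and (6.44) can be inverted in closed form: c = m1/m0, b = c - sqrt(6·(m2/m0 - c²)), e = 2c - b and A = m0/(c - b). The sketch below checks this inversion on moments generated from an exactly symmetric triangle (b = 0.8, c = 1.7, e = 2.6 - illustrative values close to those quoted above, chosen symmetric so that the recovery is exact).

```python
import math

# Moments of the expert-vote distribution, as estimated via Eq. (6.26).
# Here they are taken from an exact symmetric triangular density with
# b = 0.8, c = 1.7, e = 2.6 (illustrative values, not the book's data).
m0 = 1.0                                  # total probability mass
m1 = 1.7                                  # mean (= peak position c)
m2 = (2.6 - 0.8) ** 2 / 24 + 1.7 ** 2     # variance + mean^2

# Closed-form inversion of Eqs. (6.40), (6.42) and (6.44) under the
# symmetry condition e = 2c - b of Eq. (6.38).
c = m1 / m0
b = c - math.sqrt(6.0 * (m2 / m0 - c * c))
e = 2.0 * c - b
A = m0 / (c - b)
```

Solving from moments this way avoids a general nonlinear solver; the non-symmetric case quoted in the text additionally requires the third-order moment.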

Automatic quality assessment phase

In order to enable the automatic quality assessment procedure, new data representing the parameter values of a given concert hall is fed to the system inputs (Tab. 6.18). The first step is the fuzzification process, in which degrees of membership are assigned for each crisp input value (as in Fig. 6.8). As was mentioned previously, in the presented example the number of membership functions was limited to three. Therefore, for the data presented in Tab. 6.18, the degree of membership for each input value (for a given label) has to be determined.

Tab. 6.18. Set of parameter values presented to the system inputs

The pointers in Fig. 6.8 refer to the degrees of membership for the precise value of RT = 1.75. Thus, when the value of RT equals 1.75, it belongs, respectively, to the low fuzzy set with the degree of 0, to the medium fuzzy set with the degree of 0.65, and to the high fuzzy set with the degree of 0.35. The same procedure is applied to the other parameters of Tab. 6.18.
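The fuzzification step can be sketched with three illustrative membership functions whose break-points are chosen so that RT = 1.75 reproduces the quoted degrees (0, 0.65, 0.35); they are assumptions, not the book's exact curves from Fig. 6.8.

```python
def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def shoulder_down(x, a, b):
    """1 below a, falling linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)

def shoulder_up(x, a, b):
    """0 below a, rising linearly to 1 at b."""
    return 1.0 - shoulder_down(x, a, b)

# Assumed break-points for the RT axis (illustrative).
RT = 1.75
degrees = {
    "low": shoulder_down(RT, 0.4, 1.4),
    "medium": tri(RT, 0.4, 1.4, 2.4),
    "high": shoulder_up(RT, 1.4, 2.4),
}
```

With these break-points the crisp input RT = 1.75 fuzzifies to (low, medium, high) = (0, 0.65, 0.35), matching the worked example.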

It should be remembered that after the rough set processing, only the strongest rules, with a rough set measure value exceeding 0.5, had been considered. This rough measure will be treated further as the weight applied to a rule when it is used for the fuzzy processing. Additionally, the strength of the rule in the fuzzy processing is determined, according to fuzzy logic principles, on the basis of the smallest value of the degrees of membership found in the rule premise. Therefore, the overall strength of a rule is equal to the product of the rough set measure and the strength derived from the fuzzy processing (as shown below):

if (Cs1 = 0.35) then (QUALITY good) => rule strength = 0.35 · μRS = 0.35 · 1 = 0.35
if (EDT = 0) & (Cs1 = 0) then (QUALITY fair) => rule strength = 0 · μRS = 0
if (Cdef = 0.65) & (Cs1 = 0.65) & (Loudn. = 0) then (QUALITY good) => rule strength = 0 · μRS = 0
if (Loudn. = 0.55) & (RT = 0.65) then (QUALITY very good) => rule strength = 0.55 · μRS = 0.55 · 0.8 = 0.44
if (Loudn. = 0.45) & (EDT = 0.45) then (QUALITY good) => rule strength = 0.45 · μRS = 0.45 · 0.7 = 0.32
if (Cdef = 0.65) & (Cs1 = 0.65) then (QUALITY very good) => rule strength = 0.65 · μRS = 0.65 · 0.7 = 0.46
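The min-and-weight computation above can be sketched as follows; premise degrees and μRS values are those of the worked example (note that 0.65 · 0.7 = 0.455, which the text rounds to 0.46).

```python
# Each rule: (premise membership degrees, rough set measure mu_RS, conclusion).
# Degrees are the fuzzified inputs from the worked example in the text.
rules = [
    ({"Cs1": 0.35}, 1.0, "good"),
    ({"EDT": 0.0, "Cs1": 0.0}, 0.9, "fair"),
    ({"Cdef": 0.65, "Cs1": 0.65, "Loudn": 0.0}, 0.8, "good"),
    ({"Loudn": 0.55, "RT": 0.65}, 0.8, "very good"),
    ({"Loudn": 0.45, "EDT": 0.45}, 0.7, "good"),
    ({"Cdef": 0.65, "Cs1": 0.65}, 0.7, "very good"),
]

# Rule strength = (min over premise degrees) * mu_RS; per output label,
# the winning rule is the one with the maximum strength.
fuzzy_output = {"fair": 0.0, "good": 0.0, "very good": 0.0}
for premise, mu_rs, conclusion in rules:
    strength = min(premise.values()) * mu_rs
    fuzzy_output[conclusion] = max(fuzzy_output[conclusion], strength)
```

The resulting fuzzy output (0, 0.35, 0.455 ≈ 0.46) is exactly the row of Tab. 6.19.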

[Fig. 6.8: three membership functions (low, medium, high) over the RT axis, with pointers marking the degrees of membership at RT = 1.75]

Fig. 6.8. Fuzzification process of the parameter RT


The next step is to determine the maximum rule strength for all of the rules involved in the same output action. The winning rule is found on the basis of the maximum value. Therefore, for the data set given in the example, the fuzzy output is defined in Tab. 6.19.

Tab. 6.19. Fuzzy output of data from Tab. 6.18

Membership function  Fair  Good  Very good
Fuzzy output         0     0.35  0.46

The defuzzification method used in the system is based on the calculation of the centroid value, given by the formula:

G = ∫[a,b] μ(x)·x dx / ∫[a,b] μ(x) dx       (6.45)

However, in most cases it is sufficient to use the estimate of the centroid, according to the expression:

G = Σ (x = a..b) μ(x)·x / Σ (x = a..b) μ(x)       (6.46)

In the latter case, only a finite number of points from the output domain is taken into account.
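A sketch of the whole defuzzification step: each output membership function is clipped at its λ-cut (the fuzzy outputs of Tab. 6.19), the clipped functions are combined by maximum, and the aggregate is reduced to a crisp value with the discrete centroid of Eq. (6.46). The output function shapes on the 100-point scale are assumptions, so the resulting centroid is only indicative of the "very good" region rather than exactly the 85 quoted below.

```python
def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Illustrative output membership functions on the 100-point preference
# scale (break-points are assumptions, not the book's exact shapes).
def fair(x):      return tri(x, -1, 17, 50)
def good(x):      return tri(x, 33, 50, 67)
def very_good(x): return min(1.0, max(0.0, (x - 50) / 33))

cuts = {"fair": 0.0, "good": 0.35, "very_good": 0.46}  # lambda-cuts (Tab. 6.19)

def aggregated(x):
    """Each output function clipped at its lambda-cut, combined by max."""
    return max(min(fair(x), cuts["fair"]),
               min(good(x), cuts["good"]),
               min(very_good(x), cuts["very_good"]))

# Discrete centroid estimate, Eq. (6.46), over the 0..100 output domain.
xs = range(101)
G = sum(aggregated(x) * x for x in xs) / sum(aggregated(x) for x in xs)
```

With these assumed shapes the centroid lands in the upper part of the scale, i.e. inside the "very good" region, mirroring the result reported for the data of Tab. 6.18.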

A graphic illustration of the defuzzification process is shown in Fig. 6.9. In this figure the λ-cut resulting from the analysis is applied to each output membership function.

Fig. 6.9. Defuzzification process (output membership functions: 1 - fair, 2 - good, 3 - very good)

The estimated centroid value calculated for the data from Tab. 6.18 is equal to 85. Thus, the obtained crisp value provides a measure of the automatically assessed acoustical quality of the tested object. As may be seen, the overall quality of the object from Tab. 6.18 belongs to the scope of the fuzzy set labeled as "very good".


6.4.2. Optimization of Noise Reduction Algorithm Parameters

Problem Statement

Presently, the majority of audio signals transmitted between or recorded into digital systems are perceptually encoded. However, as is shown by the results of recently proposed methods [51][54], the perceptual coding of audio can also be used for suppressing noise which affects audio signals. Such noise might be caused, for example, by recording procedures or by the transmission of audio signals through telecommunication channels. Thus, the application of perceptual coding not only preserves the original quality of audio material, but can also subjectively improve it [53][54]. The masking curves providing the basic mechanism of this noise reduction algorithm divide the spectral magnitudes of the audio signal into two categories: audible components (stretched beyond the masking curves) and inaudible ones (remaining below these curves). When determining the settings of the masking model, one must use experimental procedures. Hence, the parameter values to be optimized can only be discerned through listening tests, yet their results do not clearly reveal the dependencies which are sought between the parameters and audio quality. For this reason, the rough set method may be employed to facilitate the process of optimizing the perceptual algorithm for noise reduction.

A special test procedure was elaborated by the author for this purpose, which uses both the principles of psychometric scaling and the rule base building method based on rough sets. In this section, first the main principles of the perceptual coding algorithm will be presented and then the proposed soft computing method for its optimization will be described [53].

Principles of the Perceptual Method for Noise Reduction

The idea behind the perceptual method for noise reduction lies mainly in the constant analysis of the signal-to-noise relationships between consecutive sample packets, the calculation of optimum masking curve shapes, and the processing of noisy audio using the perceptual coding algorithm. The noise sample is always taken from a silent passage that is nearest to the currently processed signal fragment [53]. On the basis of the spectral power density of noise in neighboring "silent passages" and in the current signal packet, the masking threshold is raised separately in each critical band in order to keep the noise below the masking curves. The tuning of parameters controlling this process was performed using the rough set method. Since subjective assessment results are decisive to the success of audio coding, the perceptually encoded patterns should be assessed by experts. Their subjective opinions are expressed in grades, reflecting the degree of subjective preference for individual parameter settings and for the whole pattern. The set of data obtained this way is not consistent, and is ruled by some hidden relationships that have to be discovered before the perceptual coding algorithm is properly tuned. That is why the rough set method was applied: it provides a tool for making hidden relationships explicit and easy to interpret [53].

Perceptual Model

The determination of the absolute threshold of hearing is performed on the basis of the following empirical equation [225]:

Tq = 3.64 · f^(-0.8) - 6.5 · exp[-0.6·(f - 3.3)²] + 10^(-3) · f⁴       (6.47)

where: Tq - level of hearing threshold [dB], f - frequency [kHz].

The linear frequency scale may be transformed into the critical band-related Bark scale on the basis of the dependencies found by Zwicker [225]:

b = 13·arctg(0.76·f) + 3.5·arctg[(f/7.5)²]       (6.48)

where: b - frequency [Bark], f - frequency [kHz].
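Eqs. (6.47) and (6.48) are straightforward to evaluate; the sketch below checks them at 1 kHz (the function names are ours).

```python
import math

def threshold_quiet_db(f_khz):
    """Absolute threshold of hearing, Eq. (6.47); f in kHz, result in dB."""
    return (3.64 * f_khz ** -0.8
            - 6.5 * math.exp(-0.6 * (f_khz - 3.3) ** 2)
            + 1e-3 * f_khz ** 4)

def khz_to_bark(f_khz):
    """Critical-band (Bark) scale, Eq. (6.48); f in kHz."""
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)

tq_1k = threshold_quiet_db(1.0)   # a few dB SPL at 1 kHz
bark_1k = khz_to_bark(1.0)        # about 8.5 Bark
```

Both curves are evaluated once per critical band when the masking model is initialized.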

The masking phenomenon caused by the excitation component on the basilar membrane (in the inner ear) may be approximated by two segments inclined at angles S1 and S2, as illustrated in Fig. 6.10.

[Fig. 6.10: triangular approximation of the masking threshold around the excitation E(i), with slopes S1 and S2 and offset O(i) down to the threshold T(i)]

Fig. 6.10. Approximation of the masking threshold. Denotations: E(i) - excitation appearing in the ith critical band; O(i) - the distance between the excitation level and the masking level; T(i) - masking threshold in the ith critical band

Confirmed by many experiments, the triangular approximation is sufficiently accurate, provided the masking threshold T(i) is accurately determined and the slopes of the triangle are properly inclined. The inclination measures S1 and S2 can be determined from the empirical dependence (given by Schroeder):

S1 = 27,    S2 = -24 - 0.23/fc(i) + 0.2·E(i)       (6.49)

where: S1, S2 - inclination measures expressed in [dB/Bark], i - critical band number, i = 0,1,...,24, fc(i) - center frequency of the ith critical band [kHz], E(i) - level of excitation signal [dB].

According to Johnston [84], the distance O(i) between the excitation level and the masking level can be determined on the basis of the following relationship:

O(i) = α·(14.5 + i) + (1 - α)·5.5       (6.50)

where: α (0 ≤ α ≤ 1) - the so-called tonality index, which may be computed on the basis of the N-point Fast Fourier Transform algorithm [84].

In order to hide noise affecting the useful signal, the masking thresholds T(i) should be raised properly. This is achieved numerically by setting appropriate values for them, and can be done when an additional variable β(i) is added to formula (6.50) as follows:

O(i) = α·(14.5 + i) + (1 - α)·5.5 + αn·β(i)       (6.51)

where: αn - the tonality index for a sample of noise taken from a silent passage nearest to the currently processed signal fragment.

Since masking is not a local phenomenon (it influences a certain bandwidth), the level of masking for the frequency bk [Bark] caused by the presence of the component of the frequency bx [Bark] should be determined. After some additional calculations, which are not presented here due to space limitations, the final relationship is derived, allowing one to define the masking threshold as follows:

T(i) = 10^[log10 E'(i) - O(i)/10]       (6.52)

where:

E'(i) = B(i,j) · Sp(i)       (6.53)

and: E'(i) - integrated excitation in the ith critical band

Page 211: [Studies in Fuzziness and Soft Computing] Soft Computing in Acoustics Volume 31 ||

INTELLIGENT PROCESSJNG OF TEST RESUL TS 201

B(i,j) - spreading function for the distance between critical bands Δb = i - j (this function can be given in the form of a look-up table), Sp(i) - spectral power density in the ith critical band
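Eqs. (6.50) and (6.52) combine into a short threshold computation; the sketch below assumes a purely tonal excitation of 60 dB in band i = 0 (illustrative numbers, not taken from the book).

```python
import math

def masking_offset_db(i, alpha):
    """O(i) of Eq. (6.50): tonality-weighted offset below the excitation."""
    return alpha * (14.5 + i) + (1.0 - alpha) * 5.5

def masking_threshold(e_prime, offset_db):
    """T(i) of Eq. (6.52): excitation lowered by O(i) dB, in linear power."""
    return 10.0 ** (math.log10(e_prime) - offset_db / 10.0)

# Purely tonal signal (alpha = 1) in critical band i = 0, with an
# integrated excitation of 60 dB (E' = 1e6 in linear power units).
offset = masking_offset_db(0, 1.0)            # 14.5 dB for a pure tone
threshold = masking_threshold(1e6, offset)
threshold_db = 10.0 * math.log10(threshold)   # 60 - 14.5 dB
```

A noisy signal (α close to 0) yields the much smaller offset of 5.5 dB, reflecting the fact that noise masks tones far better than tones mask noise.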

Another phenomenon which should be taken into consideration is post-masking, i.e. the influence of previously occurring excitations on the current masking effect. Post-masking can be modeled by the dependence:

(6.54)

where: St(i,k), St(i,k-1) - spectral power densities in the ith critical band for the kth and the (k-1)th sample packets, S0(i,k) - spectral power density in the ith critical band after the transformation of the signal from the external to the inner ear, given by the equation:

S0(i,k) = a0(bi) · Σ (Ω = Ωdi..Ωgi) Sp(e^jΩ, k)       (6.55)

where: a0(bi) - coefficients of transformation, Sp(e^jΩ, k) - spectral power density for the kth sample packet

The energy transmission coefficient Tr(i) used in equation (6.54) is given by the relationship:

Tr(i) = e^(-d/τ(i))       (6.56)

where: d - time lag between consecutive sample packets [ms], τ(i) - time constant [ms], dependent on the ith critical band.

Post-masking features can be supported by an algorithm, provided Sp(i) in Eq. (6.53) is replaced by St(i,k), calculated on the basis of Eq. (6.54).

Tuning the Model

As shown in the previous section, some values have to be computed for each signal packet in order to determine the masking effect. These include S1(i), S2(i), O(i), E(i) and T(i), among others. In order to calculate these values, three parameters should be defined for each critical band i (i = 1,2,...,24), namely: β(i), τ(i) and a0(i). This results in a set containing 72 (3·24) parameters to be tuned. Proper selection of these parameter values allows one to control the process of noise removal. Even if the parameter values are not optimized separately in all 24 critical bands, it would be impossible to perform this kind of optimization without the support of a specialized, computer-assisted procedure. Practically, the 24 critical bands are grouped into three regions: low frequency (bands 1 to 8), mid-frequency (bands 9 to 16) and high frequency (bands 17 to 24). The three parameters to be optimized, β(i), τ(i) and a0(i), were set identically in each group of critical bands. Consequently, nine (3·3) parameters were subjected to the optimization procedure, supported by soft computing data processing, which is to be further examined.

Subjective Testing Procedure and Soft Processing of Results

As was already mentioned in Section 3.3.3, the most popular subjective quality testing method is the paired comparison test. The goal of this method is to compare objects ordered into pairs and to assess them on the basis of a two-level (better/worse) attribute scale. Specifically, signal samples are presented in A-B order. The expert task is to choose the better one from a pair of sound samples that differ in acoustic features. As a result of the selection, a certain number is assigned to each compared sound sample. This number reflects the experts' overall preference. Statistical analysis of the test results cannot reveal hidden relationships between the tested parameters, nor can it give rules for tuning a system that is based on such parameters. That is why the soft computing rule-based system (rough set method) was employed for this task. There are many other effective machine learning algorithms; however, in the studied case it is necessary to fulfill some special demands in order to make the acquired knowledge base applicable to the task of tuning the audio processing algorithm. The demands are as follows:
- The knowledge should be presented in the form of a set of readable rules;
- The rules should be associated with a belief measure which allows them to be ranked, because it is not probable that only certain rules will be induced on the basis of the subjective opinions of many experts (a wide margin of uncertainty is expected);
- The system should be able to deal with values expressed by ranges.

The above considerations led to the selection of the rough set decision system as a tool which meets the demands related to processing the data acquired on the basis of subjective opinions. Let the number of assessed audio patterns (related to a single tuned parameter of the algorithm) be set to X, the number of subjects involved in the subjective listening session be equal to Y, and finally, the number of test series be equal to Z. From this, the maximum number of answers in a paired comparison test equals:

N = N1 · Y · Z       (6.57)


where N1 is the number of pairs assessed by an individual expert in one series of a test, calculated according to the following formula:

N1 = (X choose 2) = X! / [(X - 2)! · 2!]       (6.58)

In this way, ten audio patterns tested by five experts in two series will result in 450 votes distributed between the objects. The test was performed nine times because there were nine parameters {β(1), β(2), β(3), τ(1), τ(2), τ(3), a0(1), a0(2), a0(3)} which were tested separately for three frequency regions. The preference diagrams were obtained on the basis of experts' answers (see the example in Fig. 6.11). The individual objects (A to J) were assessed when paired with all others. Because each object was ranked higher a certain number of times by some experts when compared to other objects, the previously mentioned 450 votes are distributed among objects, as may be seen in Fig. 6.11. The preference curve reveals some maxima for the objects processed with certain parameter settings. Thus, the plot indirectly provides information on perceptible differences between parameter values. In this way, the so-called Fuzzy Perceptual Quantization Method (FPQM) can be introduced. Usually, for practical needs, the number of preference ranges could be diminished, so the number-of-votes scale reflecting this preference could, for example, be decimated. Hence, in the shown example (Fig. 6.11), the vertical axis can be re-scaled to reflect preferences in the range from 0 to 5.
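The counts of Eqs. (6.57) and (6.58) for the quoted example (X = 10 patterns, Y = 5 experts, Z = 2 series) can be checked directly:

```python
import math

def pairs_per_series(x):
    """N1 of Eq. (6.58): number of unordered pairs among x audio patterns."""
    return math.comb(x, 2)

def total_answers(x, y, z):
    """N of Eq. (6.57): pairs per series times experts (y) times series (z)."""
    return pairs_per_series(x) * y * z

n1 = pairs_per_series(10)        # 45 pairs for ten audio patterns
votes = total_answers(10, 5, 2)  # five experts, two series
```

Each comparison awards one vote to the preferred member of the pair, which is how the 450 votes of the example are distributed among the objects A-J.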

[Fig. 6.11: preference curves (number of votes, 0-50) over the objects A-J for test No. 1 and test No. 1bis]

Fig. 6.11. Number of votes given to each tested object when assessed in pairs, reflecting the degree of subjective preference of objects by experts (the test procedure was repeated twice). A-J are audio patterns related to 10 values of one parameter (a0(i) is represented in this figure).

Once the parameter values are discretized into five ranges, it is possible to generate 5^9 = 1,953,125 audio patterns related to all combinations of the values of the nine parameters. Obviously, such a number of tests is completely impractical. Thus, the number of combinations was randomly decreased to 4096 (a data compression ratio of about 500). This operation makes the subjective testing realizable, but simultaneously imposes a wide margin of uncertainty on the data processing algorithm. A special computer program was prepared to generate and play patterns automatically without the need to store the resulting audio files. Grades were acquired from the experts using special keyboards interfaced to the computer. Each audio pattern contained 5 seconds of mixed music and speech. This resulted in 4096 · 5 [s] ≈ 6 [h] of music assessed by the experts (plus three hours for pauses between fragments). The sessions were organized over a period of six days in order to prevent the experts from getting tired. After listening to each pattern, each of the experts proposed a grade for overall preference according to a 5-point preference scale. The experts were asked to simultaneously assess the noise reduction effect and the audible distortion level. Since five experts were employed, a database containing 5 · 4096 = 20480 records was created. Because the number of test results is so large, the relationships between parameters remain hidden until they are discovered by an automatic rule induction algorithm. The large portion of missing combinations, together with inconsistencies in the database, leaves a wide margin of uncertainty to be managed by the soft computing algorithm.

Rough Set-Based Analysis of Test Results

In order to discover the tendencies underlying the overall quality choices assigned by the experts to particular combinations of parameter values, a rough set-based analysis was performed. According to Tab. 2.1, the Quality is defined as the decision attribute (D), with all other attributes included in the table being used as condition attributes (Ak). These condition attributes represent the perceptual algorithm parameters which were previously introduced. The aki values in the table are filled in after assigning grades to real parameter values on the basis of the previously executed Fuzzy Perceptual Quantization Method (FPQM). A table obtained in such a way is highly inconsistent, mainly because different combinations of parameters result in the same overall subjective preference.

The number of subjects involved in the parametric test is equal to Y, and therefore the total number of experts' tables equals Y. The number of rows in each table equals n. The number of parameters is p. Consequently, after summing up the results provided by all experts, a database containing Y · n · (p + 1) records is created in tabular form (the number of attributes is represented by (p + 1), including the decision attribute). In the beginning, a (redundant) set of rules is obtained from this database, of the form given below:

(attribute_Ak = value_ak) and (attribute_Ak+1 = value_ak+1) and ... and (attribute_Am = value_am) => (Overall_Quality Di = dj)   (6.59)

where: Ak - perceptual parameter to be optimized; D - overall subjective grade, D = {1,2,3,4,5}; k = 1,2,...,m; i = 1∨2∨3∨4∨5; j = 1∨2∨...∨5

After deleting the duplicated rows (superfluous data elimination) and finding the reducts, reduced sets of rules are obtained which contain the knowledge of the masking algorithm parameter values which are preferred by the experts. The final step is to generate rules from the reducts. However, in the specific case related to this field of application, the whole set of rules should then be used as an initial guide for tuning the perceptual coding system. This is caused by the fact that the reduced rules may not show how to set the values of some parameters. Since some rules contain only some of the parameters to be set, the sum of the rules generated from the reducts should be taken into consideration. The practical way of tuning the system using the set of rules is described further on.
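The rule strength used throughout this analysis (the rough measure, μRS) can be understood as the fraction of decision-table records matching a rule's condition part that also carry the rule's decision. The following sketch illustrates this on a toy decision table; the attribute names and values are invented for illustration and do not come from the experiment described above.

```python
def rough_measure(table, conditions, decision):
    """Rough measure (mu_RS) of a rule: among the records matching the
    condition part, the fraction that also carries the given decision."""
    matching = [row for row in table
                if all(row.get(a) == v for a, v in conditions.items())]
    if not matching:
        return 0.0
    hits = sum(1 for row in matching if row["Quality"] == decision)
    return hits / len(matching)

# Toy decision table: condition attributes A1, A2 plus the decision "Quality".
table = [
    {"A1": 3, "A2": 1, "Quality": 4},
    {"A1": 3, "A2": 1, "Quality": 4},
    {"A1": 3, "A2": 1, "Quality": 2},  # an inconsistent record
    {"A1": 0, "A2": 4, "Quality": 1},
]

mu = rough_measure(table, {"A1": 3, "A2": 1}, 4)
print(round(mu, 3))  # 2 of the 3 matching records support Quality=4
```

A fully consistent rule reaches μRS = 1; inconsistencies in the experts' grades lower the measure, which is why rules are later filtered by a μRS threshold.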

Experimental Procedures

Ten values of each parameter within each range were selected for testing. In this way, a total of 90 parameter values were defined in each of the three frequency ranges. Then, ten speech and music patterns (each of them approximately 5 seconds in duration) were processed using all ten of the previously defined perceptual coding algorithm parameter values. Consequently, after processing, nine sets of ten objects (audio patterns) were obtained. Subsequently, each ten-element set of objects related to a defined parameter was transformed into a set of (10 choose 2) = 45 pairs to be assessed subjectively. Then, the experts provided their opinions and the number of votes which they gave to each object was computed. The subjective grade scale was decimated, so the fuzzy perceptual quantization (FPQM) of the parameter values was obtained. After the listening tests, the decision table was built up according to the previously discussed scheme (see Tab. 2.1). The rough set algorithm elaborated at the Sound Engineering Department of the Technical University of Gdansk was used [48]. This algorithm generated rules of various rough measure values from the table (within the range <0.5, 1.0>). After the calculation of reducts, a new rule set was generated (based on the reducts) which contained the 36 strongest (μRS > 0.8) rules with lengths from 2 to 9. The rules were ordered in such a way that the shorter and the stronger ones were listed first. The strongest rules were then used during the experiments as a guide for tuning the perceptual noise reduction system. The exemplary rules obtained in this way were as follows:

(β(2)=3) and (β(3)=1) => (Overall_Quality=4), μRS=1
(β(3)=0) and (γ(2)=4) => (Overall_Quality=1), μRS=0.816
(β(1)=4) and (β(2)=3) and (β(3)=1) and (γ(2)=3) and (γ(3)=2) => (Overall_Quality=5), μRS=1
(β(1)=1) and (β(2)=1) and (γ(3)=1) and (α0(3)=0) => (Overall_Quality=2), μRS=0.803
(β(1)=4) and (β(2)=3) and (β(3)=1) and (γ(3)=1) and (α0(2)=1) and (α0(3)=3) => (Overall_Quality=5), μRS=0.911
(β(1)=4) and (β(2)=3) and (β(3)=1) and (γ(1)=2) and (α0(1)=2) and (α0(2)=1) and (α0(3)=3) => (Overall_Quality=5), μRS=0.803

No single rule which employs all nine conditional attributes was found to be associated with the decision showing the highest grade of subjective preference (Overall_Quality=5). Consequently, using the induced set of rules, the parameters of the perceptual coding system were set in such a way that the shortest and strongest rules were considered first. After some additional listening, the first rule showing the best grade (=5) was applied to set β(1) to 4, β(2) to 3, β(3) to 1, γ(2) to 3 and γ(3) to 2. The remaining parameter values (γ(1), α0(1), α0(2) and α0(3)) were set on the basis of the two last rules listed above, namely γ(1) to 2, α0(1) to 2, α0(2) to 1 and α0(3) to 3. A final listening test showed that these settings were acceptable to the experts.
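This tuning procedure — consider the shortest and strongest rules predicting the best grade first, then fill in the remaining parameters from further rules — can be sketched as follows. The rule tuples below merely mimic the structure of the induced rules (β, γ, α0 abbreviated to b, g, a); they are illustrative, not the actual rule base.

```python
def merge_settings(rules, target_grade=5, mu_min=0.8):
    """Scan rules ordered shortest-then-strongest; collect settings from
    rules predicting the target grade, keeping the first value seen for
    each parameter (shorter/stronger rules take precedence)."""
    settings = {}
    for conds, grade, mu in sorted(rules, key=lambda r: (len(r[0]), -r[2])):
        if grade == target_grade and mu >= mu_min:
            for param, value in conds.items():
                settings.setdefault(param, value)
    return settings

# Rules as (conditions, predicted grade, mu_RS) -- illustrative only.
rules = [
    ({"b1": 4, "b2": 3, "b3": 1, "g2": 3, "g3": 2}, 5, 1.0),
    ({"b1": 4, "b2": 3, "b3": 1, "g3": 1, "a2": 1, "a3": 3}, 5, 0.911),
    ({"b1": 4, "b2": 3, "b3": 1, "g1": 2, "a1": 2, "a2": 1, "a3": 3}, 5, 0.803),
]
print(merge_settings(rules))
```

With these toy rules the shortest rule fixes b1..b3, g2 and g3, and the two longer rules supply the remaining g1, a1, a2 and a3 — mirroring the procedure described in the text.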

As may be seen from the above analysis, the Fuzzy Perceptual Quantization (FPQM) concept, providing a usable method for replacing values by ranges in psychometric scaling, has been introduced in this chapter. Despite the data reduction obtained by using perceptual quantization, the number of combinations of parameter settings was still too large for practical listening tests. The reduction of the number of such combinations made the testing procedure uncertain and the table of subjective preference was highly incomplete. Under such conditions, the statistical processing of data cannot help to find optimum values for the tested parameters. A rough set algorithm dealing with inconsistency and missing data allowed the rules to be generated which were then applied to tuning the perceptual noise reduction system.


7. CONTROL APPLICATIONS

Computerizing classical pipe organs opens a new domain of interest, in which modern technology meets the traditional way of playing such instruments. The application of a microprocessor system to an organ may significantly improve many of the control and display functions of the console. Computer control of the pipe organ also enables a new approach to the problem of the existing musical articulation limitations in a pipe instrument with an electromagnetic action. This kind of pipe organ control is characterized by promptness in the pipe response, as the air flow cannot be controlled otherwise than by the rapid opening and equally fast closing of the valve. In the opinion of organists, this deprives them of the possibility to interpret music according to their wishes.

The differences between particular organ instruments in terms of control system design directly affect their performance evaluation as expressed by musicians and listeners. This type of research was carried out during the 1950's and 1960's. However, the research methods applied then were scarcely precise. New interest in the analysis of pipe organ sound derives from the fact that many physical features formed during instrument production are not sufficiently recognized to form a basis for the design of contemporary pipe organ instruments. The theoretical studies of organ pipes, both the classical ones of Lord Rayleigh and Brown and later those of Powell [170], Benade [18] and Coltman [42][43], do not present a uniform opinion on the dynamic behavior of wind instruments. That is why a fully adequate pipe simulation model has not as yet been elaborated. Various techniques have been applied in order to extract parameters relevant to the physics of an organ pipe. Some of these date from at least the early 1950's, with papers by Richardson and by Rakowski and Richardson [177] being among the earliest references. Another approach, based on the work of Caddy and Pollard [31], considers only time domain analysis of attack transients for a tested pipe organ. Early work of Fletcher [65] was based on calculations of both steady state oscillations of the organ pipe and overblown regimes in nonlinear interaction with excitation. As pointed out by Keeler [91], the evidence of overblown regimes in initial transients cannot be neglected. During the build-up of the sound, the sound pressure changes significantly. An efficient method of modeling the pipe sound, both analytically and numerically, was originally described by Fletcher [66] and has been used for this purpose by Schumacher [186], Nolle and Finch [158] and re-examined by Kostek and Czyzewski [100].
Schumacher based his study on Fletcher's model in terms of a nonlinear equation of the Hammerstein type and obtained the oscillating waveform to an arbitrary harmonic number. Nolle and Finch described both an experimental study and numerical simulations of flue pipe starting transients, based on the elaborated model. They reported the relation between the speed of pressing the key and the overshoot which appears in the sound attack in organs having mechanical tracker action. They observed the percussive character of an attack in two cases, namely when a burst occurs (a second or third harmonic dominates the starting transient) or when the energy of higher harmonics is bigger than that of the fundamental. Nolle and Finch observed these phenomena in their experimental tests. They also applied the test results to the numerical simulations, and as a result some initial parameters were based on observations made during their experiments [158]. Rakowski and Richardson and also Lottermoser mentioned another effect, which may be referred to as the precursor, preceding the attack itself. However, this happens when the air is just starting to be admitted into the pipe, thus from the musical viewpoint the resulting sound does not last long enough to have a discernible pitch [145][177].

Reviewing the most common organ action types, one can find significant mechanism features which cause differences in the sound produced. Among the organ actions, mechanical, pneumatic, electrical and mixed ones may be identified [202]. The first type provides the sound most preferred both by organists and by listeners. The pneumatic or electropneumatic control of the organ is only found in organs built previously. On the other hand, the electrical action is usually chosen by modern organ builders because of the possibility of separating the control system from the rank of pipes. The time delay caused by the operation of the organ mechanism might be one of the criteria of the quality of an organ action [121][165] and the resulting sound. Up to some value, time delays in the operation of an instrument may be tolerable, though a synchronous response to the performance of the organist is desirable. In the case of a mechanical action, delays caused by this system are mainly those of opening the pallet in order to build up the pressure in a key chamber. The direct electrical control, due to the lack of intermediate air or wood passages, does not affect the time of opening the pallet. Nevertheless, the initial time delay is only one of the factors characteristic of the differences in a pipe sound response. It is possible to determine another factor associated with the articulation features having an influence on the quality of the pipe sound, namely the way in which the attack transient is building up [91][121]. The measurement procedures which allow the extraction of parameters for the above mentioned investigations will be shortly reviewed in the next section.

7.1. Articulation-Related Features in the Pipe Organ Sound

7.1.1. Time Delays

For the purpose of investigating and comparing various tone qualities of the organ sound, a measurement program was established [121]; however, here only the main principles of this program will be described. It should be remembered that the change of velocity of opening the mechanical pallet is the result of the force impressed upon the key. Consequently, the velocity of the key motion has been selected as a relevant parameter associated with the articulation features in organ music. That choice was also confirmed in previous tests [121]. Subsequently, it was noticed that the velocity of the key motion can be replaced by a quantity more convenient to measure, namely by the time of key depression from its upper to its lower position, the latter being the state of full depression.

Resulting from the above assumptions, the data for analyses were acquired through the simultaneous recording of the time of key depression and the resulting sound produced by the pipe. The lay-out of such a measurement method is presented in Fig. 7.1.

Fig. 7.1. Lay-out of the signal recording system (pipe organ instrument, microphone, R-DAT recorder)

Sounds were recorded in the St. Nicholas Basilica in Gdansk, which has a mechanically controlled organ, and in the Oliva Cathedral, where the organ is controlled with both electropneumatic and direct electrical pallet control. A microphone was placed close to the selected pipe, in the direct field, in order to limit the influence of church acoustics. The position of the key motion was registered through a pair of piezoelectric film sensors installed at the depressed key (Fig. 7.2a,b) which corresponded to the pipe being recorded [121]. The special construction of the sensors allows for their activation only when the plate is moving in one direction, which in this case is from the upper position to the lower one. Therefore, the sensor placed above the key (Fig. 7.2a) generates an impulse in the initial phase of pressing the key, whereas the sensor placed under the key (Fig. 7.2b) does so in the final phase. The time between the activated impulses can be taken as approximately the time of pressing the key. The sensors are constructed with the use of piezoelectric foils adhered with thin band layers [164]. Electrodes are applied to the polarized film. The sensors have a built-in electronic commutator which enables impulse generation at the moment of pressing the piezoelectric film. The structure of electrical connections in this commutator is shown in Fig. 7.2c. The assigned letters, namely A, B, C, D, E in the schema, correspond to points where electric current was measured. Exemplary current characteristics are shown in Fig. 7.2d. During the recording, impulses from the sensors were recorded on one of the R-DAT recorder channels, which later allowed for identifying the time of pressing the key.

Fig. 7.2. Registering the time of pressing the key with the use of piezoelectric sensors: the key in the resting position (a), the key pressed (b), block diagram of the electronic commutator (c), exemplary current characteristics measured in the assigned measurement points, see Fig. 7.2c (d)

Organs were tested in two different ways, with fast and slow depression of the key according to the demands of musical articulation. The recorded material was subjected to a detailed analysis in order to study the relation between the velocity of the key motion and the resulting articulation features. This made it possible to find the values of the time interval corresponding to the duration of the key motion. This kind of analysis is shown in Fig. 7.3. Values of Δt in Fig. 7.3 correspond to key depression times.


Fig. 7.3. Recording of an organ pipe sound along with the impulses from sensors (Δt corresponds to the time of depressing the key), fast (a) and slow (b) key depression

Additionally, the results of the analyses of key motion are shown in Table 7.1. The values of time intervals for different key velocities in Tab. 7.1 are divided into three ranges in the case of the mechanical action. Consequently, as is seen from Tab. 7.1, the initial delay corresponds very closely to the values quoted in the literature [91][121] and the operation of the mechanism of the wood tracker connections may be neglected, both for a slow and a fast depressing of the key. On the other hand, there is practically no correlation between the corresponding values of the way a key is depressed when performing on an organ using either an electropneumatic or an electrical action. In the case of the electropneumatic system, the air passages and the operation of pneumatic motors cause such delays that the time of key movement may be neglected.

Tab. 7.1. Initial time delay caused by key motion and operation of the control system

Type of pipe organ action                    | Initial time delay [ms]
mechanical: slow key depression              | 50-90
mechanical: moderate key depression          | 30-50
mechanical: fast key depression              | 15-30
electropneumatic: slow & fast key depression | 100-150
electrical: slow & fast key depression       | 10-15

The increased initial delay value in comparison to the other systems is in good agreement with the results obtained by Pollard [165]. In the electrical system, depressing a key evokes an impulse to open a pallet without any delay. It has been noted that the time required to depress a key from its upper position to the lower one, in the case of both a fast and a slow touch, is between 10 and 15 ms. This type of organ control is characterized by the outstanding promptness in the pipe response.

7.1.2. Attack Transient of Pipe Sound

The articulation features of the pipe sound are mostly related to the attack transient build-up. Therefore, they can be described by the initial delay, the transient duration and by the growth of subsequent harmonics. Fig. 7.4 illustrates the signal envelopes of the note a (110 Hz) of the Principal 8' in the cases of quickly and slowly depressing the key. A comparison of the two characteristics clearly shows the differences occurring during the stage of growth of the sound. Thus, it is expected that the differences in musical articulation are determined by the attack transient. Subsequent listening to the extracted transients shows that the pitch of the initial transient is an octave above the pitch of the fundamental.

The spectral analysis of transients is plotted in Fig. 7.5. The attack transient is dominated by the first two harmonics. A delay with respect to the way the key is depressed is visible in the characteristics. As is seen from Fig. 7.5a, there is an initial rise in the second harmonic in the case of fast key depression, and the delay of the signal transient differs in character, being longer for the fundamental. It was noted that during the period of the overblown regime, the produced sound is an octave higher than the normal pitch. In the case of a slow attack (Fig. 7.5b), the rise of the transient is also slow and the fundamental builds up very smoothly, rising more quickly and at the same time more strongly than the other components.

To verify the above stated observations, sound simulations were performed based on the physical model of pipe organs. One of the methods belonging to the category of physical modeling is "digital waveguide" synthesis [45][52][193][206][207]. In this method, the wave equation is first solved in a general way to obtain the traveling waves in the instrument body, which are then simulated in the waveguide model. Importantly, the initial stage of sound rise, critical to the subjective assessment of the naturalness of the organ sound produced by pipes, was modeled with results showing that sounds produced in models in which the air flow conditions were differentiated better resemble the natural sound [121].
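The traveling-wave idea behind digital waveguide synthesis can be illustrated by a toy sketch: a circulating delay line whose length sets the pitch, with a lossy low-pass reflection (essentially the Karplus-Strong variant of the technique). This is not the flue-pipe model discussed in the text, and all constants below are arbitrary illustrative choices.

```python
import random

def waveguide_tone(freq, sr=44100, dur=0.5, loss=0.995):
    """Toy digital waveguide: samples circulate in a delay line of length
    sr/freq; each recirculation applies a two-point average (a simple
    low-pass reflection filter) scaled by a loss factor."""
    n = int(sr / freq)                                   # delay-line length
    line = [random.uniform(-1.0, 1.0) for _ in range(n)]  # noise-burst excitation
    out = []
    for _ in range(int(sr * dur)):
        s = line.pop(0)
        line.append(loss * 0.5 * (s + line[0]))          # lossy low-pass reflection
        out.append(s)
    return out

samples = waveguide_tone(110.0)  # note a, as in Fig. 7.4
```

The averaging filter damps high harmonics faster than the fundamental, so the tone decays toward a nearly sinusoidal steady state — a crude analogue of the transient-to-steady-state evolution discussed above.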

Another approach to physical modeling is focused on the mathematical model of the behavior of the pipe when treated as a system. A detailed analytic description and modeling of transients in the speech of organ flue pipes has been treated by Fletcher [66][67] and reviewed intensively in the literature.

Several models of the organ flue pipe have been developed during recent years (e.g. [42][43][52][62][152][209][210]). However, the existing general concept of the pipe system assumes at least two sub-systems and the interaction between them. The first is a nearly linear resonant system of the pipe and the second is a nonlinear system of blowing wind, with the assumption of its interaction with the pipe. Also, a frequency-dependent delay element, representing the time for jet deflection waves to traverse the height of the pipe mouth, is to be included in this model.

Fig. 7.4. Signal envelopes, note a (110 Hz), 8' Principal of the organ of St. Nicholas Basilica in the case of fast (a) and slow (b) key depression

Fig. 7.5. Spectral analysis of the first six harmonics of the note a (110 Hz), 8' Principal of the organ of St. Nicholas Basilica in the case of fast (a) and slow (b) key depression

By relating the air pressure flow to different ways of opening the pallet, it is possible to express the articulation aspect according to the following equation:

P(t) = P0 + (P1 - P0) exp(-t/τ)   (7.1)

where P1 specifies the pressure peak, P0 the steady pressure and τ the decay time from the peak level. It is possible to discuss at least two cases of the relationship between P1 and P0. When P1 >> P0, the pressure peak occurs and the attack may be referred to as fast. When P1 << P0, the transient is slow. A more detailed description of the features related to sound rise in flue pipes can be found in the literature [121].
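Eq. (7.1) can be evaluated directly; the sketch below contrasts a fast attack (P1 >> P0) with a slow one (P1 << P0). The numerical values of P0, P1 and τ are arbitrary illustrative choices, not measured data.

```python
import math

def blowing_pressure(t, p0, p1, tau):
    """Eq. (7.1): blowing pressure relaxing from the initial value P1
    toward the steady value P0 with time constant tau."""
    return p0 + (p1 - p0) * math.exp(-t / tau)

# Fast attack: pronounced pressure peak decaying toward P0 (P1 >> P0).
fast = [blowing_pressure(t * 1e-3, p0=500.0, p1=2000.0, tau=10e-3) for t in range(50)]
# Slow attack: pressure rises smoothly toward P0 from below (P1 << P0).
slow = [blowing_pressure(t * 1e-3, p0=500.0, p1=50.0, tau=10e-3) for t in range(50)]
```

Both curves converge to the same steady pressure P0; only the approach differs, which is what distinguishes the two articulation cases.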

The numerical model based on the above stated principles was implemented on a UNIX workstation using the MATHEMATICA system and was verified by multiple listening examinations of the model output sounds obtained from real-time simulation of the pipe speech response [121].

7.2. Fuzzy Control of Pipe Organ

7.2.1. General Characteristics of Pipe Valve

In order to control the articulation features in a pipe organ, it is necessary to relate the way of opening the pipe valves to the way of depressing keys. This problem may be solved by introducing a stepping valve which reacts to the velocity of movement of the key. A suggested model of that kind of valve is presented in Fig. 7.6.

Fig. 7.6. Digitally controlled electromagnetic pipe valve: 1 - pipe foot air duct cross-section, 2 - attenuation diaphragm, 3 - counteracting spring, 4 - electromagnet, 5 - moving magnet coil, 6 - coil, 7 - electromagnet driver, 8 - inputs of binary control commands

An equivalent circuit of such an electromechanical system is presented in Fig. 7.7. It may be mathematically described by the Lagrange function of state variables. Assuming both friction and damping forces caused by the operation of the given system, the system dynamics is described by the equation of state [173]:

d/dt (∂L/∂q̇r) - ∂L/∂qr + ∂F/∂q̇r = Qr   (7.2)

where:

q - state vector of the system,
Qr - generalized external forces,
F - the so-called Rayleigh function defined as:

∂F/∂q̇r = fD   (7.3)

where fD denotes the damping forces.

In the case of the circuit presented in Fig. 7.7, the Lagrange function equals the difference between the kinetic and potential energy, calculated independently for the mechanical and electrical subsystems [173]. The above assumptions and the condition that the potential energy U0 equals 0 for the electrical subsystem result in a set of differential equations describing the dynamic performance of the system.


The first equation (7.4) describes the motion of the moving core, and the second (7.5) is related to the current induced in the windings:

M d²l(t)/dt² + B dl(t)/dt + K[l(t) - l0] + A i²(t) / (2[d0 + l(t)]²) = 0   (7.4)

[A/(d0 + l(t))] di(t)/dt + R i(t) - [A i(t)/(d0 + l(t))²] dl(t)/dt = u(t)   (7.5)

where:

M - mass of the moving core,
B - mechanical resistance,
K - elasticity coefficient of the spring,
A - coefficient related to the coil wire cross-section dimensions, the number of windings and the magnetic constant,
R - coil resistance,
L(t) - inductance of the circuit:

L(t) = A / (d0 + l(t))   (7.6)

l(t) - distance between the moving and permanent magnet,
l0 - distance between the moving and permanent magnet for the neutral spring position,
d0 - thickness of the antimagnetic separator,
i(t) = dq(t)/dt - current induced in the windings,
u(t) - voltage in the windings.
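The coupled equations (7.4) and (7.5) can be explored numerically; a minimal forward-Euler sketch is given below, showing how the coil current and the core position evolve under a constant driving voltage. All parameter values (M, B, K, A, R, l0, d0, u) are invented for illustration; they are not the constants of the valve described in the text.

```python
def simulate_valve(u, dt=1e-5, steps=2000,
                   M=0.01, B=2.0, K=1000.0, A=1e-4, R=10.0,
                   l0=2e-3, d0=5e-3):
    """Forward-Euler integration of Eqs. (7.4)-(7.5): core position l,
    core velocity v and coil current i under a constant voltage u.
    All parameters are hypothetical toy values."""
    l, v, i = l0, 0.0, 0.0
    for _ in range(steps):
        gap = d0 + l
        f_mag = A * i * i / (2.0 * gap * gap)                 # magnetic pull, Eq. (7.4)
        dv = -(B * v + K * (l - l0) + f_mag) / M              # core acceleration
        di = (u - R * i + A * i * v / (gap * gap)) * gap / A  # from Eq. (7.5)
        l, v, i = l + dt * v, v + dt * dv, i + dt * di
    return l, i

l_end, i_end = simulate_valve(u=5.0)  # position and current after 20 ms
```

With these toy constants the current settles near u/R while the magnetic pull compresses the spring, drawing the core in by a fraction of a millimeter — the partially-opened valve position discussed below.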

On the basis of the system state equations, it is possible to examine the system performance in the time domain using Laplace transforms or the complex plane. Taking into account the static behavior of the device under consideration, the characteristic U = f(L), relating a given value of the voltage U in the steady state to a certain position of the movable core (L), becomes:

U = ±R (d0 + L) √[(2K/A)(l0 - L)]   (7.7)


Assuming l0 < d0/2, for 0 < L < l0 the characteristic U = f(L) may have the form shown in Fig. 7.8a. On the other hand, when l0 > d0/2, the expression U = f(L) has a maximum, and therefore the characteristic is indeterminate (Fig. 7.8b).

Fig. 7.7. Equivalent model of the system shown in Fig. 7.6: R - coil resistance, u(t), i(t) - voltage and current in the windings, d0 - thickness of the antimagnetic separator, l(t) - distance between the moving and permanent magnet, K - elasticity coefficient of the spring

Fig. 7.8. Shapes of static characteristics of the system presented in Fig. 7.7: a. when l0 < d0/2 for 0 < L < l0, b. when l0 > d0/2

The mathematical description of the physical artefacts occurring in the presented model results in a fairly complicated form. Although it is possible to devise the control structure of such a model, it is not enough to operate the system based on the adopted formula. However, taking into consideration a simplified description of the system, it is possible to derive some practical rules governing the relationship between the speed of diaphragm motion and the supplying current. These principles may be intuitively explained on the basis of Fig. 7.6. Analysis of Fig. 7.6 shows that the electromagnet opening the attenuation diaphragm must overcome the resistance of the counteracting spring. This may occur provided the electromagnet coil is fed with sufficient electrical power. If the power is limited, the valve remains in a partially-opened position. Consequently, the value of the current in the coil circuit is decisive for the position of the air flow diaphragm. Thus, the dynamics governing the regulation of the coil current may allow for the control of valve diaphragm motion according to the way the key is depressed.

Obviously, the digital control of an organ imposes the discretization of the key velocity parameter and, consequently, of the coil current value. The system should generate signals that, for a given control structure, will set the system into the desired state within a minimum time and with minimum energy consumption. That problem is directly related to the organization of data transmission from the console to the organ wind chests and may be easily solved [117][121]. On the other hand, the influence of the number of discretization levels on the cost of the electromagnetic valve cannot be neglected. Since several hundreds of pipe electromagnets are used in typical organs, the application of digital-to-analog converters in the coil drivers cannot be considered, because this kind of electrical drive would be impractical and costly. However, as will be shown in the next paragraph, by using fuzzy control technology this problem can be solved differently and in a less expensive way.

7.2.2. System Description

The whole process, consisting of depressing the key, the reaction of the valve and the resulting build-up of the sound, is difficult to describe mathematically. Such a description might form a basis for building a microprocessor control system of the organ, as was described in the previous paragraph. However, taking into account that these processes are imprecise in nature, a typical microprocessor system for an organ may be replaced by a learning control system capable of modeling nonlinearities. Such modeling could be learned by the system from exemplary entries and related decisions. Consequently, fuzzy logic techniques may be employed in such a control system.

For the purpose of this research, a model of a pipe organ was designed and constructed [121]. It consists of two elements: a model of an organ tracker action and a control system based on fuzzy logic techniques (Fig. 7.9). The model of the organ was made from oak, and consists of: bellows with a volume of 0.06 m³, covered with leather (the bellows are filled with air through a foot pedal); a wind chest sized 0.4 m x 0.3 m x 0.2 m; two organ pipes (Principal 8' - tin pipe, and Bourdon 8' - wooden pipe); and a tracker action which enables both mechanical control and electrical activation. Three electromagnets used in this control system are combined electrically to one key. The valve is driven by electromagnets with a counteracting spring. Electric activation is obtained through the use of a set of electromagnets controlled by a system constructed on the basis of fuzzy logic. Activating the electromagnets causes the air inflow to a selected pipe. A block diagram of the system which controls the electromagnets of the organ pipe valves is shown in Fig. 7.10. Additionally, the system configuration is shown in Fig. 7.11. The following components are included: a dynamic keyboard, sensitive to the velocity of key motion, connected through a MIDI interface to the computer; a PC computer with software operating the FUZZY microprocessor card [2][3]; the FUZZY microprocessor card [2][3] and the MIDI interface card installed in the PC computer; a specially constructed control display of key number and key velocity; a buffer of information exchange between the MIDI and fuzzy cards; and a buffer to control the electromagnets via the transistor drivers (Fig. 7.11). The applied Yamaha PSR-1500 MIDI keyboard is of a touch-sensitive type; therefore, according to the velocity with which the key was pressed, a MIDI code is generated. A sensor under the keyboard picks up the signal correlated to the way of depressing the key and at the same time transforms it into the system input signal.

Fig. 7.9. Fuzzy logic-based control system for a pipe organ


Fig. 7.10. Block diagram of the control system

Fig. 7.11. Layout of the fuzzy logic-based control system configuration (components: pipe organ model, dynamic keyboard, electromagnets, computer with MIDI and FUZZY cards)

The information on pressing or releasing the key is transmitted from the keyboard through the MIDI interface in the form of 2 or 3 bytes of data:
- the first byte is a command meaning that data will be transmitted,
- the second byte carries information on the key number, within the range from 0 to 127,
- the third byte carries information on the velocity of pressing the key, in the range from 1 to 127.

The information related to the key number is essential because of the relation between the size of the pipe and the articulation artefacts. In traditional, mechanical organs, articulation features appear mostly in low tones. The sound rise in large pipes may be fast or slow, so it is possible to hear the differences in


the articulated sounds. Small pipes, because of their size, are excited by the wind blow very quickly and speak nearly always in the same way.
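For illustration, the byte layout described above can be decoded with a short Python sketch. The function name decode_note_on is hypothetical; a Note-On status byte in the range 0x90-0x9F is assumed, as in the MIDI standard:

```python
def decode_note_on(msg: bytes):
    """Decode a 2- or 3-byte MIDI message as described above.

    Returns (key_number, velocity) for a Note-On event, or None
    for other messages. Running status is assumed not to be used.
    """
    if not msg:
        return None
    status = msg[0]
    # 0x90-0x9F: Note-On command (high nibble) on channels 1-16 (low nibble)
    if status & 0xF0 == 0x90 and len(msg) >= 3:
        key_number = msg[1] & 0x7F   # second byte: key number, 0..127
        velocity = msg[2] & 0x7F     # third byte: velocity, 0..127
        return key_number, velocity
    return None

print(decode_note_on(bytes([0x90, 60, 100])))  # (60, 100)
```

A real decoder would also handle Note-Off (0x80-0x8F) and running status; this sketch only extracts the two parameters the fuzzy controller needs.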

The above information is decoded by the computer through a MIDI decoding procedure. The obtained values are periodically transmitted to the fuzzy logic control system at the speed of 31.25 kBaud. The total transmission time t (Eq. 7.8) consists of at least three delays, namely:

- t1 - connected with the data transmission from the keyboard to the MIDI card:

t1 = 20 bit / 31250 bit/s = 640 μs

- t2 - corresponding to the data processing in the MIDI card:

t2 ≈ 30 μs

- t3 - needed for the data processing in the FUZZY microprocessor card:

t3 ≈ 8 μs

Hence:

t ≈ t1 + t2 + t3 ≈ 640 μs + 30 μs + 8 μs ≈ 678 μs    (7.8)
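The delay budget of Eq. 7.8 can be checked with a few lines of arithmetic (a sketch using only the values quoted above):

```python
# MIDI runs at 31.25 kBaud; the message considered here occupies 20 bits
BAUD = 31_250            # serial rate, bit/s
BITS = 20                # bits transmitted from keyboard to MIDI card

t1 = BITS / BAUD * 1e6   # serial transmission time, in microseconds (μs)
t2 = 30.0                # MIDI-card processing delay, μs
t3 = 8.0                 # FUZZY-card processing delay, μs

t_total = t1 + t2 + t3
print(f"t1 = {t1:.0f} us, total = {t_total:.0f} us")  # t1 = 640 us, total = 678 us
```

At well under a millisecond, the total latency is far below the threshold a performer would perceive as a delayed key response.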

As is shown in Fig. 7.11, three parallel-connected electromagnets are applied to drive the pallet opening the air inflow. The electromagnets are switched on and driven by a current whose value is defined by the fuzzy rule system. Thus, any key motion rate is translated into the manner of opening the valve and, in consequence, into the build-up of air pressure in the pipe, which is decisive for the character of the rising sound.

Two parameters that are extracted periodically from the MIDI code, namely the key number and the velocity, create two fuzzy inputs, labeled as [121]:

INPUTS: KEY_NUMBER; VELOCITY

and the output is associated with the current applied to the electromagnet coils and is denoted CURRENT. The corresponding membership functions are labeled as follows:

OUTPUT: LOW_CURRENT; MEDIUM_CURRENT; HIGH_CURRENT.

The fuzzifiers were named as follows:

FUZZIFIERS: for KEY_NUMBER and VELOCITY:
- LOW
- MEDIUM
- HIGH

The output of the system is initially set to the value 0. The MIDI code assigns the keys numbers from a range starting from 0 (when no key is pressed) to 127. The mapping of the keyboard was reflected as KEY_NUMBER, and is presented in Table 7.2. The velocity values are represented as in Table 7.3.


Tab. 7.2. Mapping of the keyboard

KEY_NUMBER   CENTER   WIDTH
LOW          30       29
MEDIUM       70       25
HIGH         100      27

Tab. 7.3. Velocity mapping

VELOCITY   CENTER   WIDTH
LOW        30       29
MEDIUM     70       15
HIGH       101      26

The above listed values (Tab. 7.2 and 7.3) were set experimentally. The performed experiments allow one to plot the membership functions corresponding to the inputs KEY_NUMBER and VELOCITY and to CURRENT, denoted as OUTPUT (Fig. 7.12). As can be seen from Fig. 7.12, triangular membership functions are employed in the fuzzy controller.
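For illustration, a triangular membership function parametrized by the CENTER and WIDTH values of Tab. 7.2 and 7.3 can be sketched as follows. The function names triangle and fuzzify are illustrative, and the linear shape falling to zero at center ± width is an assumption read off Fig. 7.12:

```python
def triangle(x: float, center: float, width: float) -> float:
    """Triangular membership: 1 at the center, falling linearly
    to 0 at center - width and center + width."""
    return max(0.0, 1.0 - abs(x - center) / width)

# Fuzzifier parameters taken from Tab. 7.2 (KEY_NUMBER) and Tab. 7.3 (VELOCITY)
KEY_NUMBER = {"LOW": (30, 29), "MEDIUM": (70, 25), "HIGH": (100, 27)}
VELOCITY   = {"LOW": (30, 29), "MEDIUM": (70, 15), "HIGH": (101, 26)}

def fuzzify(x: float, sets: dict) -> dict:
    """Return the degree of membership of x in every labeled set."""
    return {label: triangle(x, c, w) for label, (c, w) in sets.items()}

# e.g. key number 60 belongs partly to MEDIUM, not at all to LOW or HIGH
print(fuzzify(60, KEY_NUMBER))
```

Note that with these centers and widths the supports of adjacent sets overlap, so a single key number or velocity can activate two terms at once, which is what gives the controller its smooth transitions.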

The inputs and fuzzifiers produce terms that are used in the following rules:

RULES:

if KEY_NUMBER is OFF then 0
if VELOCITY is OFF then 0
if KEY_NUMBER is LOW and VELOCITY is LOW then LOW_CURRENT
if KEY_NUMBER is MEDIUM and VELOCITY is LOW then LOW_CURRENT
if KEY_NUMBER is HIGH and VELOCITY is LOW then MEDIUM_CURRENT
if KEY_NUMBER is LOW and VELOCITY is MEDIUM then MEDIUM_CURRENT
if KEY_NUMBER is MEDIUM and VELOCITY is MEDIUM then MEDIUM_CURRENT
if KEY_NUMBER is HIGH and VELOCITY is MEDIUM then HIGH_CURRENT
if KEY_NUMBER is LOW and VELOCITY is HIGH then HIGH_CURRENT
if KEY_NUMBER is MEDIUM and VELOCITY is HIGH then HIGH_CURRENT
if KEY_NUMBER is HIGH and VELOCITY is HIGH then HIGH_CURRENT


Fig. 7.12. Membership functions corresponding to the VELOCITY (a), KEY_NUMBER (b) inputs and CURRENT denoted as output (c), where: μ - degree of membership. (Centers recoverable from the plot axes: KEY_NUMBER: 30, 70, 100; VELOCITY: 30, 70, 101; CURRENT: 60, 80, 100.)

Each rule produces a number which is calculated according to fuzzy logic principles from the intersection of the input values with the membership functions (see Fig. 7.12). The winning rule is the one that has the highest value assigned during the calculations. On the basis of the adopted terms, the numerical values are converted to the respective current which drives the electromagnets. This means that the lowest output value causes the slowest opening of the valve, while other values appearing on the output, which match other terms, result in a faster opening of the valve.
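For illustration, the rule evaluation described here can be sketched in a few lines. The helper winning_rule is hypothetical; the AND operation is assumed to be the standard fuzzy minimum, and each input is assumed to be already fuzzified into a dictionary of degrees per term:

```python
# Rule table from this section: (KEY_NUMBER term, VELOCITY term) -> output term
RULES = {
    ("LOW", "LOW"): "LOW_CURRENT",
    ("MEDIUM", "LOW"): "LOW_CURRENT",
    ("HIGH", "LOW"): "MEDIUM_CURRENT",
    ("LOW", "MEDIUM"): "MEDIUM_CURRENT",
    ("MEDIUM", "MEDIUM"): "MEDIUM_CURRENT",
    ("HIGH", "MEDIUM"): "HIGH_CURRENT",
    ("LOW", "HIGH"): "HIGH_CURRENT",
    ("MEDIUM", "HIGH"): "HIGH_CURRENT",
    ("HIGH", "HIGH"): "HIGH_CURRENT",
}

def winning_rule(key_degrees: dict, vel_degrees: dict) -> str:
    """Fire every rule, taking min() for AND, and return the output
    term of the rule with the highest firing strength."""
    best_term, best_strength = None, -1.0
    for (k_term, v_term), out_term in RULES.items():
        strength = min(key_degrees[k_term], vel_degrees[v_term])
        if strength > best_strength:
            best_term, best_strength = out_term, strength
    return best_term

# e.g. a mid-range key pressed fast -> HIGH_CURRENT (fast valve opening)
key = {"LOW": 0.0, "MEDIUM": 0.6, "HIGH": 0.0}
vel = {"LOW": 0.0, "MEDIUM": 0.1, "HIGH": 0.9}
print(winning_rule(key, vel))  # HIGH_CURRENT
```

Taking only the single strongest rule, as the text describes, amounts to a maximum-selection defuzzification; a Mamdani controller would instead blend all fired rules, but the winner-take-all form matches the three discrete current levels used here.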

Recordings of the signals generated by the model were made with the system whose block diagram is presented in Fig. 7.13. A pair of electrically activated sensors was attached to the key. The input of the system was controlled through a touch-sensitive keyboard. Impulses from the sensors responsible for the time of depressing the key in the model were registered. The value of the velocity of depressing the key was read from the MIDI interface display. The output signal from the control system was recorded on the left channel of the tape recorder, while the sound of the pipe was registered on the right channel.


Fig. 7.13. Block diagram of the recording system of the pipe organ model

Examples of analyses of the time- and frequency-domain characteristics of the recorded sounds are presented in Figs. 7.14 and 7.15.


Fig. 7.14. Analyses of time-domain characteristics of sounds of Principal 8' in the case of: fast opening of the valve (a), slow opening of the valve (b)


The plots show the differences that are visible in the time representation of the analyzed sounds, as well as in the waterfall-plot representation, respectively for fast (Fig. 7.14a and 7.15a) and slow (Fig. 7.14b and 7.15b) opening of the valve.


Fig. 7.15. Analyses of frequency-domain characteristics of sounds of Principal 8' in the case of: fast opening of the valve (a), slow opening of the valve (b)


Both spectral characteristics differ mainly in the behavior of the second harmonic, which grows very quickly in the case of pressing the key quickly, and slowly in the other case. There are also other discrepancies in the presented sounds. It is easy to observe that the fundamental is much weaker when the key is depressed quickly. The arrows "A" in Fig. 7.15 show the starting point of the rising of the fundamentals, whereas the arrows "B" show the rising of the second harmonics. Additionally, in Fig. 7.16 adequate sonogram analyses are illustrated. The difference in the starting attacks for fast and slow opening of the valve is clearly visible.

Fig. 7.16. Sonograms of sounds recorded from the model: fast (a) and slow (b) opening of the valve. The horizontal axis represents the time domain (0-300 ms), the frequency domain is depicted on the vertical axis (0-1000 Hz), and the shade of gray translates the magnitude of particular harmonics (white corresponds to -60 dB)

These results show a clear similarity to the previously obtained analyses. Therefore, it may be said that the constructed fuzzy logic control system for a pipe organ action responds properly to differentiated musical articulation, providing nuances to the musical performance.


8. CONCLUSIONS

The experiments conducted within the framework of this research work encompassed the implementation of selected computational intelligence methods for the purposes of acquiring and recognizing musical signals and phrases, and for the application of these methods to the verification of subjective acoustical assessments. The problems posed were solved through the use of neural networks, fuzzy logic and rough set-based learning algorithms.

The research results obtained during the course of the work confirm the viability of using algorithms from the computational intelligence area for solving problems in the areas of musical and architectural acoustics. These problems, due to their complexity as well as to the unrepeatable nature of acoustical phenomena, escape analyses that are based on deterministic models. The analyzed problems included the recognition of musical instrument sounds on the basis of their acoustical representation, the recognition of musical phrases on the basis of MIDI notation, and non-statistical processing of subjective assessment results related to the assessment of acoustical quality (of concert halls, low bit-rate compression algorithms, artificial reverberation algorithms, and others). Sound and musical phrase classification using the previously mentioned methods produced a high percentage of recognition. These results were obtained after optimization of the set of parameters being recognized and optimization of the methods of parameter value discretization. High recognition scores were additionally a result of the optimization of the structures and parameter settings of the decision-making systems.

Among the systems developed for acoustical analysis, the rough-fuzzy expert system was found to be the most complex. This system was used to automatically generate acoustical quality assessments on the basis of incoming measurement samples. For this purpose, it was necessary to combine two methods in the engineered system: the rough set method was used to search for rules in the learning phase on the basis of available expert assessment examples, and fuzzy inference was used for estimating particular ranges of values for the membership functions of the parameters used as premises for decision-making rules. As is shown by the results, the expert system based on these two methods can be used to solve the problem of subjective assessment objectification. This problem is still considered the central problem in musical and architectural acoustics.

Since the experiments were in most cases newly introduced by the author, it is difficult to evaluate with full objectivity the extent to which these systems are already working optimally and making use of the capabilities of the applied


computational methods. One can, however, refer to the results of earlier, similar types of methods, which were not based on the use of expert systems. As a result of this comparison, one can say that methods based on statistical analysis which have been used until now can be successfully supplemented with soft computing methods such as fuzzy inference or learning algorithms (in the latter case, through the use of hidden knowledge in connectionist algorithms, or by using an open-form knowledge base such as a set of rules derived from a rough set-based algorithm). This new methodology of analysis in musical and architectural acoustics, though not quite universal and still incomplete, already constitutes an alternative tool within the realm of acoustical analysis. A particular justification for the application of expert systems in this area is provided by the fact that the subjective opinions of experts - here transformed into the knowledge base of an intelligent system - are the final criterion for recognizing sounds and musical phrases and for assessing the acoustical quality of music.

Another problem that was dealt with within the framework of this research work was the application of fuzzy methods to the control of a classical pipe organ. On the basis of the performed experiments, it was possible to propose technical solutions for a new type of organ action. An approach such as fuzzy logic shows considerable promise for the control of nonlinear dynamic systems and may result in practical solutions in the domain of musical instrument control.

The choice of problems presented in this study is intended to emphasize that in some cases even the classical problems of musical and architectural acoustics can be addressed and solved by means of new methods, especially those arising from the soft computing domain. Before soft computing methods were introduced, all applications dealing with uncertainty were based on the probabilistic approach. The consequence of this state of affairs is the present choice of the methods available to analyze data in acoustics. Meanwhile, in the case of some of the studied applications, such as automatic recognition of musical phrases, it is impossible to rely on such an approach only, because each musical phrase has its unique character that cannot be sufficiently described by any statistics. Similarly, the statistical processing of subjective testing results is not fully reliable in most practical applications, in which relatively small data sets are available. Moreover, the hitherto used statistical analyses do not allow one to directly formulate rules showing the relations between assessed parameters. Such rules are needed to analyze the acoustical phenomena underlying the preference of subjective quality of sound. In the above mentioned applications, rule-based decision systems are needed to ensure a more accurate data analysis and a better understanding of the analyzed phenomena on the basis of data analysis results. Rough set-based systems are generally valued because they can generate rules from data sets and, what is of paramount importance, because they enable handling data with internal inconsistencies.
These features of this method proved to be of high importance to the described applications, because subjective assessment results of musical patterns provided by experts are usually highly inconsistent. Moreover, the traditional statistical analysis of subjective test results cannot reveal hidden relations between tested parameters, nor provide the rules instructing one how to


tune a system based on such parameters. Consequently, the rough set method, which is one of the most advanced and well-developed data analysis techniques available today, offering effective tools to extract knowledge from data, was used extensively. In some applications fuzzy logic also proved applicable for dealing with such problems as subjective quantization of parameter ranges and calculating global subjective preference on the basis of such operators as fuzzy union and fuzzy intersection. Fuzzy logic also helped to solve the earlier mentioned problem in the domain of musical instrument control, which had not been solved up to now on the basis of crisp logic.

One of the studied problems did not demand a knowledge base in the form of explicit rules. This was the automatic recognition of musical sounds. Because humans recognize sounds on the basis of non-linear parameters whose numerical values are not easy to interpret, the feature of the connectionist approach that was exploited is the effective search for the closest reference vector in the multidimensional space of parameters. However, the rough set rule-based system also found its application in this task, which was hitherto solved less effectively by others using probabilistic estimators of parameter distributions. The rough set-based algorithm recognition scores are similar to the results obtained with the feedforward neural network algorithm. The speed of processing of new examples during the recognition phase is similar in both cases, and the decision comes after a short delay needed to process the feature vectors derived from a musical pattern representation by the computer. However, the time needed for training is many times shorter in the case of the rough set rule-based system. The results concerning the time consumption of the training process are not surprising, because the neural network is trained with consecutive examples, while the rough set algorithm simultaneously processes the whole collection of examples. Moreover, the back-propagation algorithm is iterative, while the rough set algorithm scans the database to derive rules based on combinations of reduced attributes. The features of this system allow one to conceive fully practical applications of musical sound recognizers, which were not available earlier. These applications can really boost the development of such systems as Internet search engines and multimedia applications, which still lack the feature of intelligent audio data analysis.

One more problem that was addressed in this work was related to finding the settings of the perceptual coding algorithm that allowed masking noise affecting speech and musical signals. These experiments are related to current research work conducted at the Sound Engineering Department of the Gdansk Technical University on new digital signal processing methods applicable to telecommunications channels. The generalization properties of the employed soft computing methods also allowed acceptable results to be obtained in the case of tuning algorithms used to solve the task of perceptual noise reduction, which is an entirely new application of both perceptual coding and soft computing.

It should be underlined that soft computing methods are not only valuable in analyses in the domain of musical and architectural acoustics, but they are also far more effective and flexible than the statistical approach used formerly. It seems


that future analyses of acoustical data using the soft computing approach will provide a platform for accurate sound and musical pattern recognition and for more universal analysis of acoustical quality.


9. REFERENCES

[1] AES20-1996, Recommended Practice for Professional Audio - Subjective Evaluation of Loudspeakers, J. Audio Eng. Soc., Vol. 44, No. 5, pp. 386-401, 1996.

[2] AMERICAN NEURALOGIX Inc., NLX 230 Fuzzy Microcontroller Application Note, Sanford, U.S.A., 1992.

[3] AMERICAN NEURALOGIX Inc., FMC Development System - Technical Note, Sanford, U.S.A., 1992.

[4] ANDO Y., Calculation of Subjective Preference at Each Seat in the Concert Hall, J. Acoust. Soc. Amer., Vol. 74, No. 3, pp. 873-887, 1983.

[5] ANDO Y., GOTTLOB D., Effects of Early Multiple Reflections on Subjective Preference Judgments of Music Sound Fields, J. Acoust. Soc. Amer., Vol. 65, No. 2, pp. 524-527, 1979.

[6] ANDO S., YAMAGUCHI K., Statistical Study of Spectral Parameters in Musical Instrument Tones, J. Acoust. Soc. Amer., Vol. 94, No. 1, pp. 37-45, 1993.

[7] ANDO Y., SATO S., NAKAJIMA T., SAKURAI M., Acoustic Design of a Concert Hall Applying the Theory of Subjective Preference and the Acoustic Measurement after Construction, Acta Acustica, Vol. 83, pp. 635-643, 1997.

[8] BANK M., TAICHER A., KARABELNIK Y., An Objective Method for Sound Quality Estimation of Compression Systems, 101st Audio Eng. Soc. Conv., Preprint No. 4373, Los Angeles, 1996.

[9] BAO Z., A Tentative Study of the Fuzzy Feature of the Sound Quality Perception, 84th Audio Eng. Soc. Conv., Preprint No. 2640, Paris, 1988.

[10] BARRON M., MARSHALL A.H., Spatial Impression Due to Early Lateral Reflections in Concert Halls: the Derivation of Physical Nature, J. Sound Vib., Vol. 77, pp. 211-232, 1981.

[11] BARRON M., LEE L-J., Energy Relations in Concert Auditoriums. I, J. Acoust. Soc. Amer., Vol. 84, No. 2, pp. 618-628, 1988.

[12] BAZAN J.G., SKOWRON A., SYNAK P., Discovery of Decision Rules from Experimental Data, in Soft Computing (LIN T.Y., WILDBERGER A.M., Eds.), Proc. 3rd Intern. Workshop on Rough Sets and Soft Computing, San Jose, CA, U.S.A., pp. 276-279, 1994.

[13] BEATON R.J., WONG P., A Disk-based System for the Subjective Assessment of High Quality Audio, 94th Audio Eng. Soc. Conv., Preprint No. 3497, Berlin, 1993.

[14] BEAUCHAMP J.W., Unix Workstation Software for Analysis, Graphics, Modification, and Synthesis of Musical Sounds, 94th Audio Eng. Soc. Conv., Preprint No. 3479, Berlin, 1993.

[15] BEERENDS J.G., STEMERDINK J.A., Measuring the Quality of Audio Devices, 90th Audio Eng. Soc. Conv., Preprint No. 3070, Paris, 1991.


[16] BEERENDS J.G., STEMERDINK J.A., A Perceptual Audio Quality Measure Based on a Psychoacoustic Sound Representation, J. Audio Eng. Soc., Vol. 40, No. 12, pp. 963-978, 1992.

[17] BEERENDS J.G., STEMERDINK J.A., A Perceptual Speech-Quality Measure Based on a Psychoacoustic Sound Representation, J. Audio Eng. Soc., Vol. 42, pp. 115-123, 1994.

[18] BENADE A.H., On the Propagation of Sound Waves in a Cylindrical Conduit, J. Acoust. Soc. Amer., Vol. 44, No. 2, pp. 616-623, 1968.

[19] BERANEK L.L., Music, Acoustics and Architecture, J. Wiley & Sons, New York, 1962.

[20] BERANEK L.L., Concert Hall Acoustics, J. Acoust. Soc. Amer., Vol. 92, No. 1, pp. 1-40, 1992.

[21] BEYER R.T., Acoustic, Acoustics, J. Acoust. Soc. Amer., Vol. 98, No. 1, pp. 33-34, 1995.

[22] BEZDEK J.C., HATHAWAY R.J., SABIN M.J., TUCKER W.T., Convergence Theory for Fuzzy c-Means: Counterexamples and Repairs, IEEE Trans. Syst., Man, Cybern., Vol. SMC-17, No. 5, 1987.

[23] BILLINGS S.A., CHEN S., Extended Model Set, Global Data and Threshold Model Identification of Severely Non-Linear Systems, Int. J. Control, Vol. 50, No. 5, pp. 1897-1923, 1989.

[24] BLAUERT J., LINDEMANN W., Auditory Spaciousness: Some Further Psychoacoustic Studies, J. Acoust. Soc. Amer., Vol. 80, No. 5, pp. 533-542, 1986.

[25] BLAUERT J., JEKOSCH U., Sound Quality Evaluation - a Multi-Layered Problem, Acustica, Vol. 83, No. 5, pp. 747-753, 1997.

[26] BODDEN M., Instrumentation for Sound Quality Evaluation, Acustica, Vol. 83, No. 5, pp. 775-783, 1997.

[27] BOSC P., KACPRZYK J. (Eds.), Fuzziness in Database Management Systems, Physica-Verlag (Springer-Verlag), Heidelberg 1995.

[28] BOSE B.K., Expert System, Fuzzy Logic, and Neural Network Applications in Power Electronics and Motion Control, Proc. IEEE, Vol. 82, No. 8, pp. 1303-1323, 1994.

[29] BRADLEY J.S., Experience with New Auditorium Acoustic Measurements, J. Acoust. Soc. Amer., Vol. 73, No. 6, pp. 2051-2058, 1993.

[30] BROWN J.C., Musical Fundamental Frequency Tracking Using a Pattern Recognition Method, J. Acoust. Soc. Amer., Vol. 92, No. 3, pp. 1394-1402, 1992.

[31] CADDY S., POLLARD H.F., Transient Sounds in Organ Pipes, Acustica, No. 7, pp. 227-280, 1957.

[32] CAMBRIDGE P., TODD M., Audio Data Compression Techniques, 94th Audio Eng. Soc. Conv., Preprint No. 3584, Berlin, 1993.

[33] CARLSEN J.C., FRICKE J.J., Comparability of Two Measures of Musical Prototypes, Technical Report Series No. 8803, Univ. of Washington, Seattle, 1988.

[34] CHAFE C., JAFFE D., Source Separation and Note Identification in Polyphonic Music, Proc. IEEE-IECEJ-ASJ Intern. Conf. on Acoustics, Speech, and Signal Proc., pp. 1289-1292, Tokyo, 1986.

[35] CHMIELEWSKI M.R., GRZYMALA-BUSSE J.W., et al., The Rule Induction System LERS - a Version for Personal Computers, Foundations of Computing and Decision Sciences, Vol. 18, No. 3-4, pp. 181-212, Poznan, 1993.

[36] CHMIELEWSKI M.R., GRZYMALA-BUSSE J.W., Global Discretization of Continuous Attributes as Preprocessing for Machine Learning, in Soft Computing


(LIN T.Y., WILDBERGER A.M., Eds.), Proc. 3rd Intern. Workshop on Rough Sets and Soft Computing, San Jose, CA, U.S.A., pp. 294-301, 1994.

[37] CHOI H., WILLIAMS W., Improved Time-Frequency Representation of Multi-Component Signals Using Exponential Kernels, IEEE Trans. ASSP, Vol. 37, pp. 862-871, 1989.

[38] CHRISTENSEN N.S., CHRISTENSEN K.E., WORM H., Classification of Music Using Neural Net, 92nd Audio Eng. Soc. Conv., Preprint No. 3296, Vienna, 1992.

[39] CHUI CH.K., MONTEFUSCO L., PUCCIO L. (Eds.), Wavelets - Theory, Algorithms and Applications, Academic Press, Inc., San Diego, U.S.A., 1994.

[40] COLOMES C., LEVER M., RAULT J.B., DEHERY Y.F., A Perceptual Model Applied to Audio Bit-rate Reduction, J. Audio Eng. Soc., Vol. 43, No. 5, pp. 233-240, 1995.

[41] COLOMES C., LEVER M., RAULT J.B., DEHERY Y.F., FAUCON G., A Perceptual Objective Measurement System (POM) for the Quality Assessment of Perceptual Codecs, 96th Audio Eng. Soc. Conv., Preprint No. 3801, Amsterdam, 1994.

[42] COLTMAN J.W., Sounding Mechanism of the Flute and Organ Pipe, J. Acoust. Soc. Amer., Vol. 44, No. 4, pp. 983-992, 1968.

[43] COLTMAN J.W., Jet Drive Mechanisms in Edge Tones and Organ Pipes, J. Acoust. Soc. Amer., Vol. 60, No. 3, pp. 725-733, 1976.

[44] CONDAMINES R., Les critères physiques de la qualité acoustique des salles, Revue d'Acoustique, No. 26, pp. 192-204, 1973.

[45] COOK P.R., A Meta-Wind-Instrument Physical Model, and a Meta-Controller for Real Time Performance Control, Proc. of the ICMC, San Jose, CA, U.S.A., 1992.

[46] CZYZEWSKI A., SANKIEWICZ M., Subjective Methods for Assessing Properties of Artificial Reverberation, 84th Audio Eng. Soc. Conv., Preprint No. 2643, Paris, 1988.

[47] CZYZEWSKI A., A Method of Artificial Reverberation Quality Testing, J. Audio Eng. Soc., Vol. 38, No. 3, pp. 129-141, 1990.

[48] CZYZEWSKI A., KACZMAREK A., Multilayer Knowledge Base System for Speaker-Independent Recognition of Isolated Words, Proc. RSKD-93, pp. 411-420, 1993.

[49] CZYZEWSKI A., KACZMAREK A., Speech Recognition Systems Based on Rough Sets and Neural Networks, in Soft Computing (LIN T.Y., WILDBERGER A.M., Eds.), Proc. 3rd Intern. Workshop on Rough Sets and Soft Computing, San Jose, CA, U.S.A., pp. 97-100, 1994.

[50] CZYZEWSKI A., KACZMAREK A., Speaker-independent Recognition of Isolated Words Using Rough Sets, Joint Conf. on Information Sciences, Wrightsville Beach, NC, U.S.A., pp. 397-400, 1995.

[51] CZYZEWSKI A., Learning Algorithms for Audio Signal Enhancement. Part 2: Implementation of the Rough-Set Method for the Removal of Hiss, J. Audio Eng. Soc., Vol. 45, No. 11, pp. 931-943, 1997.

[52] CZYZEWSKI A., KOSTEK B., ZIELINSKI S., Synthesis of Organ Pipe Sound Based on Simplified Physical Models, Archives of Acoustics, Vol. 21, No. 2, pp. 131-147, 1996.

[53] CZYZEWSKI A., KOSTEK B., Tuning the Perceptual Noise Reduction Algorithm Using Rough Sets, Lecture Notes in Artificial Intelligence No. 1424, in Rough Sets and Current Trends in Computing (POLKOWSKI L., SKOWRON A., Eds.), Proc. RSCTC'98, pp. 467-474, Springer-Verlag, Heidelberg and New York 1998.


[54] CZYZEWSKI A., KROLIKOWSKI R., Application of Fuzzy Logic and Rough Sets to Audio Signal Enhancement, chapter in Rough-Fuzzy Hybridization: A New Trend in Decision-Making (PAL S.K., SKOWRON A., Eds.), Springer-Verlag, Singapore 1998 (in print).

[55] DAMASKE P., ANDO Y., Interaural Crosscorrelation for Multichannel Loudspeaker Reproduction, Acustica, Vol. 27, pp. 232-238, 1972.

[56] DE BRUIJN A., Timbre-Classification of Complex Tones, Acustica, Vol. 40, pp. 108-114, 1978.

[57] DE MILANO (Ed.), Mind Over MIDI, Keyboard Magazine Basic Library, California, U.S.A., 1987.

[58] DE POLI G., PICCIALLI A., ROADS C., Representation of Musical Signals, MIT Press, London 1991.

[59] DONNADIEU S., McADAMS S., WINSBERG S., Caractérisation du Timbre des Sons Complexes. I. Analyse Multidimensionnelle, J. de Physique IV, Vol. 4, pp. 593-596, 1994.

[60] EISLER H., Measurement of Perceived Acoustic Quality of Sound-Reproducing Systems by Means of Factor Analysis, J. Acoust. Soc. Amer., Vol. 39, No. 3, pp. 484-492, 1966.

[61] EVANGELISTA G., Pitch-Synchronous Wavelet Representations of Speech and Music Signals, IEEE Trans. Signal Proc., Vol. 41, No. 12, pp. 3313-3330, 1993.

[62] FABRE B., HIRSCHBERG A., WIJNANDS A.P.J., Vortex Shedding in Steady Oscillation of a Flue Organ Pipe, Acta Acustica, Vol. 82, No. 6, pp. 863-877, 1996.

[63] FASTL H., The Psychoacoustics of Sound-Quality Evaluation, Acustica, Vol. 83, No. 5, pp. 754-764, 1997.

[64] FEITEN B., STEFFEN E., VAHLE T., MERKEL A., Coding Margin, a Measure for the Headroom of Perceptual Codecs, Intern. Conv. on Sound Design, pp. 336-355, Tonmeistertagung, Karlsruhe, 1996.

[65] FLETCHER N.H., Nonlinear Interactions in Organ Flue Pipes, J. Acoust. Soc. Amer., Vol. 56, No. 2, pp. 645-652, 1974.

[66] FLETCHER N.H., Transients in the Speech of Organ Flue Pipes - A Theoretical Study, Acustica, Vol. 34, pp. 224-233, 1976.

[67] FLETCHER N.H., ROSSING T.D., The Physics of Musical Instruments, Springer-Verlag, New York 1991.

[68] de FURIA S., SCACCIAFERRO J., The MIDI Book - Using MIDI and Related Interfaces, Third Earth Production, 1986.

[69] GADE A.C., Assessment of Sound Quality in Auditoria, 90th Audio Eng. Soc. Conv., Preprint No. 3071, Paris, 1991.

[70] GENOSSAR T., PORAT M., Can One Evaluate the Gabor Expansion Using Gabor's Iterative Algorithm?, IEEE Trans. Signal Processing, Vol. 40, No. 8, pp. 1852-1861, 1992.

[71] GREWIN C., BERGMAN S., KENING O., A Listening Test System for Evaluation of Audio Equipment, 80th Audio Eng. Soc. Conv., Preprint No. 2335, Montreux, 1986.

[72] GRZYMALA-BUSSE J.W., LAKSHMANAN A., LEM2 with Interval Extension: An Induction Algorithm for Numerical Attributes, Proc. 4th International Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery (RSFD-96), pp. 67-73, Tokyo, 1996.


[73] GRZYMALA-BUSSE D.M., GRZYMALA-BUSSE J.W., Comparison of Machine Leaming and Knowledge Against Methods of Rule Induction Based on Rough Sets, Rough Sets, Fuzzy Setsand Knowledge Discovery, Springer-Verlag, London 1994.

[74] GUILLEMAIN P., KRONLAND-MARTINET R., Parameters Estimation Through Continuous Wavelet Transform for Synthesis of Audio-Sounds, 90th Convention AES, Preprint No. 3009 (A-2), Paris, 1991.

[75] GUSKI R., Psychological Methods for Evaluating Sound Quality and Assessing Acoustic Information, Acustica, Vol. 83, No. 5, pp. 765-774, 1997.

[76] HAWKES R.J., DOUGLAS H., Subjective Acoustic Experience in Concert Auditoria, Acustica, Vol. 24, pp. 236-250, 1971.

[77] HERRE J., BRANDENBURG K., EBERLEIN E., GRILL B., Second Generation ISO/MPEG-Audio Layer III Coding, 98th AES Convention, Preprint No. 3939, Paris, 1995.

[78] HONG T-P., CHEN J-B., Automatic Acquisition of Membership Functions by Data Analysis, Proc. 4th International Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery (RSFD-96), pp. 315-319, Tokyo, 1996.

[79] HOUTGAST T., STEENEKEN H.J.M., A Review of the MTF Concept in Room Acoustics and its Use for Estimating Speech Intelligibility in Auditoria, J. Acoust. Soc. Amer., Vol. 77, No. 3, pp. 1069-1077, 1985.

[80] HUA L., YUANDONG J., Fuzzy-Logic Tools on Tap for IC Wafers, IEEE Circuits and Devices, pp. 30-35, 1994.

[81] HULBERT G.M., BAXA D.E., SEIREG A., Criterion for Quantitative Rating and Optimum Design of Concert Halls, J. Acoust. Soc. Amer., Vol. 71, No. 3, pp. 619-629, 1982.

[82] ISHIKAWA M., Structural Learning and Rule Discovery, Proc. 3rd Conf. Neural Networks and Their Applications, pp. 17-29, Kule, Poland, 1997.

[83] IVERSON P., KRUMHANSL C.L., Isolating the Dynamic Attributes of Musical Timbre, J. Acoust. Soc. Amer., Vol. 94, No. 5, pp. 2595-2603, 1993.

[84] JOHNSTON J., Transform Coding of Audio Signals Using Perceptual Noise Criteria, IEEE J. on Selected Areas in Commun., Vol. 6, No. 2, 1988.

[85] JULLIEN J.P., Qualite acoustique d'une salle, CNET, Lannion 1986.

[86] KACPRZYK J., FEDRIZZI M. (Eds.), Fuzzy Regression Analysis, Vol. 1, Omnitech Press, Warsaw and Physica-Verlag (Springer-Verlag), Heidelberg and New York 1992.

[87] KACZMAREK A., CZYZEWSKI A., KOSTEK B., Investigating Polynomial Approximation for the Spectra of the Pipe Organ Sound, Archives of Acoustics, (in print), 1998.

[88] KARNIN E.D., A Simple Procedure for Pruning Back-Propagation Trained Neural Networks, IEEE Trans. Neural Networks, Vol. 1, pp. 239-242, 1990.

[89] KAY S.M., Modern Spectral Estimation: Theory and Application, Englewood Cliffs, New Jersey 1988.

[90] KEEFE D.H., LADEN B., Correlation Dimension of Woodwind Multiphonic Tones, Technical Report Series No. 9102, Univ. of Washington, Seattle, 1991.

[91] KEELER J.S., The Attack Transients of Some Organ Pipes, IEEE Trans. on Audio and Electroacoustics, Vol. AU-20, No. 5, pp. 378-391, 1972.

[92] KEIPER W., Sound Quality Evaluation in the Product Cycle, Acustica, Vol. 83, No. 5, pp. 784-788, 1997.

[93] KIRBY D., WATANABE K., Formal Subjective Testing of the MPEG-2 NBC Multichannel Coding Algorithm, 101st Audio Eng. Soc. Conv., Preprint No. 4418, Los Angeles, 1996.

[94] KLIPPEL W., Multidimensional Relationship between Subjective Listening Impression and Objective Loudspeaker Parameters, Acustica, Vol. 70, pp. 45-54, 1990.

[95] KOSKO B., Neural Networks and Fuzzy Systems, Prentice-Hall Intern. Ed., New Jersey, 1992.

[96] KOSKO B., Fuzzy Engineering, Prentice-Hall Intern. Ed., New Jersey, 1997.

[97] KOSTEK B., CZYZEWSKI A., Articulation Features in the Digitally Controlled Pipe Organ, J. Audio Eng. Soc., Vol. 39, No. 5, p. 382, 1991, 90th AES Convention, Preprint No. 3023, Paris, 1991.

[98] KOSTEK B., CZYZEWSKI A., Computer Modelling of the Pipe Organ Valve Action, J. Audio Eng. Soc., Vol. 40, No. 5, p. 440, 1992, 92nd AES Convention, Preprint No. 3266, Vienna, 1992.

[99] KOSTEK B., Untersuchungen an Orgeltrakturen unter dem Aspekt musikalischer Artikulierung, Fortschritte der Akustik, Teil A, Proc. DAGA '92, pp. 245-248, Berlin, 1992.

[100] KOSTEK B., CZYZEWSKI A., Investigation of Articulation Features in Organ Pipe Sound, Archives of Acoustics, Vol. 18, No. 2-3, pp. 417-434, 1993.

[101] KOSTEK B., Application des reseaux de neurones pour l'analyse de l'articulation musicale, J. de Physique IV, Vol. 4, pp. 597-600, 1994.

[102] KOSTEK B., Intelligent Control System Implementation to the Pipe Organ Instrument, in Rough Sets, Fuzzy Sets and Knowledge Discovery (ZIARKO W.P., Ed.), pp. 450-457, Springer-Verlag, London 1994.

[103] KOSTEK B., Application of Learning Algorithms to Musical Sound Analyses, 97th AES Conv., Preprint No. 3873, San Francisco, J. Audio Eng. Soc. (Abstr.), Vol. 42, No. 12, p. 1050, 1994.

[104] KOSTEK B., Rough Classification as a Tool for Acoustical Analyses, in Soft Computing (LIN T.Y., WILDBERGER A.M., Eds.), Proc. 3rd Intern. Workshop on Rough Sets and Soft Computing, San Jose, CA, U.S.A., pp. 81-84, 1994.

[105] KOSTEK B., KACZMAREK A., Musical Sound Parametrization Methods Based on Spectral and Cepstral Transformations, Proc. VI Symposium on Sound Eng. and Tonmastering, Warsaw, 1995.

[106] KOSTEK B., Statistical Versus Artificial Intelligence Based Processing of Subjective Test Results, 98th Audio Eng. Soc. Conv., Preprint No. 4018, Paris, J. Audio Eng. Soc. (Abstracts), Vol. 43, No. 5, p. 403, 1995.

[107] KOSTEK B., Methoden kuenstlicher Intelligenz in Analysen des Musikklangs, Proc. DAGA'95, Saarbruecken, 1995.

[108] KOSTEK B., Automatic Reasoning About Acoustic Data - Problems with Preprocessing, Classification and Decision Uncertainty, Proc. Conf. of the Intelligent Data Analysis (IDA-95), The International Institute for Advanced Studies in Systems Research and Cybernetics, Vol. 1, pp. 99-103, Baden-Baden, 1995.

[109] KOSTEK B., Distinctive Features of Musical Signal, Proc. VI Symposium on Sound Eng. and Tonmastering, Warsaw, 1995.

[110] KOSTEK B., Computer Based Recognition of Musical Phrases Using the Rough Set Approach, Proc. Second Annual Joint Conference on Information Sciences, North Carolina, U.S.A., 1995.

[111] KOSTEK B., Feature Extraction Methods for the Intelligent Processing of Musical Signals, Proc. 99th Convention AES, Preprint No. 4076 (H4), New York, J. Audio Eng. Soc. (Abstracts), Vol. 43, No. 12, 1995.

[112] KOSTEK B., SZCZERBA M., Parametric Representation of Musical Phrases, 101st Audio Eng. Soc. Conv., Preprint No. 4337 (D-3), Los Angeles, 1996.

[113] KOSTEK B., SZCZERBA M., MIDI Database for the Automatic Recognition of Musical Phrases, 100th Convention AES, Preprint No. 4169 (E-2), Copenhagen, 1996.

[114] KOSTEK B., CZYZEWSKI A., Automatic Classification of Musical Timbre Based on Learning Algorithms Applicable to Cochlear Implants, Proc. IASTED, Expert Systems, and Neural Networks, pp. 98-101, Honolulu, Hawaii, U.S.A., 1996.

[115] KOSTEK B., SZCZERBA M., WIECZORKOWSKA A., Musical Databases - Construction and Analysis, Intern. Conv. on Sound Design, Proc. 19th Tonmeistertagung, Karlsruhe, 1996.

[116] KOSTEK B., WIECZORKOWSKA A., Study of Parameter Relations in Musical Instrument Patterns, 100th Convention AES, Preprint No. 4173 (E-6), Copenhagen, J. Audio Eng. Soc. (Abstracts), Vol. 44, No. 7/8, p. 634, 1996.

[117] KOSTEK B., CZYZEWSKI A., Method and Apparatus for the Electronic Control of the Pipe Organ, Polish Patent, No. 1699913, 1996.

[118] KOSTEK B., SZCZERBA M., Rough Set-Based Analysis of Musical Databases, Proc. EUFIT'96, Vol. 1, pp. 144-148, Aachen, 1996.

[119] KOSTEK B., Intelligent Analysis of Musical Databases, Proc. 4th International Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery (RSFD-96), pp. 300-305, Tokyo, 1996.

[120] KOSTEK B., Rough Set and Fuzzy Set Methods Applied to Acoustical Analyses, J. Intell. Automation and Soft Computing - Autosoft, Vol. 2, No. 2, pp. 147-158, 1996.

[121] KOSTEK B., Articulation-Related Features in the Pipe Organ Sound, Archives of Acoustics, Vol. 22, No. 2, pp. 219-244, 1997.

[122] KOSTEK B., KROLIKOWSKI R., Application of Neural Networks to the Recognition of Musical Sounds, Archives of Acoustics, Vol. 22, No. 1, pp. 27-50, 1997.

[123] KOSTEK B., KROLIKOWSKI R., Artificial Neural Network as a Classifier of Musical Instrument Sounds, Proc. EUFIT'97, Aachen, pp. 485-489, 1997.

[124] KOSTEK B., Soft Set Approach to the Subjective Assessment of Sound Quality, Proc. InterSymp'97, Baden-Baden, Advances in Artificial Intelligence and Engineering Cybernetics, Vol. IV, pp. 107-111, 1997.

[125] KOSTEK B., Sound Quality Assessment Based on the Rough Set Classifier, Proc. EUFIT'97, pp. 193-195, Aachen, 1997.

[126] KOSTEK B., SZCZERBA M., CZYZEWSKI A., Rough Set Based Analysis of Computer Musical Storage, Proc. ICCIMA'97, pp. 140-144, Brisbane, Australia, 1997.

[127] KOSTEK B., SZCZERBA M., Application of Algorithms Dealing with Time Domain Uncertainty for the Automatic Recognition of Musical Phrases, 102nd Convention AES, Preprint No. 4502, Munich, 1997.

[128] KOSTEK B., WIECZORKOWSKA A., A System for Musical Sound Parameter Database Creation and Analysis, 102nd Convention AES, Preprint No. 4498 (N3), Munich, 1997.

[129] KOSTEK B., WIECZORKOWSKA A., Parametric Representation of Musical Sounds, Archives of Acoustics, Vol. 22, No. 1, pp. 2-26, 1997.

[130] KOSTEK B., Computer-Based Recognition of Musical Phrases Using the Rough-Set Approach, J. Information Sciences, Vol. 104, pp. 15-30, 1998.

[131] KOSTEK B., Soft Set Approach to the Subjective Assessment of Sound Quality, FUZZ-IEEE'98 (World Congress on Computational Intelligence), pp. 669-674, Anchorage, Alaska, U.S.A., May 1998.

[132] KOSTEK B., Automatic Recognition of Sounds of Musical Instruments: An Expert Media Application, World Automation Congress, WAC'98, IFMIP-053, Anchorage, Alaska, U.S.A., May 1998.

[133] KOSTEK B., Soft Computing-Based Recognition of Musical Sounds, chapter in Rough Sets in Data Mining and Knowledge Discovery, POLKOWSKI L., SKOWRON A. (Eds.), Physica-Verlag (Springer-Verlag), Chapter 11, pp. 193-213, 1998.

[134] KOSTEK B., Assessment of Concert Hall Acoustics Using Rough Set and Fuzzy Set Approach, chapter in Rough-Fuzzy Hybridization: A New Trend in Decision-Making, PAL S.K., SKOWRON A. (Eds.), Springer-Verlag, Singapore 1998 (in print).

[135] KRIMPHOFF J., McADAMS S., WINSBERG S., Caracterisation du Timbre des Sons Complexes. II. Analyses acoustiques et quantification psychophysique, J. de Physique IV, Vol. 4, pp. 625-628, 1994.

[136] KROLIKOWSKI R., KOSTEK B., Recognition of Musical Instruments Based on Neural Networks, Proc. 3rd Conf. on Neural Networks and Their Applications, pp. 195-200, Kule, Poland, 1997.

[137] KRUSKAL J.B., Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric Hypothesis, Psychometrika, Vol. 29, No. 1, pp. 1-27, 1964.

[138] KRAMER P.S., Mean Free Path Length for Radiating Point Sources in Specular Reflecting Enclosures, Acta Acustica, Vol. 83, No. 4, pp. 629-634, 1997.

[139] KUTTRUFF H., Energetic Sound Propagation in Rooms, Acta Acustica, Vol. 83, No. 4, pp. 622-628, 1997.

[140] LAMORAL R., Point actuel de l'acoustique des salles, Revue d'Acoustique, No. 26, pp. 190-191, 1973.

[141] LEHMANN P., WILKENS H., Zusammenhang subjektiver Beurteilungen von Konzertsälen mit raumakustischen Kriterien, Acustica, Vol. 15, pp. 226-268, 1980.

[142] LEACH J., FITCH J., Nature, Music, and Algorithmic Composition, Computer Music Journal, Vol. 19, No. 2, 1995.

[143] LENARCIK A., PIASTA Z., Deterministic Rough Classifiers, in Soft Computing (LIN T.Y., WILDBERGER A.M., Eds.), Proc. 3rd Intern. Workshop on Rough Sets and Soft Computing, San Jose, CA, U.S.A., pp. 434-441, 1994.

[144] LEONTARITIS I.J., BILLINGS S.A., Input-Output Parametric Models for Non-Linear Systems, Part II: Stochastic Non-Linear Systems, Int. J. Control, Vol. 41, No. 2, pp. 329-344, 1985.

[145] LOTTERMOSER W., Acoustical Design of Modern German Organs, J. Acoust. Soc. Amer., Vol. 29, No. 6, pp. 682-689, 1957.

[146] LETOWSKI T., Auditory Evaluation of Acoustic Devices, Music Academy of Warsaw, Warsaw, 1984.

[147] MAGOULAS G., VRAHATIS M., ANDROULAKIS G., Effective Backpropagation Training with Variable Stepsize, Neural Networks, Vol. 10, No. 1, pp. 69-82, January 1997.

[148] MALLAT S., Zero-Crossings of a Wavelet Transform, IEEE Trans. on Information Theory, Vol. 37, No. 4, pp. 1019-1033, July 1991.

[149] MALLOCH S.N., CAMPBELL A.M., An investigation of musical timbre, J. de Physique IV, Vol. 4, pp. 589-592, 1994.

[150] MARPLE Jr. S.L., Digital Spectral Analysis: with Applications, Englewood Cliffs, New Jersey, 1987.

[151] McAULAY R.J., QUATIERI T.F., Speech Analysis/Synthesis Based on a Sinusoidal Representation, IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-34, pp. 744-754, 1986.

[152] McINTYRE M.E., SCHUMACHER R.T., WOODHOUSE J., On the Oscillation of Musical Instruments, J. Acoust. Soc. Amer., Vol. 74, No. 5, pp. 1325-1345, 1983.

[153] MEYER J., The Sound of the Orchestra, J. Audio Eng. Soc., Vol. 41, No. 4, pp. 203-212, 1993.

[154] MEYER Y., Wavelets and Applications, Springer-Verlag, Paris 1992.

[155] MONRO G., Fractal Interpolation Waveforms, Comp. Music Journal, Vol. 19, No. 1, pp. 88-98, 1995.

[156] MORANDO M., MUSELLI M., GUARIANO M., Musical Rhythm Recognition with Neural Networks, Proc. IASTED, Artificial Intelligence, Expert Systems, and Neural Networks, pp. 229-232, Honolulu, Hawaii, U.S.A., 1996.

[157] MOURJOPOULOS J., TSOUKALAS D., Neural Network Mapping to Subjective Spectra of Music Sounds, 90th Audio Eng. Soc. Conv., Preprint No. 3064, Paris 1991, J. Audio Eng. Soc. (Abstr.), Vol. 39, No. 5, 1991.

[158] NOLLE A.W., FINCH T.L., Starting Transients of Flue Organ Pipes in Relation to Pressure Rise Time, J. Acoust. Soc. Amer., Vol. 91, No. 4, pp. 2190-2202, 1992.

[159] OOMEN W., BONT F., KERKHOF L., Variable Bit Rate Coding for MPEG-1 Audio, Layers I and II, 98th AES Convention, Preprint No. 3938, Paris, 1995.

[160] ORR R.S., The Order of Computation of Finite Discrete Gabor Transforms, IEEE Trans. Signal Processing, Vol. 41, No. 1, pp. 122-130, 1993.

[161] PAWLAK Z., Rough Sets, J. Computer and Information Science, Vol. 11, No. 5, pp. 341-356, 1982.

[162] PAWLAK Z., Data versus Logic - A Rough Set View, Proc. 4th International Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery (RSFD-96), pp. 1-8, Tokyo, 1996.

[163] PAWLAK Z., Reasoning about Data - A Rough Set Perspective, Lecture Notes in Artificial Intelligence No. 1424, in Rough Sets and Current Trends in Computing (POLKOWSKI L., SKOWRON A., Eds.), Proc. RSCTC'98, pp. 25-34, Springer-Verlag, Heidelberg and New York 1998.

[164] PIEZO FILM SENSORS TECHNICAL MANUAL, AMP Inc. Piezo Film Sensors, Basic Design Kit, U.S.A., 1993.

[165] POLLARD H.F., Time Delay Effects in the Operation of a Pipe Organ, Acustica, Vol. 20, No. 4, pp. 189-199, 1968.

[166] POLLARD H.F., JANSSON E.V., A Tristimulus Method for the Specification of Musical Timbre, Acustica, Vol. 51, pp. 162-171, 1982.

[167] POLKOWSKI L., SKOWRON A. (Eds.), Rough Sets and Current Trends in Computing, Lecture Notes in Artificial Intelligence, Springer-Verlag, Heidelberg and New York 1998.

[168] POLKOWSKI L., SKOWRON A. (Eds.), Rough Sets in Knowledge Discovery 1: Methodology and Applications, Physica-Verlag, Heidelberg and New York 1998.

[169] POLKOWSKI L., SKOWRON A. (Eds.), Rough Sets in Knowledge Discovery 2: Applications, Case Studies and Software Systems, Physica-Verlag (Springer-Verlag), Heidelberg and New York 1998.

[170] POWELL A., On the Edgetone, J. Acoust. Soc. Amer., Vol. 33, No. 4, pp. 395-409, 1961.

[171] PRATT R.L., DOAK P.E., A Subjective Rating Scale for Timbre, J. Sound and Vibration, Vol. 45, No. 3, pp. 317-328, 1976.

[172] PRESS W.H., FLANNERY B.P., TEUKOLSKY S.A., VETTERLING W.T., Numerical Recipes, Cambridge University Press, 1986.

[173] PULACZEWSKI J., SZACKA K., MANITIUS A., Theory of Automation, PWN, Warsaw, 1974 (in Polish).

[174] QIAN S., CHEN D., Discrete Gabor Transform, IEEE Trans. on Signal Processing, Vol. 41, No. 7, pp. 2429-2438, 1993.

[175] RABINER L.R., ROSENBERG A.E., LEVINSON S.E., Considerations in Dynamic Time Warping Algorithms for Discrete Word Recognition, IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. ASSP-26, No. 6, 1978.

[176] RABINER L.R., SCHAFER R.W., Digital Processing of Speech Signals, Englewood Cliffs, Prentice Hall, 1978.

[177] RAKOWSKI A., RICHARDSON E.G., Eine Analyse des Intonierungsvorganges bei Orgeln, Gravesaner Blätter, Vol. 15, No. 16, pp. 46-58, 1960.

[178] REICHARDT W., ALIM A.O., SCHMIDT W., Definition und Messgrundlage eines objektiven Masses zur Ermittlung der Grenze zwischen brauchbarer und unbrauchbarer Durchsichtigkeit bei Musikdarbietung, Acustica, Vol. 32, pp. 126-137, 1975.

[179] RHEE H-S., OH K-W., Unsupervised Neural Network for Fuzzy Clustering, Proc. EUFIT'96, Vol. 2, pp. 715-719, Aachen, 1996.

[180] ROEDERER J.G., Introduction to the Physics and Psychophysics of Music, Springer-Verlag, Vol. 16, New York and Heidelberg 1979.

[181] SANDELL G.J., SHARC - Sandell Harmonic Archive, Database of Musical Timbre Information (unpublished material), 1994.

[182] SANDELL G.J., MARTENS W.M., Perceptual Evaluation of Principal-Component-Based Synthesis of Musical Timbres, J. Audio Eng. Soc., Vol. 43, No. 12, pp. 1013-1028, 1995.

[183] SCHROEDER M.R., GOTTLOB D., SIEBRASSE K.F., Comparative Study of European Concert Halls, Correlation of Subjective Preference with Geometric and Acoustic Parameters, J. Acoust. Soc. Amer., Vol. 56, No. 4, pp. 1195-1201, 1974.

[184] SCHROEDER M.R., Natural Sounding Artificial Reverberation, J. Audio Eng. Soc., Vol. 10, No. 3, pp. 219-223, 1962.

[185] SCHROEDER M.R., Self-Similarity and Fractals in Science and Art, J. Audio Eng. Soc., Vol. 37, No. 10, 1989.

[186] SCHUMACHER R.T., Self-Sustained Oscillations of Organ Flue Pipes: An Integral Equation Solution, Acustica, Vol. 39, pp. 225-238, 1978.

[187] SHLIEN S., SOULODRE G., Measuring the Characteristics of Expert Listeners, 101st Audio Eng. Soc. Conv., Preprint No. 4339, Los Angeles, U.S.A., 1996.

[188] SKOWRON A., Data Filtration: a Rough Set Approach, in Rough Sets, Fuzzy Sets and Knowledge Discovery (ZIARKO W.P., Ed.), Springer-Verlag, pp. 108-118, London 1994.

[189] SKOWRON A., Decision Rules Based on Discernibility Matrices and Decision Matrices, in Soft Computing (LIN T.Y., WILDBERGER A.M., Eds.), Proc. 3rd Intern. Workshop on Rough Sets and Soft Computing, San Jose, CA, U.S.A., pp. 6-9, 1994.

[190] SKOWRON A., NGUYEN S.H., Quantization of Real Value Attributes: Rough Set and Boolean Reasoning Approach, ICS Research Report 11/95, Warsaw, 1995.

[191] SLOWINSKI R., STEFANOWSKI J., SUSMAGA R., Rough Set Analysis of Attribute Dependencies in Technical Databases, Proc. 4th International Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery (RSFD-96), pp. 284-291, Tokyo, 1996.

[192] SLOWINSKI R., Rough Set Processing of Fuzzy Information, in Soft Computing (LIN T.Y., WILDBERGER A.M., Eds.), Proc. 3rd Intern. Workshop on Rough Sets and Soft Computing, San Jose, CA, U.S.A., pp. 142-145, 1994.

[193] SMITH III J.O., Physical Modeling Using Digital Waveguides, Computer Music Journal, special issue on Physical Modeling of Musical Instruments, Part I, Vol. 16, No. 4, pp. 79-91, 1992.

[194] SMITH M., SMAILL A., WIGGINS G.A. (Eds.), Music Education: An Artificial Intelligence Approach, Workshops in Computing, Springer-Verlag, Edinburgh 1993.

[195] SPORER TH., Objective Audio Signal Evaluation - Applied Psychoacoustics for Modeling the Perceived Quality of Digital Audio, 103rd Audio Eng. Soc. Conv., Preprint No. 4512, New York, 1997.

[196] STEENEKEN H.J.M., HOUTGAST T., A Physical Method for Measuring Speech Transmission Quality, J. Acoust. Soc. Amer., Vol. 67, No. 1, pp. 318-326, 1980.

[197] STOLL G., A Perceptual Coding Technique Offering the Best Compromise between Quality, Bit-Rate and Complexity for DSB, 94th AES Convention, Preprint No. 3458, Berlin, 1993.

[198] SUGENO M., An Introductory Survey of Fuzzy Control, Information Sciences, Vol. 36, pp. 59-83, 1985.

[199] TADEUSIEWICZ R., Speech Signal, WKiL, Warsaw 1988 (in Polish).

[200] TADEUSIEWICZ R., Neural Nets, Academic Printing Office RM, Warsaw 1993 (in Polish).

[201] TANGUIANE A.S., Artificial Perception and Music Recognition, Lecture Notes in Artificial Intelligence, Springer-Verlag, Berlin 1993.

[202] THE NEW GROVE DICTIONARY OF MUSIC AND MUSICIANS, edited by S. Sadie, Macmillan Publishers, London, Washington, Hong Kong, 1980.

[203] THIELE R., Richtungsverteilung und Zeitfolge der Schallrückwürfe in Räumen, Acustica, Vol. 3, pp. 291-302, 1953.

[204] TSUMOTO S., YAO Y.Y., HADJIMICHAEL M. (Eds.), Bulletin of International Rough Set Society, Vol. 2, No. 1, June 1998.

[205] UEMATSU H., OZAWA K., SUZUKI Y., SONE T., A Consideration on the Timbre of Complex Tones Only Consisting of Higher Harmonics, Proc. 15th Intern. Congress on Acoustics, Trondheim, Norway, pp. 509-512, 1995.

[206] VÄLIMÄKI V., KARJALAINEN M., JANOSY Z., LAINE U.K., A Real-Time DSP Implementation of a Flute Model, Proc. Int. Conf. Acoustics, Speech and Signal Processing (ICASSP'92), San Francisco, 1992.

[207] VÄLIMÄKI V., HUOPANIEMI J., KARJALAINEN M., JANOSY Z., Physical Modeling of Plucked String Instruments with Application to Real-Time Sound Synthesis, 98th Audio Eng. Soc. Conv., Preprint No. 3956, Paris, 1995, J. Audio Eng. Soc. (Abstr.), Vol. 43, No. 5, 1995.

[208] Wavelets - Theory, Algorithms and Applications, CHUI Ch.K., MONTEFUSCO L., PUCCIO L. (Eds.), Academic Press, Inc., San Diego 1994.

[209] VERGE M.P., CAUSSE R., FABRE B., HIRSCHBERG A., WIJNANDS A.P.J., VAN STEENBERGEN A., Jet Oscillations and Jet Drive in Recorder-Like Instruments, Acta Acustica, Vol. 2, No. 5, pp. 403-419, 1994.

[210] VERGE M.P., FABRE B., MAHU W.E.A., HIRSCHBERG A., Jet Formation and Jet Velocity Fluctuations in a Flue Organ Pipe, J. Acoust. Soc. Amer., Vol. 95, No. 2, pp. 1119-1132, 1994.

[211] WESTHEAD M.D., SMAILL A., Automatic Characterisation of Musical Style, in Music Education: An Artificial Intelligence Approach, SMITH M., SMAILL A., WIGGINS G.A. (Eds.), Workshops in Computing, pp. 157-170, Springer-Verlag, Edinburgh 1993.

[212] WIDMANN U., Three Application Examples for Sound Quality Design Using Psychoacoustic Tools, Acustica, Vol. 83, No. 5, pp. 819-826, 1997.

[213] WIDMER G., Modeling the Rational Basis of Musical Expression, Computer Music Journal, Vol. 19, No. 2, pp. 76-96, 1995.

[214] WILSON R., CALWAY A.D., PEARSON E.R.S., A Generalized Wavelet Transform for Fourier Analysis, the Multiresolution Fourier Transform and its Application to Image and Audio Signal Analysis, IEEE Trans. on Information Theory, Vol. 38, No. 2, pp. 674-690, March 1992.

[215] YAGER R.R., KACPRZYK J., FEDRIZZI M. (Eds.), Advances in the Dempster-Shafer Theory of Evidence, Wiley, New York, 1994.

[216] YAMAGUCHI K., Multivariate Analysis of Subjective and Physical Measures of Hall Acoustics, J. Acoust. Soc. Amer., Vol. 52, No. 5, pp. 1271-1279, 1972.

[217] YAMAKAWA T., Stabilization of an Inverted Pendulum by a High-Speed Fuzzy Logic Controller Hardware System, Fuzzy Sets and Systems, Vol. 32, pp. 161-180, 1989.

[218] ZADEH L., Fuzzy Sets, Information and Control, Vol. 8, pp. 338-353, 1965.

[219] ZADEH L., KACPRZYK J. (Eds.), Fuzzy Logic for the Management of Uncertainty, Wiley, New York, 1992.

[220] ZADEH L., KACPRZYK J. (Eds.), Computing with Words in Information/Intelligent Systems, Physica-Verlag (Springer-Verlag), Heidelberg and New York 1999.

[221] ZEMANKOVA M., KACPRZYK J. (Guest Eds.), Integrating Artificial Intelligence and Databases Technologies, Special Issue of J. of Intelligent Information Systems, Vol. 2, No. 4, 1993.

[222] ZIARKO W. (Ed.), Rough Sets, Fuzzy Sets, and Knowledge Discovery, Springer-Verlag, London 1994.

[223] ZIARKO W., Analysis of Uncertain Information in the Framework of Variable Precision Rough Sets, Foundations of Computing and Decision Sciences, Vol. 18, No. 3-4, pp. 381-396, Poznan, 1993.

[224] ZIARKO W., Review of Basics of Rough Sets in the Context of Data Mining, Proc. 4th International Workshop on Rough Sets, Fuzzy Sets, and Machine Discovery (RSFD-96), pp. 447-457, Tokyo, 1996.

[225] ZWICKER E., ZWICKER T., Audio Engineering and Psychoacoustics: Matching Signals to the Final Receiver, the Human Auditory System, J. Audio Eng. Soc., Vol. 39, No. 3, pp. 115-126, March 1991.

[226] ZURADA J., Introduction to Artificial Neural Systems, West Publishing Comp., St. Paul 1992.

[227] ZURADA J., MALINOWSKI A., Multilayer Perceptron Networks: Selected Aspects of Training Optimization, Applied Mathematics and Comp. Science, Vol. 4, No. 3, pp. 281-307, 1994.

Studies in Fuzziness and Soft Computing

Vol. 25. J. Buckley and Th. Feuring Fuzzy and Neural: Interactions and Applications, 1999 ISBN 3-7908-1170-X

Vol. 26. A. Yazici and R. George Fuzzy Database Modeling, 1999 ISBN 3-7908-1171-8

Vol. 27. M. Zaus Crisp and Soft Computing with Hypercubical Calculus, 1999 ISBN 3-7908-1172-6

Vol. 28. R.A. Ribeiro, H.-J. Zimmermann, R.R. Yager and J. Kacprzyk (Eds.) Soft Computing in Financial Engineering, 1999 ISBN 3-7908-1173-4

Vol. 29. H. Tanaka and P. Guo Possibilistic Data Analysis for Operations Research, 1999 ISBN 3-7908-1183-1

Vol. 30. N. Kasabov and R. Kozma (Eds.) Neuro-Fuzzy Techniques for Intelligent Information Systems, 1999 ISBN 3-7908-1187-4