Three-dimensional microphone arrays for spatially selective sound capture



TECHNICAL NOTES AND RESEARCH BRIEFS

Paul B. Ostergaard, 10 Glenwood Way, West Caldwell, NJ 07006

Editor's Note: Original contributions to the Technical Notes and Research Briefs section are always welcome. Manuscripts should be double-spaced, and ordinarily not longer than about 1500 words. There are no publication charges, and consequently, no free reprints; however, reprints may be purchased at the usual prices.

Advanced-degree dissertations in acoustics

Editor's note: Abstracts of Doctoral and Master's theses will be welcomed at all times. Please note that they must be double-spaced, be limited to 200 words, include the appropriate PACS classification numbers, and be formatted as shown below (don't make the editor retype them, please!). The address for obtaining a copy of the thesis is helpful. Please submit two copies.

The boundary element method for sound field calculations [43.20.Fn, 43.20.Rz, 43.20.Tb]--Peter M. Juhl, The Acoustics Laboratory, Technical University of Denmark, 2800 Lyngby, Denmark, 20th January 1994 (Ph.D.). The boundary element method (BEM) is a numerical method for solving an integral equation that represents the properties of a domain--in the present case a sound field--by means of the field on the boundary of the domain. The thesis covers a range of aspects of the topic, from an introduction to the basics of the direct BEM to practical demonstrations of its applicability. Contributions to the state of the art include an axisymmetric integral equation for nonaxisymmetric boundary conditions, the use of a generalized quarter-point technique to model singularities in the sound field, and the use of rank-revealing factorizations as a means of ensuring uniqueness.

Thesis advisor: Finn Jacobsen.

Copies of this thesis may be obtained from Peter M. Juhl, The Acoustics Laboratory, Building 352, Technical University of Denmark, DK-2800 Lyngby, Denmark.

Perpetual speech coding [43.72.Gy]--Randy G. Goldberg, CAIP Center, Rutgers University, Piscataway, NJ 08855-1390, January 1994 (Ph.D. Electrical & Computer Engineering). Digital transmission of coded speech is becoming increasingly important in a wide variety of real-time applications such as multimedia conferencing systems, cockpit-to-tower speech transmissions for pilot/controller communication, and wireless telephone transmissions. By reducing the amount of data needed to code the speech, one can make the best use of the limited transmission bandwidth. Efficient digital storage of coded speech is also becoming increasingly important for such applications as voice messaging, answering machines, digital speech recorders, and storage of large speech databases for low-bit-rate speech coders. Economies in storage memory can be obtained through high-quality, low-bit-rate coding. By taking advantage of the properties of the human speech production and auditory systems, we can greatly reduce the required capacity for coding speech, with minimal perceptual degradation of the speech signal. The monaural masking phenomena of the human auditory system can be exploited to gain great economies in speech coding. To take full advantage of these phenomena, one must estimate the parameters of speech to obtain high resolution in both the time and frequency domains. Standard transform methods (such as the discrete Fourier transform) do not necessarily yield the required resolution because they are constrained by time-frequency trade-offs. To improve the resolution of signal analysis, the fact that speech is produced by the human vocal system can be utilized. Proper parametrization of a signal constrained by a model of the vocal system leads to efficient encoding methods and facilitates the analysis process so that the required time-frequency resolution can be obtained. This dissertation outlines a method for the coding of high-quality speech that eliminates coding of unnecessary signals as dictated by the monaural masking properties of the human auditory system. A 20-kbit/s speech coder has been implemented using the vocal analysis and auditory processing discussed above. Listening tests have been performed on the speech coder to verify that the coding process is perceptually transparent.

Thesis advisor: J. L. Flanagan.
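
The time-frequency constraint on transform methods mentioned in the abstract above can be illustrated with simple arithmetic. The sketch below is not taken from the thesis; the 8000-Hz sampling rate and the frame lengths are assumptions chosen only for illustration, and `dft_resolution` is a hypothetical helper name. It shows that, for a plain DFT frame, lengthening the frame narrows the frequency bins only at the cost of a coarser time span.

```python
def dft_resolution(sample_rate_hz, frame_length):
    """Return (frame duration in s, DFT bin spacing in Hz) for one analysis frame."""
    frame_duration = frame_length / sample_rate_hz   # time span covered by the frame
    bin_spacing = sample_rate_hz / frame_length      # distance between adjacent DFT bins
    return frame_duration, bin_spacing

# Assumed telephone-band sampling rate; not a value from the thesis.
SAMPLE_RATE_HZ = 8000

for frame_length in (64, 256, 1024):
    duration, spacing = dft_resolution(SAMPLE_RATE_HZ, frame_length)
    # The product duration * spacing is always 1: one cannot shrink both at once.
    print(frame_length, duration, spacing, duration * spacing)
```

A model-based parametrization, as the abstract argues, is one way to obtain finer estimates than this fixed product allows for a raw transform.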

Digital hardware and control for a beamforming microphone array [43.38.Kb]--Daniel V. Rabinkin, CAIP Center, Rutgers University, Piscataway, NJ 08855-1390, December 1993 (M.S. Electrical & Computer Engineering). Microphone arrays can be used for high-quality sound pickup in reverberant and noisy environments. Conventional single microphone methods suffer severe degradation in quality under these conditions. The beamforming capabilities of the microphone array system allow highly directional sound capture, providing enhanced signal-to-noise ratio (SNR) when compared to single microphone performance. Single beamforming arrays operate by summing the delayed outputs of the component microphones. The array has a focus location that is determined by the geometry of microphone spacing and the individual delay values. The technique of beamforming allows the focus location to be shifted by insertion of variable delay lines between the microphones and the summing element. Directional steering of the array is achieved by control of the delay lines and requires no physical movement of the system. Previous microphone array implementations have been carried out using analog delay lines due to limitations in digital processing speed. Technical advances in processor speed and available memory now allow construction of a microphone array system that uses digital technology. This provides precise control and easy modification of the beamforming algorithm. Additional techniques such as the use of adaptive antireverberation filters, which were not feasible with the analog approach, can now be implemented. A practical real-time digital microphone array system is described in this thesis. The system consists of a 16-channel digitizing front end, a 6-processor AT&T SURboard for signal processing, and a Sun4 workstation for array control and data recording. The system can easily be expanded to handle a greater channel count. The implemented system is portable and provides hands-free untethered use. It can track a moving speaker and adapt to changing environments. The array is an ideal sound capture device for circumstances where it is costly or inconvenient to provide close-talking microphones for all potential sound sources of interest. Possible applications for microphone arrays include conference centers, concerts, sporting events, and cellular radio in automobiles.

Thesis advisor: J. L. Flanagan.
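
The delay-and-sum operation described in the abstract above (delay each microphone output, then sum) can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the function names, the whole-sample rounding of delays, the speed of sound, and the sampling rate in the usage note are all assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value

def steering_delays(mic_positions, focus):
    """Delay (s) to insert after each microphone so sound from `focus` lines up."""
    dists = [math.dist(m, focus) for m in mic_positions]
    farthest = max(dists)
    # The nearest microphone hears the source first, so it gets the longest delay.
    return [(farthest - d) / SPEED_OF_SOUND for d in dists]

def delay_and_sum(signals, delays, fs):
    """Shift each channel by its delay (rounded to whole samples) and average."""
    shifts = [round(d * fs) for d in delays]
    n = len(signals[0])
    out = [0.0] * n
    for sig, shift in zip(signals, shifts):
        for i in range(shift, n):
            out[i] += sig[i - shift]
    return [v / len(signals) for v in out]

def simulate_impulse(mic_positions, source, fs, n):
    """Toy free-field capture: one unit impulse per channel at its arrival sample."""
    signals = []
    for m in mic_positions:
        sig = [0.0] * n
        sig[round(math.dist(m, source) / SPEED_OF_SOUND * fs)] = 1.0
        signals.append(sig)
    return signals
```

As a usage example, take four microphones 10 cm apart on a line, a source 1 m away on the array axis, and an assumed sampling rate of 34300 Hz (so one sample corresponds to 1 cm of travel). Steering at the source makes the four impulses add coherently, while steering at a different point spreads them out, which is the directional gain the abstract refers to. Changing the focus only changes the delay values; nothing moves physically.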

Three-dimensional microphone arrays for spatially selective sound capture [43.38.Kb]--Arun C. Surendran, CAIP Center, Rutgers University, Piscataway, NJ 08855-1390, April 1993 (M.S. Electrical & Computer Engineering). High-quality sound capture in reverberant enclosures is important in applications like teleconferencing. This, combined with the need to eliminate the use of hand-held or body-worn equipment, has led to the use of one- and two-dimensional autodirective microphone arrays. Recent advances in transducer technology, signal processing, and computing have made possible the use of a large number of sensors in three-dimensional arrays. By simple delay-and-sum, these arrays provide beams whose widths are independent of the steering direction in three dimensions. These arrays have also been shown to provide useful volume selectivity, i.e., capture of sound from specified spatial volumes. In our simulations, the reverberant environment has been characterized using an image model, i.e., assuming multiple sources formed due to imaging on the reflecting surfaces. By forming multiple beams pointed at the source and at these images and by combining the outputs of these beams, significant improvement in speech signal-to-noise ratio (SNR) has been achieved. The spatial selectivity and dereverberation provided are also studied when the magnitudes of the individual beams are summed. The spatial selectivity provided by various configurations of the microphone arrays has been studied and the relative merits assessed. All these experiments are performed as computer simulations and the results tested on real speech signals.

Thesis advisor: James L. Flanagan.
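
The image model mentioned in the abstract above replaces each wall reflection with a mirrored copy of the source. The sketch below is an assumption-laden illustration rather than the thesis simulation: it handles only first-order images in a rectangular room with one corner at the origin, and the function names, room size, positions, and speed of sound are hypothetical. Each image is a point at which an additional beam could be aimed, and the extra delays are what must be compensated before the beam outputs are combined.

```python
import math

def first_order_images(source, room):
    """Mirror `source` in each of the six walls of a box room with a corner at the origin."""
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = list(source)
            img[axis] = 2.0 * wall - source[axis]  # reflect the coordinate in the wall plane
            images.append(tuple(img))
    return images

def extra_delays(source, images, mic, c=343.0):
    """Arrival time of each image at `mic`, relative to the direct sound (s)."""
    direct = math.dist(source, mic)
    return [(math.dist(img, mic) - direct) / c for img in images]

# Assumed geometry for illustration only (metres).
room = (4.0, 5.0, 3.0)
source = (1.0, 2.0, 1.5)
mic = (3.0, 1.0, 1.0)

images = first_order_images(source, room)
for img, dt in zip(images, extra_delays(source, images, mic)):
    print(img, round(dt * 1000.0, 2), "ms after the direct sound")
```

Higher-order images follow by mirroring the images themselves, with each reflection normally attenuated by the wall's reflection coefficient; both are omitted here to keep the sketch short.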

Finite element analysis of the propagation of acoustic waves in periodic materials [43.20.Bi, 43.35.Cg]--Philippe Langlet, Laboratoire d'Acoustique, I.E.M.N. (U.M.R. C.N.R.S. 9929), Institut Supérieur d'Electronique du Nord, 41 Bd Vauban, 59046 Lille Cedex, France, December 1993 (Doctorate). The propagation of plane acoustic waves in a material with a periodic array of cavities or inclusions is relevant to many applications, especially in underwater acoustics, signal processing, and medical acoustics. Such materials are used, for example, as anechoic coatings, delay lines, or acoustic filters. Composite materials are increasingly used in the design of new ultrasonic transducers. This thesis reports the modeling of singly, doubly, or triply periodic, elastic, or piezoelectric materials, using the finite element method, with the help of the ATILA code. First, specific theoretical developments needed for the description of these materials are presented. A first validation has been carried out on periodic materials for which analytical models exist. Then, the technique is applied to the study of the propagation of waves in porous or composite periodic materials, in plates or in waveguides. With the help of dispersion curves, finite element results are compared with previous semi-analytical or empirical models or with experimental results. The homogenized properties of porous materials are determined with the help of an anisotropic model, in the limit of large wavelengths. The homogenization process is validated with the help of periodically perforated plates, the resonance frequencies of which have been measured. Finally, these results suggest that these methods can be extended to the coupled fluid-solid problem and to the study of evanescent waves in stopbands.

Thesis advisors: J.-N. Decarpigny, A.-C. Hladky-Hennion.

Copies of the thesis are available from Philippe Langlet, Laboratoire d'Acoustique, Institut Supérieur d'Electronique du Nord, 41 Bd Vauban, 59046 Lille Cedex, France.

Implementation of a low bit-rate CELP speech coder and its application to ATM packet networks [43.72.Gy]--Jayesh S. Patel, Department of Electrical Engineering, Rutgers--The State University of New Jersey, New Brunswick, NJ 08903, April 1994 (M.S.). Low-bit-rate speech codecs are used for making optimum use of available bandwidth. Such codecs are used in communication networks for cordless, mobile, and cellular radio. Speech codecs are also used in multimedia conferencing applications over packet networks such as asynchronous transfer mode (ATM). Bandwidth economy is the main reason for using low-bit-rate codecs, whether for multiplexing several conversations over a single channel or for storing speech in applications such as a voice mailbox. High-quality speech is important in all of the mentioned applications. Hence it is necessary to choose a coder that can provide good speech quality at the lowest bit rate. Code-excited linear prediction (CELP) is a speech coding method that has potential for providing high-quality speech at transmission rates of 8 kbps and lower. In this thesis, various factors related to implementation of a CELP coder are explored. A 7.2-kbps CELP coder is developed and its quality is compared with other low-bit-rate codecs. The sensitivity of CELP parameters to the type of transmission errors that can occur in an ATM packet network is studied and quantified. The CELP coder is demonstrated in a multipoint video/audio conferencing network.

Thesis advisor: J. L. Flanagan.
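
The abstract above does not describe the coder's internals, so the following is only a generic textbook-style sketch of the analysis-by-synthesis search at the heart of CELP, not the thesis design. Each candidate excitation from a codebook is passed through a linear-prediction synthesis filter, and the entry (with its best gain) that most closely reproduces the target is selected; only the index and gain would then be transmitted. The codebook size, frame length, filter coefficients, and function names are assumptions, and perceptual weighting and the adaptive (pitch) codebook are omitted.

```python
import random

def synthesize(excitation, lpc):
    """All-pole synthesis filter: y[n] = x[n] + sum_k lpc[k] * y[n-1-k]."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def search_codebook(target, codebook, lpc):
    """Return (index, gain) of the entry whose synthesized output best matches target."""
    best_index, best_gain, best_score = -1, 0.0, -1.0
    for i, code in enumerate(codebook):
        y = synthesize(code, lpc)
        energy = dot(y, y)
        if energy == 0.0:
            continue
        corr = dot(target, y)
        score = corr * corr / energy  # error reduction achieved with the optimal gain
        if score > best_score:
            best_index, best_gain, best_score = i, corr / energy, score
    return best_index, best_gain

# Assumed toy setup: 16 random code vectors of 40 samples and a stable 2nd-order filter.
rng = random.Random(0)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(16)]
lpc = [1.2, -0.5]
```

With 16 entries the index costs 4 bits per frame in this toy setup; a real coder's bit allocation depends on its frame size and parameter quantization, which the abstract does not give.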

A modeling and measurement study of acoustic horns [43.38.Ja, 43.20.Mv, 43.20.Rz, 43.58.Bh]--John T. Post, Department of Electrical and Computer Engineering, University of Texas, Austin, TX 78712, May 1994 (Ph.D.). Although acoustic horns have been in use for thousands of years, formal horn design only began approximately 80 years ago with the pioneering effort of A. G. Webster. In this dissertation, the improvements to Webster's original horn model are reviewed and the lack of analytical progress since Webster is noted. In an attempt to augment the traditional methods of analysis, a semi-analytical technique presented by Rayleigh is extended. Although Rayleigh's method is not based on one-dimensional wave propagation, it is found not to offer significant improvement over Webster's model. In order to be free of the limitations associated with analytical techniques, a numerical method based on boundary elements has been developed. It is suitable for solving radiation problems that can be modeled as a source in an infinite baffle. The exterior boundary element formulation is exchanged for an interior formulation by placing a hemisphere over the baffled source and using an analytical expansion of the field in the exterior half-space. The boundary element method is demonstrated by solving the baffled piston problem, and is then used to obtain the acoustic throat impedance and far-field directivity of axisymmetric horns having exponential and tractrix contours. Experiments are performed to measure the throat impedance and the far-field directivity of two axisymmetric horns mounted in a rigid baffle. An exponential horn and a tractrix horn with equal throat radius (2.54 cm), length (55.9 cm), and mouth radius (27.1 cm) are critically examined. A modern implementation of the "reaction on the source" method is compared with a new implementation of the two-microphone method for measuring acoustic impedance. The modified two-microphone method is found to be extremely simple and accurate, but the "reaction on the source" method has the advantage of in situ measurements. The far-field directivity is measured by a new technique that allows the far-field pressure to be calculated from the measured near-field pressure. Experimental results compare very well with the numerical predictions obtained by the boundary element method. The annotated bibliography is 34 pages in length and features approximately 200 references that are useful in the general study of acoustic horns.

Thesis advisor: Elmer L. Hixson.
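
For orientation, the horn dimensions quoted in the abstract above can be fed into the classical one-dimensional (Webster) model of an exponential horn, in which the cross-sectional area grows as S(x) = S_throat exp(m x) and plane waves propagate only above the cutoff frequency f_c = m c / (4 pi). The sketch below is an illustrative calculation and not a result from the dissertation: the speed of sound is an assumed value, the function names are hypothetical, and the abstract itself notes the limitations of Webster's model.

```python
import math

def flare_constant(throat_radius, mouth_radius, length):
    """Area flare constant m (1/m) for S(x) = S_throat * exp(m * x)."""
    # Area scales with radius squared, hence the factor of 2.
    return 2.0 * math.log(mouth_radius / throat_radius) / length

def radius_at(x, throat_radius, m):
    """Horn radius (m) at axial position x (m) for an exponential contour."""
    return throat_radius * math.exp(0.5 * m * x)

def cutoff_frequency(m, c=343.0):
    """Webster-model cutoff frequency (Hz); c is an assumed speed of sound in m/s."""
    return m * c / (4.0 * math.pi)

# Dimensions from the abstract, converted to metres.
THROAT_RADIUS = 0.0254
MOUTH_RADIUS = 0.271
LENGTH = 0.559

m = flare_constant(THROAT_RADIUS, MOUTH_RADIUS, LENGTH)
fc = cutoff_frequency(m)
print("flare constant (1/m):", m)
print("Webster cutoff (Hz):", fc)
```

Under these assumptions the cutoff comes out at roughly 230 Hz. The tractrix contour has no such closed-form exponential profile and is not covered by this sketch.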

Coherent digital communications for rapidly fading channels with applications to underwater acoustics [43.60.Dh, 43.60.Lq]--Milica Stojanovic, Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, August 1993 (Ph.D.). High-speed digital communications are addressed for rapidly fading multipath channels such as mobile radio, indoor wireless, or underwater acoustic (UWA) channels. Phase-coherent UWA communications are made difficult by the combined effect of extended, time-varying multipath propagation and phase instabilities. A receiver is developed that jointly performs phase synchronization and adaptive decision-feedback equalization. Its performance is demonstrated on the long-range deep and shallow water channels and on the medium-range shallow water channel. Despite the fundamental differences in the mechanism of sound propagation in these channels, excellent results are obtained in all cases, showing the flexibility of the proposed algorithm and demonstrating the feasibility of high-rate, bandwidth-efficient UWA communications. The receiver algorithm is extended to the multichannel, spatial diversity case and analyzed in the light of optimal spatial and temporal processing of multiple signal arrivals. Experimental results show considerable improvements offered by the spatial variability of ocean multipath. Besides fast algorithms suitable for real-time implementation, receiver structures that reduce complexity but preserve the performance of the optimal combiner are presented. Finally, the impact of imperfect channel tracking on the average bit error probability is theoretically analyzed for Rayleigh fading channels. The expressions obtained show the channel mismatch penalty, as well as the fading-induced irreducible error rates.

Thesis advisor: John G. Proakis.

J. Acoust. Soc. Am., Vol. 96, No. 4, October 1994, pp. 2595-2596, Technical Notes and Research Briefs. 0001-4966/94/96(4)/2595/2/$6.00 © 1994 Acoustical Society of America

Redistribution subject to ASA license or copyright; see http://acousticalsociety.org/content/terms.