

MITSUBISHI ELECTRIC RESEARCH LABORATORIES
http://www.merl.com

Beamforming Networks Using Spatial Covariance Features for Far-field Speech Recognition

Xiao, Xiong; Watanabe, Shinji; Chng, Eng Siong; Li, Haizhou

TR2016-162 December 13, 2016

Abstract

Recently, a deep beamforming (BF) network was proposed to predict BF weights from phase-carrying features, such as generalized cross correlation (GCC). The BF network is trained jointly with the acoustic model to minimize the automatic speech recognition (ASR) cost function. In this paper, we propose to replace GCC with features derived from the input signals' spatial covariance matrices (SCM), which contain the phase information of individual frequency bands. Experimental results on the AMI meeting transcription task show that the BF network using SCM features significantly reduces the word error rate to 44.1% from the 47.9% obtained with the conventional ASR pipeline using delay-and-sum BF. Compared with GCC features, we also observe a small but steady gain of 0.6% absolute. The use of SCM features also facilitates the implementation of more advanced BF methods within a deep learning framework, such as minimum variance distortionless response (MVDR) BF, which requires the speech and noise SCMs.
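
To make two ingredients named in the abstract concrete, here is a minimal NumPy sketch; it is not the authors' implementation. It estimates one SCM per frequency bin from a multichannel STFT, then plugs speech and noise SCMs into a standard reference-channel MVDR formula. The function names (`spatial_covariance`, `mvdr_weights`, `apply_beamformer`), the array layout (channels, frames, bins), the diagonal loading constant, and the synthetic steering vectors are all assumptions made for illustration. In the paper the BF weights are predicted by a network from SCM-derived features, whereas this sketch computes them in closed form from toy data.

```python
import numpy as np


def spatial_covariance(stft):
    """Estimate one spatial covariance matrix (SCM) per frequency bin.

    stft: complex array of shape (channels, frames, bins).
    Returns a complex array of shape (bins, channels, channels).
    """
    n_frames = stft.shape[1]
    # For each bin f, average the outer products x[:, t, f] x[:, t, f]^H over frames t.
    return np.einsum("ctf,dtf->fcd", stft, stft.conj()) / n_frames


def mvdr_weights(scm_speech, scm_noise, ref_channel=0, loading=1e-6):
    """Reference-channel MVDR weights, one weight vector per bin.

    Uses w = (Phi_n^-1 Phi_s) u / trace(Phi_n^-1 Phi_s), where u selects
    the reference channel. The diagonal loading is an assumed regulariser.
    Returns a complex array of shape (bins, channels).
    """
    n_ch = scm_noise.shape[-1]
    # Scale the loading by the average noise power in each bin.
    scale = np.trace(scm_noise, axis1=1, axis2=2).real / n_ch
    loaded = scm_noise + loading * scale[:, None, None] * np.eye(n_ch)
    # Batched solve of Phi_n X = Phi_s, avoiding an explicit inverse.
    numerator = np.linalg.solve(loaded, scm_speech)
    trace = np.trace(numerator, axis1=1, axis2=2)
    return numerator[:, :, ref_channel] / trace[:, None]


def apply_beamformer(weights, stft):
    """Filter-and-sum in the STFT domain: y[t, f] = w[f]^H x[:, t, f]."""
    return np.einsum("fc,ctf->tf", weights.conj(), stft)


# Toy usage with synthetic data (4 channels, 50 frames, 6 bins).
rng = np.random.default_rng(0)
n_ch, n_frames, n_bins = 4, 50, 6
noise = (rng.standard_normal((n_ch, n_frames, n_bins))
         + 1j * rng.standard_normal((n_ch, n_frames, n_bins)))
# Hypothetical steering vectors: a unit-magnitude phase term per channel and bin.
steering = np.exp(1j * rng.uniform(-np.pi, np.pi, (n_ch, n_bins)))
source = (rng.standard_normal((n_frames, n_bins))
          + 1j * rng.standard_normal((n_frames, n_bins)))
speech = steering[:, None, :] * source[None, :, :]

scm_noise = spatial_covariance(noise)
scm_speech = spatial_covariance(speech)
weights = mvdr_weights(scm_speech, scm_noise)
enhanced = apply_beamformer(weights, speech + noise)
```

Because the toy speech SCM is rank one, the resulting weights pass the speech component through exactly as it appears at the reference channel, which is the "distortionless" constraint in MVDR. The off-diagonal entries of each SCM are complex, and their phases encode the inter-channel delays in that frequency band.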

Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.

Copyright © Mitsubishi Electric Research Laboratories, Inc., 2016
201 Broadway, Cambridge, Massachusetts 02139
