speech recognition acoustic modeling in deep neural networks formozer/teaching/syllabi/deep... ·...

15

Deep Neural Networks for Acoustic Modeling in Speech Recognition Hinton et al, Oct 2012

Upload: others

Post on 20-Jun-2020

5 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Page 1: Speech Recognition Acoustic Modeling in Deep Neural Networks formozer/Teaching/syllabi/Deep... · 2015-04-01 · Deep Neural Networks for Acoustic Modeling in Speech Recognition Hinton

Deep Neural Networks for Acoustic Modeling in Speech Recognition

Hinton et al, Oct 2012

Page 2: Speech Recognition Acoustic Modeling in Deep Neural Networks formozer/Teaching/syllabi/Deep... · 2015-04-01 · Deep Neural Networks for Acoustic Modeling in Speech Recognition Hinton

Outline

1. The Speech Problem2. GMMs ( - 2011)3. Neural Nets (2011 - Today)4. Tweaks5. Why are DNNs better?6. 2012-2015 Developments

Page 3: Speech Recognition Acoustic Modeling in Deep Neural Networks formozer/Teaching/syllabi/Deep... · 2015-04-01 · Deep Neural Networks for Acoustic Modeling in Speech Recognition Hinton

Speech Recognition Problem

Viterbi Search

{{Language Model Acoustic Model

{

Page 4: Speech Recognition Acoustic Modeling in Deep Neural Networks formozer/Teaching/syllabi/Deep... · 2015-04-01 · Deep Neural Networks for Acoustic Modeling in Speech Recognition Hinton

Speech Recognition Problem - HMM

d ʌ z n ɑ t“does not”

Page 5: Speech Recognition Acoustic Modeling in Deep Neural Networks formozer/Teaching/syllabi/Deep... · 2015-04-01 · Deep Neural Networks for Acoustic Modeling in Speech Recognition Hinton

Acoustic Model - 1980s to 2011 - GMMs

● Easy to fit by plugging in to baum-welch (EM)

● Fast to train and decode

● Can fine-tune using discriminative post-training (MMI)

Page 6: Speech Recognition Acoustic Modeling in Deep Neural Networks formozer/Teaching/syllabi/Deep... · 2015-04-01 · Deep Neural Networks for Acoustic Modeling in Speech Recognition Hinton

Acoustic Model - 2011-Today - Neural Nets

Page 7: Speech Recognition Acoustic Modeling in Deep Neural Networks formozer/Teaching/syllabi/Deep... · 2015-04-01 · Deep Neural Networks for Acoustic Modeling in Speech Recognition Hinton

Neural Net - Pretraining V-H Units

Gaussian-Bernoulli RBM:

Page 8: Speech Recognition Acoustic Modeling in Deep Neural Networks formozer/Teaching/syllabi/Deep... · 2015-04-01 · Deep Neural Networks for Acoustic Modeling in Speech Recognition Hinton

Neural Net - Pretraining H-H Units

RBM:

Page 9: Speech Recognition Acoustic Modeling in Deep Neural Networks formozer/Teaching/syllabi/Deep... · 2015-04-01 · Deep Neural Networks for Acoustic Modeling in Speech Recognition Hinton

Neural Net - Fine Tuning (Backprop)

Page 10: Speech Recognition Acoustic Modeling in Deep Neural Networks formozer/Teaching/syllabi/Deep... · 2015-04-01 · Deep Neural Networks for Acoustic Modeling in Speech Recognition Hinton

Tweaks - Maximum Mutual Information Training

more closely related to objective (sequence labeling)~5% relative gain in accuracy

transition probabilities (HMM) agreement between activations + hidden units

Page 11: Speech Recognition Acoustic Modeling in Deep Neural Networks formozer/Teaching/syllabi/Deep... · 2015-04-01 · Deep Neural Networks for Acoustic Modeling in Speech Recognition Hinton

Tweaks - Convolutional Nets

Source: “Convolutional Neural Networks for Speech Recognition” O. Abdel-Hamid et al, IEEE Transactions on Audio, Speech, and Language Processing, Oct 2014

~5% relative gain in accuracy

Page 12: Speech Recognition Acoustic Modeling in Deep Neural Networks formozer/Teaching/syllabi/Deep... · 2015-04-01 · Deep Neural Networks for Acoustic Modeling in Speech Recognition Hinton

Why are DNNs > GMMs?

Hinton’s story:● RBM is a “Product of experts” model,

whereas GMM is a “Mixture of experts” model○ “Each param of a product model is constrained by a

large fraction of the data”● DNNs can model simultaneous events,

GMMs assume 1 mixture component generates observation

● DNNs benefit more from context frames

Page 13: Speech Recognition Acoustic Modeling in Deep Neural Networks formozer/Teaching/syllabi/Deep... · 2015-04-01 · Deep Neural Networks for Acoustic Modeling in Speech Recognition Hinton

Experiments

Page 14: Speech Recognition Acoustic Modeling in Deep Neural Networks formozer/Teaching/syllabi/Deep... · 2015-04-01 · Deep Neural Networks for Acoustic Modeling in Speech Recognition Hinton

Developments Since 2012

Better hardware means bigger training sets○ 2011 - several hundred hours of training data○ 2015 - 100,000 hours of training data (Baidu

DeepSpeech, 10k + 90k synthesized)

“Currently, the biggest disadvantage of DNNs compared with GMMs is that it is much harder to make good use of large cluster machines to train them on massive data sets.”> No longer true! (GTC 2015)

Page 15: Speech Recognition Acoustic Modeling in Deep Neural Networks formozer/Teaching/syllabi/Deep... · 2015-04-01 · Deep Neural Networks for Acoustic Modeling in Speech Recognition Hinton

Developments Since 2012

Learn the features - raw filterbank/FFT energies outperform hand-engineered MFCCs/PLPs (esp. in noisy environments*)

Lots of hacks and tweaks:○ Dropout/ReLU/etc - no need for pretraining○ Recurrent nets○ Data augmentation - primarily the addition of noise

* “Deep Speech: Scaling up end-to-endspeech recognition” - A. Hannum et al, 2015

Deep Neural Decision Forests

Neural Network Acoustic Models 1: Introduction

Neural Networks - cs.cmu.edubhiksha/courses/deep... · about neural networks (a.k.a deep learning) This guy learned about neural networks (a.k.a deep learning) Objectives of this

Neural Network Acoustic Models - Carnegie Mellon …ymiao/pub/IS_slides_sat_dnn_V2.pdfMotivation Deep neural networks become the state of the art for acoustic modeling For GMM models,

Deep Neural Networks - uchile.cl

Acoustic Signal Processing Based on Deep Neural Networkshome.ustc.edu.cn/~xiaosong/ppt/ASP-DNN.pdf · Acoustic Signal Processing Based on Deep Neural Networks Chin-Hui Lee School

Deep Convolutional Neural Networks on Multichannel Time ... · Deep Convolutional Neural Networks On Multichannel Time Series ... as a common input for the neural network ... Deep

Deep Convolutional Neural Network for Image …papers.nips.cc/paper/5485-deep-convolutional-neural...Deep Convolutional Neural Network for Image Deconvolution Li Xu ∗ LenovoResearch

Speech Recognition Acoustic Modeling in Deep Neural

Deep Neural Network acoustic models for ASR Deep Neural Network acoustic models for ASR Abdel-rahman Mohamed Doctor of Philosophy Graduate Department of Computer Science University

Deep convolutional neural networks - Computer Science- …web.cs.ucdavis.edu/.../lee_lecture19_deeplearning_notes.pdf · Deep convolutional neural networks ... “Deep” architecture

Do Deep Neural Networks Suffer from Crowding?papers.nips.cc/paper/...deep-neural-networks-suffer-from-crowding.pdf · Do Deep Neural Networks Suffer from Crowding? Anna Volokitiny\

Rolling in the Deep Acoustic

ACOUSTIC MODELING IN STATISTICAL PARAMETRIC SPEECH ... · tiﬁcial neural network-based acoustic models, such as deep neural networks, mixture density networks, and long short-term

Spatial audio feature discovery with convolutional neural ... · Index Terms— Spatial sound, virtual reality, HRTF personal-ization, Deep Taylor Decomposition, acoustic feature

Lecture 10: Neural Networks and Deep Learningsaravanan-thirumuruganathan.github.io/cse5334... · Neural Networks Deep Learning Convolutional Neural Networks Recurrent Neural Networks

Complex-valued Linear Layers for Deep Neural Network-based ... · 4/8/2016 · Complex-valued Linear Layers for Deep Neural Network-based Acoustic Models for Speech Recognition Author:

Deep Neural Networks for Object Detectionpapers.nips.cc/.../5207-deep-neural-networks-for-object-detection.pdf · Deep Neural Networks for Object Detection ... recognition becomes

1 Deep Neural Networks for Acoustic Modeling in …research.google.com/pubs/archive/38131.pdf1 Deep Neural Networks for Acoustic Modeling in Speech Recognition Geoffrey Hinton, Li

Deep neural networks

1 Deep Neural Networks for Acoustic Modeling in … · 1 Deep Neural Networks for Acoustic Modeling in Speech Recognition Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel …

Acoustic to articulatory mapping with deep neural networkhccl/publications/pub/2015... · Acoustic to articulatory mapping with deep neural network Zhiyong Wu & Kai Zhao & Xixin Wu

Deep Learning in Neural Networks: An Overviejuergen/DeepLearning28May2014.pdf · Deep Learning in Neural Networks: An Overview ... 1 Introduction to Deep Learning ... Neural Fitted

Introduction to Deep Neural Networks - Deep Learning · Introduction to Deep Neural Networks 0. Logistics Spring 2020 1. Neural Networks are taking over! •Neural networks have become

Deep Compression: Compressing Deep Neural Networks with …forum.stanford.edu/events/posterslides/DeepCompression... · 2016-03-01 · Deep Compression: Compressing Deep Neural Networks

Deep Neural Networks in Machine Translation: An …perpustakaan.unitomo.ac.id/repository/Deep Neural...Deep Neural Networks in Machine Translation: An Overview Jiajun Zhang and Chengqing

1 Deep Neural Networks for Acoustic Modeling in Speech Recognition · PDF file · 2012-04-272012-04-27 · Deep Neural Networks for Acoustic Modeling in Speech Recognition ... who

Deep Neural Network Approach for the Dialog State Tracking ...people.cs.pitt.edu/~huynv/research/deep-nets/Deep Neural Network Approach for the...Deep Neural Network Approach for the

Neural Networks for Acoustic Modelling 4: LSTM …Neural Networks for Acoustic Modelling 4: LSTM acoustic models; Sequence discriminative training Steve Renals Automatic Speech Recognition

Deep Neural Network Based Acoustic-to-articulatory

Deep Neural Networks - dmi.unibas.ch

Deep Multi-State Dynamic Recurrent Neural Networks ...papers.nips.cc/paper/9594-deep-multi-state-dynamic-recurrent-neural... · Deep Multi-State Dynamic Recurrent Neural Networks

Debugging deep neural networks

Deep Learning Binary Neural Network on an FPGA · Deep Learning Binary Neural Network on an ... deep neural networks have attracted lots of attentions ... brain has a deep architecture

Optimizing Deep Neural Networks