
Page 1: Deep Learning For Speech Recognition

DEEP LEARNING FOR SPEECH RECOGNITION

Anantharaman Palacode Narayana Iyer

JNResearch

[email protected]

15 April 2016

Page 2: Deep Learning For Speech Recognition

REFERENCES

Page 3: Deep Learning For Speech Recognition

AGENDA

Types of Speech Recognition and applications

Traditional implementation pipeline

Deep Learning for Speech Recognition

Future directions

Page 4: Deep Learning For Speech Recognition

SPEECH APPLICATIONS

Speech recognition:

Hands-free in a car

Commands for personal assistants – e.g., Siri

Gaming

Conversational agents, e.g., an agent for flight schedule enquiries, bookings, etc.

Speaker identification, e.g., forensics

Extracting emotions and social meanings

Text to speech

Page 5: Deep Learning For Speech Recognition

TYPES OF RECOGNITION TASKS

Isolated word recognition

Connected words recognition

Continuous speech recognition (LVCSR)

The above can be realized as:

Speaker-independent implementation

Speaker-dependent implementation

Page 6: Deep Learning For Speech Recognition

SPEECH RECOGNITION IS PROBABILISTIC

Steps:

Train the system

Cross-validate and fine-tune

Test

Deploy

[Diagram: speech signal → Speech Recognizer (ASR) → probabilistic match between the input and a set of words]
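The "probabilistic match" in the diagram is usually written as the standard ASR decision rule. The slide does not spell it out, so the formulation below is a standard reconstruction rather than the author's own notation:

```latex
% \hat{W} = the recognized word sequence, O = the acoustic observations.
% The recognizer picks the word sequence with the highest posterior probability;
% by Bayes' rule this factors into an acoustic model and a language model.
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \underbrace{P(O \mid W)}_{\text{acoustic model}} \; \underbrace{P(W)}_{\text{language model}}
```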

Page 7: Deep Learning For Speech Recognition

ISOLATED WORD RECOGNITION

From the audio signal, generate features; MFCCs or filter banks are quite common

Perform any additional pre-processing

Using a code book of a given size, convert these features into discrete symbols. This is the vector quantization step, which can be implemented with k-means clustering

Train HMMs using the Baum-Welch algorithm: for each word in the vocabulary, instantiate an HMM

Intuitively choose the number of states

The set of symbols consists of all valid code book entries

Use the trained HMMs to predict unseen input (a code sketch of this pipeline follows below)

[Diagram: the observation sequence O is scored against each word model (HMM 1, HMM 2, …, HMM n); the predicted word is the one whose model λ gives argmax_λ P(O | λ)]
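The pipeline on this slide can be sketched compactly in code. This is a minimal, illustrative sketch (not the author's implementation), assuming librosa for MFCC extraction, scikit-learn's KMeans for the code book, and hmmlearn's CategoricalHMM for the discrete-emission HMMs (older hmmlearn versions call this MultinomialHMM); the data-loading helper load_training_clips and all hyperparameters are placeholders.

```python
# Minimal sketch of the isolated word recognition pipeline described above.
# Assumes librosa, scikit-learn and hmmlearn are available; the data-loading
# helper `load_training_clips` is a hypothetical placeholder.
import numpy as np
import librosa
from sklearn.cluster import KMeans
from hmmlearn import hmm  # CategoricalHMM handles discrete (VQ) symbols

def mfcc_frames(path, sr=16000, n_mfcc=13):
    """Load a clip and return its MFCC feature vectors, one row per frame."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)

# 1. Feature extraction for all training clips, grouped by word label.
train = load_training_clips()                     # {word: [wav_path, ...]} (placeholder)
features = {w: [mfcc_frames(p) for p in paths] for w, paths in train.items()}

# 2. Vector quantization: learn a code book with k-means over all frames.
codebook = KMeans(n_clusters=64, random_state=0)
codebook.fit(np.vstack([f for seqs in features.values() for f in seqs]))

def quantize(frames):
    """Map continuous MFCC frames to discrete code book symbols."""
    return codebook.predict(frames).reshape(-1, 1)

# 3. Train one discrete-emission HMM per word with Baum-Welch (EM).
models = {}
for word, seqs in features.items():
    symbols = [quantize(f) for f in seqs]
    X = np.concatenate(symbols)
    lengths = [len(s) for s in symbols]
    # n_features pins the symbol alphabet to the code book size
    # (recent hmmlearn versions accept this keyword).
    model = hmm.CategoricalHMM(n_components=5, n_features=64,
                               n_iter=25, random_state=0)
    model.fit(X, lengths)                         # Baum-Welch training
    models[word] = model

# 4. Recognition: pick the word model with the highest likelihood P(O | lambda).
def recognize(path):
    O = quantize(mfcc_frames(path))
    return max(models, key=lambda w: models[w].score(O))
```

One HMM per vocabulary word keeps recognition a simple argmax over per-model likelihoods, which is exactly the argmax_λ P(O | λ) block in the diagram.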

Page 8: Deep Learning For Speech Recognition

CONTINUOUS SPEECH RECOGNITION

• ASR for continuous speech is traditionally built using Gaussian Mixture Models (GMMs)

• The emission probability table that we used for discrete symbols is now replaced by a GMM over the continuous feature vectors

• The parameters of this model are learned as part of training using the Baum-Welch procedure
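To make the change concrete, here is the same per-word training loop with the discrete emission table swapped for GMM emissions, trained directly on the continuous MFCC frames (no vector quantization). Again a sketch assuming hmmlearn; `features` is the word-to-MFCC-sequences dictionary from the previous sketch, and the state and mixture counts are illustrative.

```python
# Sketch: replace the discrete emission table with GMM emissions (hmmlearn's GMMHMM).
# Reuses `features` (word -> list of MFCC frame matrices) from the previous sketch;
# no vector quantization step is needed here.
import numpy as np
from hmmlearn import hmm

gmm_models = {}
for word, seqs in features.items():
    X = np.vstack(seqs)                       # all frames for this word, stacked
    lengths = [len(f) for f in seqs]          # per-utterance frame counts
    model = hmm.GMMHMM(n_components=5, n_mix=8, covariance_type="diag",
                       n_iter=25, random_state=0)
    model.fit(X, lengths)                     # Baum-Welch now also updates the GMMs
    gmm_models[word] = model

def recognize_continuous_features(frames):
    """Score raw MFCC frames against every word model; return the best word."""
    return max(gmm_models, key=lambda w: gmm_models[w].score(frames))
```

Note this still illustrates only the emission-model swap on isolated words; continuous LVCSR additionally needs a decoder that searches over word sequences using a lexicon and a language model.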

Page 9: Deep Learning For Speech Recognition

KNOWLEDGE INTEGRATION FOR SPEECH RECOGNITION

[Block diagram: speech passes through Feature Analysis, a Unit Matching System, and Lexical, Syntactic and Semantic Hypothesis stages, with an Utterance Verifier producing the recognized utterance; supporting knowledge sources are an inventory of speech recognition units, a word dictionary, a grammar and a task model]

Page 10: Deep Learning For Speech Recognition

SOME CHALLENGES

We don’t know the number of words

We don’t know the boundaries

The boundaries are fuzzy and non-unique

For V word reference patterns and L word positions there are exponentially many combinations (e.g., with V = 1000 and L = 10, on the order of 1000^10 = 10^30 candidate word sequences)

Page 11: Deep Learning For Speech Recognition

USING DEEP NETWORKS FOR ASR

Replace the GMM with a deep neural network (DNN) that directly provides the likelihood estimates

Interface the DNN with an HMM decoder (sketched below)

Issue: we still need the HMM, with its underlying assumptions, for tractable computation
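A minimal sketch of the DNN side of such a hybrid, assuming PyTorch; the layer sizes, N_STATES and state_priors are illustrative placeholders, and the HMM/Viterbi decoder itself is not shown. The DNN outputs per-frame state posteriors, which are divided by the state priors (subtracted in the log domain) to obtain the scaled likelihoods the HMM decoder expects.

```python
# Sketch of the hybrid DNN-HMM idea: a DNN predicts per-frame posteriors over
# HMM states; dividing by the state priors (subtracting in the log domain) gives
# the scaled likelihoods that the HMM/Viterbi decoder consumes.
# Assumes PyTorch; FEAT_DIM, N_STATES and `state_priors` are illustrative placeholders.
import torch
import torch.nn as nn

FEAT_DIM, N_STATES = 13 * 11, 2000   # e.g., 11 spliced MFCC frames, 2000 tied states

acoustic_dnn = nn.Sequential(
    nn.Linear(FEAT_DIM, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, N_STATES),        # logits over HMM states
)

def scaled_log_likelihoods(frames, state_priors):
    """frames: (T, FEAT_DIM) spliced features; state_priors: (N_STATES,) prior p(s).

    Returns (T, N_STATES) scaled log-likelihoods, computed as
    log p(s|o) - log p(s), which is what the HMM decoder scores against."""
    log_post = torch.log_softmax(acoustic_dnn(frames), dim=-1)
    return log_post - torch.log(state_priors)

# These scores are then handed to a conventional HMM/Viterbi decoder (not shown),
# which still supplies the transition structure, lexicon and language model.
```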

Page 12: Deep Learning For Speech Recognition

EMERGING TRENDS

HMM-free ASRs avoid phoneme prediction and hence the need for a phoneme database (one such approach is sketched below)

Active area of research

The current state of the art adopted by industry uses DNN-HMM hybrids

Future ASRs are likely to be fully neural-network based
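The slides do not name a specific HMM-free technique; one common choice in this direction (an illustrative assumption here, not the author's prescription) is CTC training, where a recurrent network emits per-frame character probabilities and the CTC loss marginalizes over all alignments, removing the need for phoneme labels or frame-level alignments. A minimal PyTorch sketch:

```python
# Sketch of one HMM-free training setup (CTC), assuming PyTorch. The slides do not
# prescribe CTC; this is an illustrative choice. Sizes and names are placeholders.
import torch
import torch.nn as nn

N_MELS, HIDDEN, N_CHARS = 40, 256, 29      # 26 letters + space + apostrophe + blank

class CTCAcousticModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(N_MELS, HIDDEN, num_layers=2, bidirectional=True)
        self.proj = nn.Linear(2 * HIDDEN, N_CHARS)

    def forward(self, feats):               # feats: (T, batch, N_MELS)
        out, _ = self.rnn(feats)
        return self.proj(out).log_softmax(dim=-1)   # (T, batch, N_CHARS)

model = CTCAcousticModel()
ctc_loss = nn.CTCLoss(blank=0)              # CTC sums over all alignments for us

# Dummy batch: 2 utterances of 100 frames, target character sequences of length 12.
feats = torch.randn(100, 2, N_MELS)
targets = torch.randint(1, N_CHARS, (2, 12))
loss = ctc_loss(model(feats), targets,
                input_lengths=torch.full((2,), 100, dtype=torch.long),
                target_lengths=torch.full((2,), 12, dtype=torch.long))
loss.backward()                             # trained directly on characters:
                                            # no phoneme labels or HMM alignments
```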