GPU-ACCELERATED HMM FOR SPEECH RECOGNITION
Leiming Yu, Yash Ukidave and David Kaeli ECE, Northeastern University
HUCAA 2014
Outline
Background & Motivation
HMM
GPGPU
Results
Future Work
Background
• Translate Speech to Text
• Speaker Dependent / Speaker Independent
• Applications
* Natural Language Processing
* Home Automation
* In-car Voice Control
* Speaker Verification
* Automated Banking
* Personal Intelligent Assistants (Apple Siri, Samsung S Voice)
* etc.
[http://www.kecl.ntt.co.jp]
DTW: Dynamic Time Warping
A template-based approach to measure similarity between two temporal sequences which may vary in time or speed.
[opticalengineering.spiedigitallibrary.org]
DTW: Dynamic Time Warping
DTW Pros:
1) Handles timing variation
2) Recognizes speech at reasonable cost
DTW Cons:
1) Template choosing
2) Endpoint detection (VAD, acoustic noise)
3) Words with weak fricatives, close to the acoustic background
For i := 1 to n
    For j := 1 to m
        cost := D(s[i], t[j])
        DTW[i, j] := cost + minimum(DTW[i-1, j  ],
                                    DTW[i  , j-1],
                                    DTW[i-1, j-1])
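The recurrence above can be sketched as runnable code. The local distance D is left unspecified on the slide; an absolute difference is assumed here for illustration:

```python
def dtw(s, t, D=lambda a, b: abs(a - b)):
    """Dynamic Time Warping distance between sequences s and t."""
    n, m = len(s), len(t)
    INF = float("inf")
    # DTW[i][j] = cost of the best alignment of s[:i] and t[:j]
    DTW = [[INF] * (m + 1) for _ in range(n + 1)]
    DTW[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = D(s[i - 1], t[j - 1])
            DTW[i][j] = cost + min(DTW[i - 1][j],      # insertion
                                   DTW[i][j - 1],      # deletion
                                   DTW[i - 1][j - 1])  # match
    return DTW[n][m]
```

Because the alignment may repeat elements, `dtw([1, 2, 3], [1, 2, 2, 3])` is 0: the warping path matches the second `2` twice, which is exactly the timing variation DTW is designed to absorb.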
Neural Networks
Algorithms that mimic the brain.
Simplified interpretation:
* take a set of input features
* pass them through a set of hidden layers
* produce the posterior probabilities as the output
Neural Networks
* a_i^(j): "activation" of unit i in layer j
* Θ^(j): matrix of weights controlling the function mapping from layer j to layer j+1
* Multiclass output example: Bike, Pedestrian, Car, Parking Meter; if the input is a pedestrian, the "Pedestrian" output unit should dominate
[Machine Learning, Coursera]
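The "activation of unit in layer" and "matrix of weights" notation from the Coursera slide can be sketched as a minimal forward pass. The layer sizes, the random weights, and the input vector here are illustrative assumptions, not values from the talk:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, thetas):
    """Forward propagation: a^(j+1) = g(Theta^(j) @ [1; a^(j)])."""
    a = x
    for theta in thetas:
        a = np.concatenate(([1.0], a))  # prepend the bias unit
        a = sigmoid(theta @ a)          # Theta^(j) maps layer j to layer j+1
    return a                            # posterior-like score per output class

# toy network: 2 inputs -> 3 hidden units -> 4 outputs
# (Bike, Pedestrian, Car, Parking Meter)
rng = np.random.default_rng(0)
thetas = [rng.normal(size=(3, 3)),  # 2 inputs + bias -> 3 hidden
          rng.normal(size=(4, 4))]  # 3 hidden + bias -> 4 outputs
out = forward(np.array([0.5, -1.2]), thetas)
```

Each Θ^(j) has one row per unit in layer j+1 and one column per unit (plus bias) in layer j, which is why the shapes are (3, 3) and (4, 4) here.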
Neural Networks: Equation Example
[worked example shown as figures in the original slides]
Hint:
* effective in recognizing individual phones and isolated words as short-time units
* not ideal for continuous recognition tasks, largely due to a poor ability to model temporal dependencies
Hidden Markov Model
In a Hidden Markov Model,
* the states are hidden
* outputs that depend on the states are visible

x — states
y — possible observations
a — state transition probabilities
b — output probabilities
[wikipedia]
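As a concrete sketch of the x, y, a, b notation above, here is a toy two-state HMM. The state names, observation names, and probabilities are illustrative assumptions (a common textbook-style example), not from the talk:

```python
import numpy as np

# x: hidden states, y: possible observations
states = ["Rainy", "Sunny"]
observations = ["walk", "shop", "clean"]

pi = np.array([0.6, 0.4])        # initial state probability
a = np.array([[0.7, 0.3],        # a[i, j] = P(x_{t+1} = j | x_t = i)
              [0.4, 0.6]])
b = np.array([[0.1, 0.4, 0.5],   # b[i, k] = P(y_t = k | x_t = i)
              [0.6, 0.3, 0.1]])

rng = np.random.default_rng(1)
x = rng.choice(2, p=pi)          # the hidden state is sampled, never observed
y = rng.choice(3, p=b[x])        # only the output, which depends on x, is visible
```

Each row of a and b is a probability distribution, so the rows must sum to 1.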
Hidden Markov Model
The temporal transitions of the hidden states fit well with the nature of phoneme transitions.
Hint:
* Handles the temporal variability of speech well
* Gaussian mixture models (GMMs), controlled by the hidden variables, determine how well an HMM can represent the acoustic input
* Can be hybridized with NNs to leverage each modeling technique
Motivation
• Parallel Architecture
from multi-core CPU to many-core GPU (graphics + general purpose)
• Massive Parallelism in Speech Recognition Systems
Neural networks, HMMs, etc., are both compute and memory intensive
• GPGPU Evolution
* Dynamic Parallelism
* Concurrent Kernel Execution
* Hyper-Q
* Device Partitioning
* Virtual Memory Addressing
* GPU-GPU Data Transfer, etc.
• Previous works
• Our goal is to use modern GPU features to accelerate Speech Recognition
Outline
Background & Motivation
HMM
GPGPU
Results
Future Work
Hidden Markov Model
Markov chains and processes are named after Andrey Andreyevich Markov (1856-1922), a Russian mathematician whose doctoral advisor was Pafnuty Chebyshev.
In 1966, Leonard Baum described the underlying mathematical theory.
In 1989, Lawrence Rabiner published the most comprehensive tutorial description of it.
Hidden Markov Model
HMM Stages
* causal transition probabilities between states
* each observation depends on the current state, not on its predecessors
Hidden Markov Model
Forward
Backward
Expectation-Maximization
HMM-Forward
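The forward stage (its equations appear as figures on the slide) computes α_t(i) = P(y_1..y_t, x_t = i), so the sequence likelihood is the sum of the final α. A vectorized sketch, with the same toy parameters assumed for illustration:

```python
import numpy as np

def hmm_forward(obs, pi, a, b):
    """Forward algorithm: alpha[t, i] = P(y_1..y_t, x_t = i)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * b[:, obs[0]]                      # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ a) * b[:, obs[t]]  # induction step
    return alpha, alpha[-1].sum()                     # termination: P(Y)

# illustrative toy model (not from the talk)
pi = np.array([0.6, 0.4])
a = np.array([[0.7, 0.3], [0.4, 0.6]])
b = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
alpha, likelihood = hmm_forward([0, 1, 2], pi, a, b)
```

The recursion costs O(T·N²), versus O(N^T) for naively summing over all hidden state sequences; the per-time-step matrix product is also what maps naturally onto the GPU.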
Hidden Markov Model
Forward
Backward
Expectation-Maximization
HMM-Backward
[Trellis diagram over times t-1, t, t+1, t+2: forward variable α_i(t) at state i, transition probability a_ij to state j, observation probability b_j(x_{t+1}), backward variable β_j(t+1)]
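The trellis above encodes the backward recursion β_t(i) = Σ_j a_ij · b_j(y_{t+1}) · β_{t+1}(j), with β_T(i) = 1. A sketch, again with illustrative toy parameters:

```python
import numpy as np

def hmm_backward(obs, a, b):
    """Backward algorithm: beta[t, i] = P(y_{t+1}..y_T | x_t = i)."""
    T, N = len(obs), a.shape[0]
    beta = np.ones((T, N))                              # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # sum_j a_ij * b_j(y_{t+1}) * beta_{t+1}(j)
        beta[t] = a @ (b[:, obs[t + 1]] * beta[t + 1])
    return beta

# illustrative toy model (not from the talk)
pi = np.array([0.6, 0.4])
a = np.array([[0.7, 0.3], [0.4, 0.6]])
b = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
beta = hmm_backward([0, 1, 2], a, b)

# consistency check: P(Y) = sum_i pi_i * b_i(y_1) * beta_1(i)
likelihood = (pi * b[:, 0] * beta[0]).sum()
```

The forward and backward passes must agree on P(Y), which is a useful sanity check for any GPU implementation of the two kernels.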
HMM-EM
Variable Definitions:
* Initial Probability
* Transition Prob., Observation Prob.
* Forward Variable, Backward Variable
Other Variables During Estimation:
* ε — the estimated state transition probability matrix
* γ — the estimated probability of being in a particular state at time t
* Multivariate Normal Probability Density Function — updates the Obs. Prob. from the Gaussian Mixture Models
HMM-EM
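The estimation variables above follow the standard Baum-Welch definitions: γ_t(i) = α_t(i)β_t(i)/P(Y) and ε_t(i,j) = α_t(i)·a_ij·b_j(y_{t+1})·β_{t+1}(j)/P(Y). A sketch of the E-step with illustrative toy parameters (the forward/backward passes are restated so the block is self-contained; the GMM-based update of b is omitted):

```python
import numpy as np

def hmm_forward(obs, pi, a, b):
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * b[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ a) * b[:, obs[t]]
    return alpha

def hmm_backward(obs, a, b):
    T, N = len(obs), a.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = a @ (b[:, obs[t + 1]] * beta[t + 1])
    return beta

def e_step(obs, pi, a, b):
    """E-step of Baum-Welch: posterior state (gamma) and transition (eps) probabilities."""
    alpha, beta = hmm_forward(obs, pi, a, b), hmm_backward(obs, a, b)
    py = alpha[-1].sum()                 # P(Y)
    gamma = alpha * beta / py            # gamma[t, i] = P(x_t = i | Y)
    T, N = len(obs), len(pi)
    eps = np.zeros((T - 1, N, N))        # eps[t, i, j] = P(x_t = i, x_{t+1} = j | Y)
    for t in range(T - 1):
        eps[t] = alpha[t][:, None] * a * b[:, obs[t + 1]] * beta[t + 1] / py
    return gamma, eps

pi = np.array([0.6, 0.4])
a = np.array([[0.7, 0.3], [0.4, 0.6]])
b = np.array([[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]])
gamma, eps = e_step([0, 1, 2], pi, a, b)
```

In the M-step, the new transition matrix is the expected transition counts normalized by expected state occupancy: a_ij ← Σ_t ε_t(i,j) / Σ_t γ_t(i).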
Outline
Background & Motivation
HMM
GPGPU
Results
Future Work
GPGPU
Programming Model
GPGPU
GPU Hierarchical Memory System
[http://www.biomedcentral.com]
• Visibility
• Performance Penalty
GPGPU
[www.math-cs.gordon.edu]
• Visibility
• Performance Penalty
GPGPU
GPU-powered Ecosystem
1) Programming Models
* CUDA
* OpenCL
* OpenACC, etc.
2) High-Performance Libraries
* cuBLAS
* Thrust
* MAGMA (CUDA/OpenCL/Intel Xeon Phi)
* Armadillo (C++ linear algebra library), drop-in libraries, etc.
3) Tuning/Profiling Tools
* Nvidia: nvprof / nvvp
* AMD: CodeXL
4) Consortium Standards
Heterogeneous System Architecture (HSA) Foundation
Outline
Background & Motivation
HMM
GPGPU
Results
Future Work
Results
Platform Specs

Results
Mitigate Data Transfer Latency
Pinned Memory Size
* current process limit: ulimit -l (in KB)
* hardware limit: ulimit -H -l
* increase the limit: ulimit -S -l 16384
Results
Results
A Practice to Efficiently Utilize the Memory System
Results
Results
Hyper-Q Feature
Results
Running Multiple Word Recognition Tasks
Results
Outline
Background & Motivation
HMM
GPGPU
Results
Future Work
Future Work
• Integrate with Parallel Feature Extraction
• Power Efficiency Implementation and Analysis
• Embedded System Development, Jetson TK1 etc.
• Improve generality: language models (LMs)
• Improve robustness: front-end noise cancellation
• Go with the trend!
QUESTIONS ?