Artificial Intelligence: Automatic Speech Recognition NALIT Boise, Idaho 2019 Presented by: Nic Côté, Sliq Media Technologies

Agenda:

• Evolution of automatic speech recognition

• Current state of ASR

• Prediction: What to Expect

• Real world results

Automatic Speech Recognition

How does it work and what’s new?

How Speech Recognition Works

The Basics - Spectral analysis

The characteristic formant frequencies (F1 and F2) for the English vowels a, e, i, o and u are: 850 Hz and 1610 Hz (a); 390 Hz and 2300 Hz (e); 240 Hz and 2400 Hz (i); 360 Hz and 640 Hz (o); and 250 Hz and 595 Hz (u).

Image: Charles McLellan
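To make the formant idea concrete, here is a minimal Python sketch that labels a measured (F1, F2) pair with the nearest vowel from the figures above. The function name `nearest_vowel` and the use of plain Euclidean distance in Hz are assumptions made for illustration; a real recognizer works on far richer spectral features.

```python
import math

# (F1, F2) formant pairs in Hz, copied from the slide text.
FORMANTS = {
    "a": (850, 1610),
    "e": (390, 2300),
    "i": (240, 2400),
    "o": (360, 640),
    "u": (250, 595),
}


def nearest_vowel(f1, f2):
    """Return the vowel whose (F1, F2) pair is closest to the measurement.

    Hypothetical helper: Euclidean distance in Hz is an illustrative choice.
    """
    return min(FORMANTS, key=lambda v: math.dist(FORMANTS[v], (f1, f2)))


if __name__ == "__main__":
    # A measurement close to the listed values for "e".
    print(nearest_vowel(400, 2250))
```

Note that o and u sit very close together in this table, so a small measurement error can flip the label between them, which hints at why spectral analysis alone is not enough.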

Traditional Technology

Hidden Markov Model – Probability Based
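As a rough illustration of what "probability based" means here, the following Python sketch runs Viterbi decoding, the standard way to find the most likely hidden-state sequence in a hidden Markov model. The two states, the two observation symbols and every probability below are invented for this example and do not come from the presentation.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability into s.
            prob, path = max(
                ((best[p][0] * trans_p[p][s], best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            step[s] = (prob * emit_p[s][obs], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


# Toy model (all values hypothetical): two vowel sounds, each tending to
# produce either a high or a low second formant.
STATES = ["ee", "oo"]
START = {"ee": 0.5, "oo": 0.5}
TRANS = {"ee": {"ee": 0.8, "oo": 0.2}, "oo": {"ee": 0.2, "oo": 0.8}}
EMIT = {
    "ee": {"high_F2": 0.9, "low_F2": 0.1},
    "oo": {"high_F2": 0.2, "low_F2": 0.8},
}

if __name__ == "__main__":
    print(viterbi(["high_F2", "high_F2", "low_F2"], STATES, START, TRANS, EMIT))
```

The decoder weighs how likely each sound is to produce the observed audio against how likely one sound is to follow another, then keeps the best-scoring path.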

Evolution of Speech Technology

New Technology

Artificial Intelligence Models

Google Speech Recognition Error Rate - 2017

Why Artificial Intelligence?

Contextual accuracy

Commercial Applications / Virtual Assistants

• Nuance – Dragon NaturallySpeaking

• Automotive Virtual Assistants

• Cortana

• Siri

• Google Now/Home

• Amazon Alexa

Legislative Applications

• Closed Captions/Subtitles

• Transcripts

Benefits

• Cost

• Improved accessibility

• Processing and production time

• Mitigates limited stenographer and transcriber availability

Challenges

• Accuracy/Acceptable error rate

• Consistency

• Recording Quality

• Processing delays for live captions

• Multilingual capabilities

• Dialects / Accents / Slang / Context

Prediction: What to Expect

• Massive improvements year over year

• Short-term (<5 years) progress will make accurate ASR very accessible

• Consumer demand drives development; ASR will be ubiquitous

• Great progress expected regarding noisy environments and imprecise grammar

• Decreased processing time to optimize real time captions

• Affordability of outsourced captions/transcripts (currently $1.50–$4/minute)

Human Brain

• Your brain contains 100 billion neurons and 10,000 times as many connections

• There are more than 125 trillion synapses in the cerebral cortex alone

Artificial Intelligence

• 86 million equivalent AI neurons

• 150 billion equivalent AI synapses

Model Training

Better Training for Better Intelligence

Facial Recognition Example

Query Image

Vehicle Recognition Example

Real World Results

Competing Automatic Speech Recognition Engines

ASR Battle

Can multiple Automatic Speech Recognition engines do a better job than just one?

YES! (mostly)

Legislative Recordings – Word Error Rates

               ASR1      ASR2      Combined
Total words    2210      2210      2210
Total errors   273       178       148
WER            12.33%    8.05%     6.70%
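Word error rate (WER) is the number of word errors divided by the number of words in the reference transcript. A minimal Python sketch of that arithmetic, using the counts from the table above; the names `word_error_rate`, `ERROR_COUNTS` and `TOTAL_WORDS` are chosen for this example only. Computed directly from the listed counts, the ASR1 quotient comes out at about 12.35%, marginally above the 12.33% shown, while the other two figures match.

```python
TOTAL_WORDS = 2210

# Error counts copied from the table.
ERROR_COUNTS = {"ASR1": 273, "ASR2": 178, "Combined": 148}


def word_error_rate(errors, total_words):
    """Return WER as a fraction: word errors over reference words."""
    return errors / total_words


if __name__ == "__main__":
    for engine, errors in ERROR_COUNTS.items():
        print(f"{engine}: {word_error_rate(errors, TOTAL_WORDS):.2%}")

    # Relative reduction in errors from combining, versus the better engine.
    best_single = min(ERROR_COUNTS["ASR1"], ERROR_COUNTS["ASR2"])
    reduction = (best_single - ERROR_COUNTS["Combined"]) / best_single
    print(f"Errors removed by combining: {reduction:.1%}")
```

On these counts, combining the two engines removes roughly one in six of the errors left by the better single engine.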

Sample Legislative ASR – Failure

Sample Legislative ASR – Missed Word

Sample Legislative ASR – Removed Duplicate

Sample Legislative ASR – Almost Success

Sample Legislative ASR – Success

Legislative Recordings

Multi-ASR Error Rate, Different Sources

Source                                      Error rate
House of Commons Budget Speech              4.0%
House of Commons QP1                        6.1%
House of Commons QP2                        13.2%
House of Commons Statements by Members      18.9%
Arkansas House of Representatives           8.9%
Oklahoma House of Representatives           11.4%
City of Fredericton English                 9.3%
City of Fredericton French                  6.4%
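To relate these per-source figures to an overall accuracy number, the Python sketch below averages them. It assumes an unweighted mean across the eight sources, since the length of each recording is not given, and the name `ERROR_RATES` is chosen for this example.

```python
# Multi-ASR error rates in percent, copied from the list above.
ERROR_RATES = {
    "House of Commons Budget Speech": 4.0,
    "House of Commons QP1": 6.1,
    "House of Commons QP2": 13.2,
    "House of Commons Statements by Members": 18.9,
    "Arkansas House of Representatives": 8.9,
    "Oklahoma House of Representatives": 11.4,
    "City of Fredericton English": 9.3,
    "City of Fredericton French": 6.4,
}

# Unweighted mean: an assumption, because recording lengths are unknown.
mean_error = sum(ERROR_RATES.values()) / len(ERROR_RATES)
under_ten = [name for name, rate in ERROR_RATES.items() if rate < 10.0]

if __name__ == "__main__":
    print(f"Mean error rate: {mean_error:.1f}%")
    print(f"Mean accuracy:   {100 - mean_error:.1f}%")
    print(f"{len(under_ten)} of {len(ERROR_RATES)} sources are below 10% error")
```

Under that assumption the mean error rate is just under 10%, with five of the eight sources individually below that mark.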

Improving Odds – Statistical Probability Model

“Bad Poetry Filter”

Accurate

"member for Saanich Gulf islands" - 3,260 results

"and I thank my friend from" - 8 results

"Okanagan Similkameen Nicola" - 29,700 results

"particularly for his advocacy" - 18,000 results

"those what really agricultural" - 0 results

Inaccurate

"member for Senate sköll islands" - 0 results

"night I thank my friend for" - 0 results

"Okanagan smell me Nicola" - 0 results

"particular phrase advocacy" - 0 results

"those really agricultural" - 1 result
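One way to read the filter: when two engines disagree on a phrase, keep the version that appears more often in existing text. The Python sketch below uses the result counts listed above as a stand-in lookup table. The source of those counts is not stated in the slides, and `PHRASE_HITS` and `pick_candidate` are hypothetical names for this example.

```python
# Result counts copied from the slide. In practice these would come from a
# search engine or text corpus; the lookup table is a stand-in (assumption).
PHRASE_HITS = {
    "member for Saanich Gulf islands": 3260,
    "member for Senate sköll islands": 0,
    "and I thank my friend from": 8,
    "night I thank my friend for": 0,
    "Okanagan Similkameen Nicola": 29700,
    "Okanagan smell me Nicola": 0,
    "particularly for his advocacy": 18000,
    "particular phrase advocacy": 0,
    "those what really agricultural": 0,
    "those really agricultural": 1,
}


def pick_candidate(candidates, hits=PHRASE_HITS):
    """Return the candidate phrase with the most recorded results."""
    return max(candidates, key=lambda phrase: hits.get(phrase, 0))


if __name__ == "__main__":
    print(pick_candidate([
        "member for Senate sköll islands",
        "member for Saanich Gulf islands",
    ]))
```

In four of the five pairs this picks the phrase listed as accurate. In the last pair it would prefer "those really agricultural", which the slide lists as inaccurate, so frequency counts alone do not settle every disagreement.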

Live Examples

Arkansas House of Representatives – ASR Meeting

British Columbia – Closed Caption Search

Conclusion

• We are very close to 90% accuracy with many legislative recordings

• Expect to be >95% within a few years

Problems:

• Speaker coherence

• Multilingual recordings