
Artificial Intelligence: Automatic Speech Recognition

NALIT

Boise, Idaho, 2019

Presented by:

Nic Côté, Sliq Media Technologies

Agenda:

• Evolution of automatic speech recognition

• Current state of ASR

• Prediction: What to Expect

• Real world results

Automatic Speech Recognition

How does it work and what’s new?

How Speech Recognition Works

The Basics - Spectral analysis

The characteristic formant frequencies (F1 and F2) for the English vowels a, e, i, o and u are: 850 Hz and 1610 Hz (a); 390 Hz and 2300 Hz (e); 240 Hz and 2400 Hz (i); 360 Hz and 640 Hz (o); and 250 Hz and 595 Hz (u).

Image: Charles McLellan
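To make the formant idea concrete, the sketch below uses the F1/F2 values listed above as a toy lookup table and labels a measured formant pair with the nearest vowel. It assumes F1 and F2 have already been extracted from the spectrum, and `classify_vowel` is a hypothetical helper; this is a simplified illustration, not how production ASR engines work.

```python
import math

# F1/F2 formant frequencies (Hz) for English vowels, taken from the values above.
FORMANTS = {
    "a": (850, 1610),
    "e": (390, 2300),
    "i": (240, 2400),
    "o": (360, 640),
    "u": (250, 595),
}


def classify_vowel(f1: float, f2: float) -> str:
    """Hypothetical helper: return the vowel whose (F1, F2) pair is nearest.

    Uses plain Euclidean distance in Hz; real recognizers work on full
    spectral features rather than two hand-picked formants.
    """
    return min(FORMANTS, key=lambda v: math.dist((f1, f2), FORMANTS[v]))


if __name__ == "__main__":
    print(classify_vowel(800, 1500))  # a
    print(classify_vowel(260, 2350))  # i
```

Note that "o" and "u" sit only about 120 Hz apart in this (F1, F2) space, so small measurement errors can easily confuse them.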

Traditional Technology: Hidden Markov Model – Probability Based
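In an HMM-based recognizer, hidden states (for example, phoneme states) emit observable acoustic frames, and decoding searches for the most probable sequence of hidden states. A minimal Viterbi decoder is sketched below; the two-state silence/vowel model, its probabilities, and the "low"/"high" energy observations are invented for illustration and do not come from any real engine.

```python
import math

# Toy, hypothetical model: "S" = silence, "V" = vowel; frames are "low" or "high" energy.
STATES = ["S", "V"]
START = {"S": 0.8, "V": 0.2}
TRANS = {"S": {"S": 0.7, "V": 0.3}, "V": {"S": 0.2, "V": 0.8}}
EMIT = {"S": {"low": 0.9, "high": 0.1}, "V": {"low": 0.2, "high": 0.8}}


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for the observations.

    Scores are summed in log space to avoid underflow; every probability
    used must be non-zero because math.log(0) raises ValueError.
    """
    # best[t][s] = (best log score of a path ending in s at time t, previous state)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][obs[0]]), None)
             for s in states}]
    for t in range(1, len(obs)):
        row = {}
        for s in states:
            # Choose the predecessor that maximizes its score plus the transition into s.
            prev, score = max(
                ((p, best[t - 1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            row[s] = (score + math.log(emit_p[s][obs[t]]), prev)
        best.append(row)
    # Trace back from the best-scoring final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    frames = ["low", "low", "high", "high", "low"]
    print(viterbi(frames, STATES, START, TRANS, EMIT))  # ['S', 'S', 'V', 'V', 'S']
```

Real systems use far larger state spaces and continuous acoustic features, but the decoding recursion is the same.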

Evolution of Speech Technology

New Technology: Artificial Intelligence Models

Google Speech Recognition Error Rate - 2017

Why Artificial Intelligence? Contextual accuracy

Commercial Applications / Virtual Assistants

• Nuance – Dragon NaturallySpeaking

• Automotive Virtual Assistants

• Cortana

• Siri

• Google Now/Home

• Amazon Alexa

Legislative Applications

• Closed Captions/Subtitles

• Transcripts

Benefits

• Cost

• Improved accessibility

• Processing and production time

• Mitigate limited stenographer and transcriber availability

Challenges

• Accuracy/Acceptable error rate

• Consistency

• Recording Quality

• Processing delays for live captions

• Multilingual capabilities

• Dialects / Accents / Slang / Context

Prediction: What to Expect

• Massive improvements year over year

• Short term (<5 years): progress will make accurate ASR widely accessible

• Consumer demand drives development; ASR will be ubiquitous

• Great progress expected regarding noisy environments and imprecise grammar

• Decreased processing time for real-time captions

• Improved affordability of outsourced captions/transcripts (currently $1.50 to $4 per minute)
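As a rough worked example of the quoted outsourcing rates, the sketch below multiplies a recording's length by the low and high ends of the $1.50 to $4 per minute range. The three-hour session length and the `caption_cost_range` helper are hypothetical.

```python
def caption_cost_range(minutes: float, low_rate: float = 1.50,
                       high_rate: float = 4.00) -> tuple[float, float]:
    """Return the (low, high) outsourced caption cost in dollars for a recording."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * low_rate, minutes * high_rate


if __name__ == "__main__":
    low, high = caption_cost_range(3 * 60)  # hypothetical three-hour floor session
    print(f"${low:,.2f} to ${high:,.2f}")  # $270.00 to $720.00
```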

Human Brain

• Your brain contains 100 billion neurons and 10,000 times as many connections

• There are more than 125 trillion synapses in the cerebral cortex alone

• 86 million equivalent AI neurons

• 150 billion equivalent AI synapses

Artificial Intelligence

Model Training

Better Training for Better Intelligence

Facial Recognition Example

Query Image

Vehicle Recognition Example

Real World Results

Competing Automatic Speech Recognition Engines

ASR Battle

Can multiple Automatic Speech Recognition engines do a better job than just one?

YES! (mostly)
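The slides do not say how the engine outputs were combined. One common family of approaches, in the spirit of ROVER (Recognizer Output Voting Error Reduction), aligns the hypotheses word by word and then chooses one word per aligned slot. The sketch below assumes alignment has already been done and that each engine reports a per-word confidence; `combine_aligned` and the confidence values are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# One aligned slot: (word, confidence), or None where an engine emitted nothing.
Slot = Optional[Tuple[str, float]]


def combine_aligned(*hypotheses: Sequence[Slot]) -> List[str]:
    """Pick the highest-confidence word in each aligned slot.

    Assumes every hypothesis is already aligned to the same number of slots.
    Slots where no engine produced a word are skipped.
    """
    if len({len(h) for h in hypotheses}) != 1:
        raise ValueError("hypotheses must be aligned to the same length")
    combined = []
    for slots in zip(*hypotheses):
        candidates = [s for s in slots if s is not None]
        if candidates:
            word, _ = max(candidates, key=lambda s: s[1])
            combined.append(word)
    return combined


if __name__ == "__main__":
    # Made-up confidences for two engines transcribing the same phrase.
    asr1 = [("member", 0.9), ("for", 0.9), ("Senate", 0.4), ("sköll", 0.3), ("islands", 0.8)]
    asr2 = [("member", 0.8), ("for", 0.9), ("Saanich", 0.7), ("Gulf", 0.6), ("islands", 0.9)]
    print(" ".join(combine_aligned(asr1, asr2)))  # member for Saanich Gulf islands
```

A slot where every engine is wrong stays wrong, so this kind of combination can reduce errors but not eliminate them.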

Legislative Recordings – Word Error Rates

                ASR1      ASR2      Combined
Total words     2210      2210      2210
Total errors    273       178       148
WER             12.33%    8.05%     6.70%
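Word error rate (WER) is the number of word substitutions, deletions and insertions needed to turn the recognized text into the reference transcript, divided by the number of reference words; for example, the combined column works out to 148 / 2210 ≈ 6.70%. A minimal word-level edit-distance sketch follows; `word_error_rate` is a hypothetical helper name, and the example phrases come from the "Bad Poetry Filter" slide.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits to turn the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("member for Saanich Gulf islands",
                          "member for Senate sköll islands")
    print(f"{wer:.0%}")  # 40%
```

Because insertions count as errors, WER can exceed 100% on a badly over-generated transcript.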

Sample Legislative ASR – Failure

Sample Legislative ASR – Missed Word

Sample Legislative ASR – Removed Duplicate

Sample Legislative ASR – Almost Success

Sample Legislative ASR – Success

Legislative Recordings

Multi-ASR Error Rate, Different Sources

Source                                    Error rate
House of Commons Budget Speech             4.0%
House of Commons QP1                       6.1%
House of Commons QP2                      13.2%
House of Commons Statements by Members    18.9%
Arkansas House of Representatives          8.9%
Oklahoma House of Representatives         11.4%
City of Fredericton English                9.3%
City of Fredericton French                 6.4%

Improving Odds – Statistical Probability Model: “Bad Poetry Filter”

Accurate:

"member for Saanich Gulf islands" - 3,260 results

"and I thank my friend from" - 8 results

"Okanagan Similkameen Nicola" - 29,700 results

"particularly for his advocacy" - 18,000 results

"those what really agricultural" - 0 results

Inaccurate:

"member for Senate sköll islands" - 0 results

"night I thank my friend for" - 0 results

"Okanagan smell me Nicola" - 0 results

"particular phrase advocacy" - 0 results

"those really agricultural" - 1 result
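The "bad poetry filter" rests on a simple observation: phrases people actually say tend to appear in existing text, while mis-recognitions rarely do. A minimal sketch of that rescoring step keeps the candidate with the most hits, assuming a lookup table of phrase counts; here the table holds only the counts listed above, and `PHRASE_COUNTS` and `pick_plausible` are hypothetical names.

```python
# Hypothetical lookup table built from the hit counts listed above.
PHRASE_COUNTS = {
    "member for Saanich Gulf islands": 3260,
    "member for Senate sköll islands": 0,
    "and I thank my friend from": 8,
    "night I thank my friend for": 0,
    "Okanagan Similkameen Nicola": 29700,
    "Okanagan smell me Nicola": 0,
    "particularly for his advocacy": 18000,
    "particular phrase advocacy": 0,
    "those what really agricultural": 0,
    "those really agricultural": 1,
}


def pick_plausible(candidates):
    """Return the candidate phrase with the highest count (unknown phrases count as 0).

    max() keeps the first of several equal maxima, so pass the ASR engine's
    own top choice first to make it the tie-breaker.
    """
    return max(candidates, key=lambda c: PHRASE_COUNTS.get(c, 0))


if __name__ == "__main__":
    print(pick_plausible(["Okanagan smell me Nicola", "Okanagan Similkameen Nicola"]))
    # Okanagan Similkameen Nicola
```

The last pair above shows the limit of raw counts: the accurate phrase has 0 results and the inaccurate one has 1, so this rule alone would pick the wrong text there. Counts like these are more sensibly treated as one signal alongside the recognizer's own scores.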

Live Examples

Arkansas House of Representatives – ASR Meeting

British Columbia – Closed Caption Search

Conclusion

• We are very close to 90% accuracy on many legislative recordings

• Expect >95% accuracy within a few years

Problems:

• Speaker coherence

• Multilingual recordings
