machine learning applied to speech recognition

18
Machine learning applied to speech recognition 1 Alice Coucke Head of Machine Learning Research ENS Corporate meeting — 11/19/2020

Upload: others

Post on 09-Jan-2022

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Machine learning applied to speech recognition

Machine learning applied to speech recognition

1

Alice CouckeHead of Machine Learning Research

ENS Corporate meeting — 11/19/2020

Page 2: Machine learning applied to speech recognition

1. Recent advances in applied ML

3. Working at Sonos

Outline:

2. From physics to machine learning

Page 3: Machine learning applied to speech recognition

Recent advances in applied Machine Learning

3

Page 4: Machine learning applied to speech recognition

Reinforcement learning Learning goal-oriented behavior within simulated environments

Dota 2 OpenAI Five (OpenAI)

Starcraft II AlphaStar (Deepmind)

Play-driven learning for robots (Google Brain)

Sim-to-real dexterity learning Project BLUE (UC Berkeley)

Go AlphaGo (Deepmind, 2016)

Page 5: Machine learning applied to speech recognition

Machine Learning for Life Sciences Deep learning applied to biology and medicine

Protein folding & structure prediction

AlphaFold (Deepmind)

Drug discovery (Recursion, MIT, Harvard)

Reconstruct speech from neural activity (UCSF)

Limb control restoration (Batelle, Ohio State Univ)

Eye disease diagnosis (NHS, UCL, Deepmind)

Page 6: Machine learning applied to speech recognition

Computer vision High-level understanding of digital images or videos

GANs for artificial video dubbing (Synthesia)

Visual Question Answering on everyday images (Facebook Research, Georgia Tech)

Future state prediction in dynamic scenes (Wayve, Univ of Cambridge)

Page 7: Machine learning applied to speech recognition

From physics to machine learning and back A surge of interest from the physics community

NeurIPS: workshop on « machine learning and the physical sciences »

Page 8: Machine learning applied to speech recognition

Speech and language Understand and analyze human speech

Speech transcription Human Parity (Microsoft)

Sentiment analysis Detect emotions in text and

speech

Spoken language understanding

(Super)GLUE benchmarks (Google, Facebook, IBM, Stanford …)

Text & speech generationGPT-2 (Open AI)

Bert (Google)XLNet (CMU) …

Voice activity detection Detect speech from audio

Speaker identification Recognize unique speakers

Neural machine translation Unsupervised MT (Facebook)

{ intent: FindWeather, entities: { datetime: 11/19/2020, location: Paris } }

🎶🎶

Page 9: Machine learning applied to speech recognition

Fairness, ethics, and explainability We, as scientists, have a say in the future of AI

Page 10: Machine learning applied to speech recognition

From physics to machine learning

10

Page 11: Machine learning applied to speech recognition

My background From physics to machine learning

2012: M2 ICFP Theoretical physics

2013-2016: PhD in statistical physics @LPTENS

Feb 2017: senior ML scientist @ Snips

2019: head of ML research @ Snips

Today: head of ML research @ Sonos

Page 12: Machine learning applied to speech recognition

Me :)Raffaele Tavarone

ML DirectorAcoustic modeling

Francesco CaltagironeML Director

Spoken Language Understanding

Stéphane d’AscoliML research intern

Now: PhD ENS & FAIR

Alaa SaadeSr ML ScientistNow: DeepMind

A few takeaways (please go ask other people too)

• PhD? Postdoc?

• Working at a startup company

• Physicists and machine learning

Physicists @ Snips/Sonos

Page 13: Machine learning applied to speech recognition

Sonos Voice Experience

13

Page 14: Machine learning applied to speech recognition

Sonos✦ Smart speakers in multi room✦ Hardware & software (~400 people)✦ 1.5k employees in 6 countries (US, France,

Netherlands, UK, Australia, China)

Sonos Home sound system

Sonos Voice Experience✦ Ex Snips, joined Sonos 2019✦ 50 employees: ML, Linguistics, SW engineering✦ Audio signal proc, speech, NLP, embedded

systems

Brand new tech blog:

https://tech-blog.sonos.com/

Page 15: Machine learning applied to speech recognition

t ɜ r n ɑ n ð ə l a ɪ t s ɪ n ð ə ˈl ɪ v ɪ ŋ r u m

Automatic Speech Recognition Engine

Language model

Acoustic model

Natural Language

UnderstandingEngine

Turn on the lights in the living room

Intent: SwitchLightOnSlots: room: living room

Language modeling

Machine Learning at Sonos Voice Experience Spoken Language understanding for voice assistants

Audio Frontend Action Code

Custom Wake Word

Wake WordAutomatic Command

Recognition

ACRIntent Classification

and Filling

ICFMultiturn Dialogue

and Response

Dialogue

Page 16: Machine learning applied to speech recognition

Our approach Our own voice in a vast ecosystem

Resource constrained ML:small data & hardware

Privacy by design

✓ A new popular trend in the ML community (low resource ML, transfer learning, miniaturization, etc)

✓ Numerous conferences and workshops on the topic

✓ Towards a safer, greener and more private conversational AI

Page 17: Machine learning applied to speech recognition

Research activity Publishing research in industry

Snips voice platform: an embedded spoken language

understanding system for private-by-design voice interfaces

ICML 2018 Workshop PiMLAI

arXiv

ICASSP 2019 Main track

Efficient keyword spotting using dilated convolutions and gating

arXiv

Federated learning for keyword spotting

ICASSP 2019 Main track

arXiv

Spoken language understanding on the edge

NeurIPS 2019Workshop EMC2

arXiv

Small footprint Text-Independent Speaker Verification for Embedded

Systems

arXiv

ArXiv preprint 2020

A study on more realistic room simulation for far-field keyword

spotting

arXiv

APSIPA 2020 Main track

Predicting detection filters for small footprint open-vocabulary keyword

spotting

Interspeech 2020 Main track

arXiv

Conditionned text generation with transfer for closed-domain dialogue

systems

arXiv

SLSP 2020 Main track

Page 18: Machine learning applied to speech recognition

Thank you for your attention!

18

Questions?