adam coates at ai frontiers: ai for 100 million people with deep learning

26
Adam Coates AI for 100 million people with deep learning Adam Coates @adampaulcoates Silicon Valley AI Lab

Upload: ai-frontiers

Post on 24-Jan-2017

725 views

Category:

Technology


0 download

TRANSCRIPT

AdamCoates

AIfor100millionpeoplewithdeeplearning

AdamCoates@adampaulcoates

SiliconValleyAILab

AdamCoates

SiliconValleyAILab

• Mission: DevelophardAItechnologiesthatletushavesignificantimpactonhundredsofmillionsofusers.

Ø Chooseproblemsthatsignificantlyaffectlargenumbersofpeople.

AdamCoates

AIfor100millionpeople

• Firstgoal:speechrecognitioneverywhere.

Ifyou’reconnectingtointernetforfirsttimein2017,you’relikelyusingamobiledevice.

Speechwilltransformmobiledeviceinterfaces.

AdamCoates

AIfor100millionpeople

• Firstgoal:speechrecognitioneverywhere.

Mobiledevices

Captioning

Homedevices

Cars/Hands-freeinterfaces

AdamCoates

AIfor100millionpeople

• Firstgoal:speechrecognitioneverywhere.

Diversity

Accuracy

Specializedsolutions(e.g.,IVR)

TraditionalLVCSR

Human-levelspeechrecognition

AdamCoates

Speechrecognition:TraditionalASR• TraditionalspeechsystemsbuiltonstandardML+engineering

practices.Features

Computefeatures

Acou

sticM

odel

Predictphonemes

LanguageM

odel

Transcrip

tion

“WhattimeisitinBeijing?”

Mergewithpronunciationandlanguagedata.

Sequ

encem

odel

Combineintostatesequence

Someapplicationscanbesolvedthisway.Butit’shardtoscaleourowncleverness.

AdamCoates

Deeplearning

Data&Computation

Performance DeepLearningalgorithms

Manypreviousmethods

Majoradvantageofdeeplearning:scalability.

Effort

AdamCoates

Speechrecognitionwithdeeplearning

• Replacemostofspeechsystemwithlargeneuralnetwork.

Spectrograms

Languagem

odel

Transcrip

tion

“WhattimeisitinBeijing?”

SimpleLM(nolinguisticinfo)

Deep

Learning

AdamCoates

“DeepSpeech”

• Poureffortintodata+computation.– Trytocatchuptohumanaccuracybyscaling.

Data&Computation

Performance DeepSpeech

Manypreviousmethods

AdamCoates

DeepSpeech

Spectrogram

• Traingiantneuralnetworkstopredictcharactersfromaudio.– Train“endtoend”

[e.g.,Gravesetal.,2006]

Ø Hardpartistrainingatscale andsearchingforbestmodel.

Ø Needdata+computingpower.

AdamCoates

RawTrainingData

0

2000

4000

6000

8000

10000

12000

14000

WSJ Switchboard Fisher DeepSpeech

80 300

2000

11940

Hoursofrawspeechdata

WeneedalotofdataforendtoendDLsystems:usereadspeech.

AdamCoates

Dataaugmentation

• Manyformsofdistortionthatmodelshouldberobustto:– Reverb,noise,farfieldeffects,echo,

compressionartifacts,changesintempo.

• Learntoberobustbytrainingfromdatawithdistortions!– Easiertoengineerdatapipelinethantoengineerrecognitionpipeline.

Rawaudio($$$$) Novelaudio

Augmentation

AdamCoates

Example:farfield

Speech

ImpulseResponse

=*AugmentedFar-fieldspeech

Compare:RealFar-fieldspeech

[BillyJun,Rewon Child,SanjeevSatheesh]

Reduceserrorsonfar-fielddataby15%-20%.Reliesonmodelsearch+large-scaletraining.

AdamCoates

Augmenteddataset

0100002000030000400005000060000700008000090000100000

WSJ Switchboard Fisher DeepSpeech

80 300 2000

~96,000

Synthesizeddata

• Augmentationgreatlyexpandsavailabledata.– Trainedmodelshavehearddecades ofuniqueaudio.

AdamCoates

Compute

TitanXGPU~6TeraFLOPs

“Speedoflight”=3-6weekson1GPU

1experimentrequires>10,000,000,000,000,000,000FLOPs(10sofExaFLOPs)

AdamCoates

Compute

Infiniband network

ScaleouttolargenumbersofGPUs(e.g.,8– 64)

• Cutexperimenttimesto~3-5days.– Achieve~50%ofpeakFLOPson8+GPUs.– Comparabletosupercomputingworkloads.

AdamCoates

DeepSpeechforMandarin

• DeepSpeechisdrivenbydata.–MandarinisverydifferentfromEnglish.• “Tonal”,thousandsofcharacters

DeepSpeechTraining

“Myfavoritesingeris…”

Englishtrainingdata

AdamCoates

DeepSpeechforMandarin

• DeepSpeechisdrivenbydata.–MandarinisverydifferentfromEnglish.• “Tonal”,thousandsofcharacters

DeepSpeechTraining

“我最喜欢的歌手是…”

Mandarintrainingdata

AdamCoates

DeepSpeechforMandarin

• Withafewchanges,asinglealgorithmcanlearnanentirelynewlanguage.– Competitivewithcommitteeofnativespeakersforshortaudioclips.

Ø Learnshybridspeech(e.g.,famouspeople,iphones):

我最喜欢的歌手是Angelababy

AdamCoates

Makingdeviceseasierandmoreefficient

• Doesspeechmakeadifference?YES!

AdamCoates

Comparingspeechwithkeyboardinput

• Compareuserperformance/experienceforspeechvs.traditionalkeyboard.[Ruan etal.,arxiv.org/abs/1608.07323]

AdamCoates

Speechis3xfasterthantyping

Ruan etal.,arxiv.org/abs/1608.07323

AdamCoates

…andmoreaccurate.

Ruanetal.,arxiv.org/abs/1608.07323

AdamCoates

TalkType – voice-centrickeyboardforAndroid

• OpportunitytorethinkproductexperiencesaroundspeechandAI.

[NickyChan,Bijit Halder,KennyLiou,Thuan Nguyen,NinaWei]talktype.baidu.com

AdamCoates

AIfor100millionpeople

• DeepLearningisclosinggapwithhumansonspeech,throughscalability.– Stillmoretodo;butitkeepsgettingbetter.

• SpeechalreadyenablingproliferationofnewAIproducts.– Let’smakethemworkforeveryone.

AdamCoates

Thankyou!

AdamCoates

[email protected]

@adampaulcoates

IfyouwanttohelpbringAIto100sofmillionsofpeople,cometalktous!

http://research.baidu.com